My goal in this entry is not to mention tools by name, but I am getting to the point where I am identifying commercial tools that will have to become part of my inventory as I move to using open source exclusively.
I have used Eclipse and I would not want to use anything else for writing Java code. I hold out great hope for this project, and I look forward to evaluating many of the other tools that plug into this framework. I have used PyDev to edit Python code in Eclipse, as well as Red Robin for Jython. (At the time, I was not happy with auto completion, but I was impressed enough overall.) I use Quanta to edit PHP, and I am happy to do so. I would like to find a good open source JavaScript editor that performs good auto completion – using I rely on auto completion for the DOM.
Since I jump back and forth between Python, Java, SQL, HTML, CSS, JavaScript, C, C#, PHP, and VB.NET, I often do not remember method names, but I recognize them. For example, today I am working in C# and I am using a datareader to retrieve data from the database. I know that my datareader can be populated by executing a method on the command object, but I cannot remember if the method is called ‘ExecuteReader’ or ExecuteDataReader’. Auto completion helps me be productive as I type rather than requiring me to reach for a manual or use a help file. I am not ashamed of relying on the auto completion feature of the IDE.
I am very happy with MySQL, Postgres and SQLite, each of which I would use under different circumstances, but I need a good modeling tool. I have seen some commercial software that allows for modeling both Postgres and MySQL databases. Once a database reaches a certain size, I like having a picture. I also find that it is useful to sit down with a client to validate the physical model against the business rules. I find that clients can read an ER diagram if I am there to help. Therefore, I will be shopping for a good tool. I have used DBDesigner for MySQL, and I have enjoyed, but I am getting tired of waiting to see the final version.
I like UML, especially as I am planning. I do not want to pay for Rational Rose. I know that there are modeling tools that plug into Eclipse. I like pictures of both the database and the libraries I build. I want to find a good tool soon. I do not need use cases, and I would not use Rational Rose or even Visio to manage them. Last summer, I was exposed to a product called Catalyze by Steeltrace. Given the way I think and work, no other tool compares. I want to write more about this tool later. This tool generates Word documents, process flows, and uses cases.
I have reservations about the MS Windows bias of the Catalyze tool, but I seem to recall that it supports MySQL. It may not be a stretch for it to support Open Office. I know that Catalyze uses Velocity – if Catalyze uses XML to create the Word document, it should possible to open it with Open Office. This tool is so good, I would be willing to run Windows on at least one of my computers just to be able to use it.
So, must have features for me are auto completion and useful wizards in an IDE, good modeling tools for databases and libraries, and good requirements tools. Eclipse supports some modeling tools, but I have not tried them yet. At the end of the day, I am not so ideological that I would refuse to buy and use commercial software if an open source equivalent is not available. I draw the line at purchasing tools that would require my clients to buy licenses, too. I do not want to pass on costs to my clients that way.
Over the years, I have seen many bad solutions that were built with outstanding tools. As I have said before – as others before me have said – a fool with a tool is still a fool. Decision makers like to pick so-called enterprise solutions, or best-of-breed tools. Good for them. However, in most cases, these tools provide a double-helping complexity along with their enterprise features, adding to the total-cost-of-ownership (TCO) of a software asset and decreasing the agility of the enterprise.
Therefore, I am big supporter of good-enough-technology. There are plenty of vanilla flavored but mission critical applications out there that do not require the most feature-rich tools. Some of us live in skyscrapers, but the rest of us live in houses. Construction techniques that are good for one type of structure are inappropriate for the other.
The question is not whether .NET is superior to PHP, or whether MySQL is superior to PostresSQL. It starts with the requirements, stupid! I feel myself getting cranky every time I read that a certain tool or approach is superior to another.
I assert: certain developers are superior to others regardless of the tool. Also, certain tools are superior to others in certain circumstances. In many circumstances, given the requirements, good enough is good enough!
Speaking of good enough, I am very impressed with SQLite. I find that it is a blindingly fast little database that can be used from PHP, Python, Java and C#. In PHP and Python, I was able to write my own functions anbd use them in SQL statements. If you have a client who does not require heavy support for concurrent database writing, if you are building a desktop application that is being accessed by a single user or if you need a good and reliable in-memory database, SQLite is more than good enough.
I have not put this database through its paces to see how well it runs on a website, but, given its speed, I expect that it releases resources quickly. As long as the web application does not require a lot of writing, I expect the SQLite could support many of the small website and web applications I have seen. Give it a try, and let me what you think.
In my last entry, I identified that information in the enterprise has a life cycle. To recap, this is the information life cycle:
It is possible to be a good “technical” DBA and to build inadequate databases. Many of the textbooks you will see concentrate on mind numbing theory, Codd’s rules etc. This theory is necessary, but it is not sufficient if you want to build good databases. I will assume that you know your basic theory, that you know about first, second, third and fourth normal form; and I will assume that you know how to read an ER diagram. My goal here is not to rehash what every database professional should already know, but to outline a commonsense approach to data that can be layered on top of this merely technical knowledge.
I cannot count the number of times a client has presented me with a list of fields that was prepared by a business analyst. The list usually included data types for each field, and it may also have included rules that apply to each field. In each case, the client expected me to accept this list and to start modeling a database right away. This is wrong! Wrong! Wrong! And, I will tell you why. I cannot list all the reasons, but I will list the few that correspond to my own personal best practices and ways of working.
To be frank, I have never found the list of fields to be correct. I often find fields omitted, and I also find that some of the specified fields are unnecessary. Over time, I also find that some of the fields in the dictionary are poorly named – the rest of the enterprise is often in the habit of referring to some of the fields the business analyst provided by completely different names. Listing fields is the wrong way to start. It often turns out that the list was assembled by a committee – it is useful, but you owe it to your client to question it. All I can do at this stage is outline the practices I employ to determine the correctness of the so-called data dictionary.
First, I ask to see the paperwork that corresponds to the fields in the dictionary. I have worked on a few HR applications, for example. Often, each new employee is required to fill out forms. In social work or in employee assistance, these might be called intake forms. I ask to see these forms for three data related reasons: 1) Field names in the paperwork should correspond to the field names in the data dictionary. 2) All fields in the paperwork should appear in the data dictionary. 3) All fields in the dictionary should appear in a forms – if they do not, find out where this data originates. If the field in the dictionary does not have an origin, then perhaps it does not belong in the database.
Seeing the forms will also help you understand the work that the database is expected to enable. Work takes place over time, and time is an important dimension of the data.
Seeing the paperwork will also help you plan the data input screens – fields should appear in the same order on the screen as they do in the paperwork. Also, the application should make it easy for a business user to redesign the data entry fields if the paperwork changes. (Nobody has ever consented to pay me to build such a feature, but I still think it is a good idea.)
You would be surprised to find out how much data is collected and never used. The rule of thumb is, each field should have a use. There may be statutory or regulatory reasons, or there may be reporting reasons. Why collect information if it is never used in a report, or if it is never used to inform a decision? At this stage, many of my clients have agreed to drop a field or three from the data dictionary.
In brief, each field must have a use or a place in ever stage of the information life cycle. All data must be collected, managed, used and expired. If not, why is it there?
Almost every system I have worked on, except for databases I have designed, has been designed to statically represent the existing enterprise. In these databases, departments never merge, business units are never renamed, and business rules are absolute.
I once worked on a billing application where the rules changed each time a contract was renegotiated. Services that were billed as a lump sum in one period were billed by the hour in the next. What made it even more difficult was that a client could renegotiate a billing rule at any time during the life of the contract, and the application was required to generate reports across arbitrary periods of time during which the billing rules could change a few times.
The application I am currently working on contains a table of government departments. Over time, departments may merge, split or be eliminated altogether. A good database will reflect this reality. Departments exist from a certain date to a certain date, at which point they may be renamed, split or merge with another department. The data has to be stored such that reports for previous fiscal years can still be generated using the previous department names, just as reports for this fiscal year must reflect the current department names.
I worked on another billing application where the costs of a project in the database were being shared by different lines of business. The cost share was represented as a percentage in a cost share table. This was all fine and good. The reports were correct every month, even though the cost share was changed a few times over the first quarter. The accountants noticed a problem as soon as they generated the first quarterly report. The amounts billed to each line of business in the quarter did not equal the sum of the amount they were billed each month in the previous monthly reports. (If you can follow that, you are not doing badly.) The solution was to alter the cost share table to reflect the fact that a cost share was in effect from a specified date to another specified date, and the reports had to be changed to correctly charge for services across these periods of time.
It is a good practice to ask your client about the impact time has on each field in the database. You may be surprised by the answers you get. Basically, billing codes and work codes change over time. If there is a requirement to be able to generate historical reports, you have to think about how to handle this.
I advise you to be a pain with my tongue planted firmly in my cheek. People skills count for a lot. There is no point in pissing someone off, especially if you need his or her cooperation for the duration of your work. However, you should be committed to doing the job you were hired to do, and you should have a mantra of sorts. Mine is: You cannot pay me to do a bad job. I want to do my best, and I stand up for quality.
A few times, I have been in the position of building a web application on top of a database that was being designed by a DBA. Oracle DBAs seem to feel that their superior grasp of the features and capabilities of Oracle qualify them to build the database. Every product or technology has a culture, and Oracle has a culture of arrogance and control, in my opinion. To the arrogant DBA I would say: Being able to spell does not make you Shakespeare.
You owe it to your client to challenge the DBA to address the important issues above. Do not argue for the sake of arguing, but show him or her that you are creating requirement documents and you need his or her help. You are accountable, especially in these days of one-stop-shopping where the developer and the analyst are often the same person. Hold the DBA to the same high standards you try to hold yourself.
The other issue DBAs, especially old-time Oracle die hards, do not understand is that reports must run fast. Those are the expectations people bring to the web, but, also, web applications will timeout if a report takes too long. Parts of the application may also be stored in the database – these must be retrieved quickly, too.
That’s it for now. I could write more, but this covers some of the broad issues and describes some of the important commonsense approaches I use when working with a client to build a database. In my next entry, I will describe the second stage of the information management life cycle, and how understanding it is important if you want to build good databases.