In my last entry, I identified that information in the enterprise has a life cycle. To recap, this is the information life cycle:
It is possible to be a good “technical” DBA and to build inadequate databases. Many of the textbooks you will see concentrate on mind numbing theory, Codd’s rules etc. This theory is necessary, but it is not sufficient if you want to build good databases. I will assume that you know your basic theory, that you know about first, second, third and fourth normal form; and I will assume that you know how to read an ER diagram. My goal here is not to rehash what every database professional should already know, but to outline a commonsense approach to data that can be layered on top of this merely technical knowledge.
I cannot count the number of times a client has presented me with a list of fields that was prepared by a business analyst. The list usually included data types for each field, and it may also have included rules that apply to each field. In each case, the client expected me to accept this list and to start modeling a database right away. This is wrong! Wrong! Wrong! And, I will tell you why. I cannot list all the reasons, but I will list the few that correspond to my own personal best practices and ways of working.
To be frank, I have never found the list of fields to be correct. I often find fields omitted, and I also find that some of the specified fields are unnecessary. Over time, I also find that some of the fields in the dictionary are poorly named – the rest of the enterprise is often in the habit of referring to some of the fields the business analyst provided by completely different names. Listing fields is the wrong way to start. It often turns out that the list was assembled by a committee – it is useful, but you owe it to your client to question it. All I can do at this stage is outline the practices I employ to determine the correctness of the so-called data dictionary.
First, I ask to see the paperwork that corresponds to the fields in the dictionary. I have worked on a few HR applications, for example. Often, each new employee is required to fill out forms. In social work or in employee assistance, these might be called intake forms. I ask to see these forms for three data related reasons: 1) Field names in the paperwork should correspond to the field names in the data dictionary. 2) All fields in the paperwork should appear in the data dictionary. 3) All fields in the dictionary should appear in a forms – if they do not, find out where this data originates. If the field in the dictionary does not have an origin, then perhaps it does not belong in the database.
Seeing the forms will also help you understand the work that the database is expected to enable. Work takes place over time, and time is an important dimension of the data.
Seeing the paperwork will also help you plan the data input screens – fields should appear in the same order on the screen as they do in the paperwork. Also, the application should make it easy for a business user to redesign the data entry fields if the paperwork changes. (Nobody has ever consented to pay me to build such a feature, but I still think it is a good idea.)
You would be surprised to find out how much data is collected and never used. The rule of thumb is, each field should have a use. There may be statutory or regulatory reasons, or there may be reporting reasons. Why collect information if it is never used in a report, or if it is never used to inform a decision? At this stage, many of my clients have agreed to drop a field or three from the data dictionary.
In brief, each field must have a use or a place in ever stage of the information life cycle. All data must be collected, managed, used and expired. If not, why is it there?
Almost every system I have worked on, except for databases I have designed, has been designed to statically represent the existing enterprise. In these databases, departments never merge, business units are never renamed, and business rules are absolute.
I once worked on a billing application where the rules changed each time a contract was renegotiated. Services that were billed as a lump sum in one period were billed by the hour in the next. What made it even more difficult was that a client could renegotiate a billing rule at any time during the life of the contract, and the application was required to generate reports across arbitrary periods of time during which the billing rules could change a few times.
The application I am currently working on contains a table of government departments. Over time, departments may merge, split or be eliminated altogether. A good database will reflect this reality. Departments exist from a certain date to a certain date, at which point they may be renamed, split or merge with another department. The data has to be stored such that reports for previous fiscal years can still be generated using the previous department names, just as reports for this fiscal year must reflect the current department names.
I worked on another billing application where the costs of a project in the database were being shared by different lines of business. The cost share was represented as a percentage in a cost share table. This was all fine and good. The reports were correct every month, even though the cost share was changed a few times over the first quarter. The accountants noticed a problem as soon as they generated the first quarterly report. The amounts billed to each line of business in the quarter did not equal the sum of the amount they were billed each month in the previous monthly reports. (If you can follow that, you are not doing badly.) The solution was to alter the cost share table to reflect the fact that a cost share was in effect from a specified date to another specified date, and the reports had to be changed to correctly charge for services across these periods of time.
It is a good practice to ask your client about the impact time has on each field in the database. You may be surprised by the answers you get. Basically, billing codes and work codes change over time. If there is a requirement to be able to generate historical reports, you have to think about how to handle this.
I advise you to be a pain with my tongue planted firmly in my cheek. People skills count for a lot. There is no point in pissing someone off, especially if you need his or her cooperation for the duration of your work. However, you should be committed to doing the job you were hired to do, and you should have a mantra of sorts. Mine is: You cannot pay me to do a bad job. I want to do my best, and I stand up for quality.
A few times, I have been in the position of building a web application on top of a database that was being designed by a DBA. Oracle DBAs seem to feel that their superior grasp of the features and capabilities of Oracle qualify them to build the database. Every product or technology has a culture, and Oracle has a culture of arrogance and control, in my opinion. To the arrogant DBA I would say: Being able to spell does not make you Shakespeare.
You owe it to your client to challenge the DBA to address the important issues above. Do not argue for the sake of arguing, but show him or her that you are creating requirement documents and you need his or her help. You are accountable, especially in these days of one-stop-shopping where the developer and the analyst are often the same person. Hold the DBA to the same high standards you try to hold yourself.
The other issue DBAs, especially old-time Oracle die hards, do not understand is that reports must run fast. Those are the expectations people bring to the web, but, also, web applications will timeout if a report takes too long. Parts of the application may also be stored in the database – these must be retrieved quickly, too.
That’s it for now. I could write more, but this covers some of the broad issues and describes some of the important commonsense approaches I use when working with a client to build a database. In my next entry, I will describe the second stage of the information management life cycle, and how understanding it is important if you want to build good databases.
I love databases. In fact, I would rarely consider working on an application if my database skills were not required. Having said that, there are some common mistakes in the databases I see in the course of my work. There are many articles on the theory of database construction. While I adhere to common practices, my goal here is to layout a commonsense approach to designing a database, and to comment on some errors I have seen.
In the 1990s, my friend Mike Lightheart introduced me to what he called “The Information Management Lifecycle.” In recent years, I have seen the term used by Computer Associates and other companies. I want to introduce the concept as I use it in my day-to-day work.
Data in the enterprise goes through the following stages:
In the requiremenents phase of a project, it is typical to compile a data dictionary of sorts – a list of the fields that need to be managed in a database. At some point, a data model is constructed, the database is built and the application is developed. I use the ILM to help me provide my client with solutions that I believe are superior to many of the solutions I have seen – if I can be permitted to be so humble.
Over the next few entries, I will outline how awareness of the Information Management Lifecycle influences my design of an application and the underlying database.
Over the years, I have noticed that many developers are male, and males like toys. In our case, our tools are our toys. And, of course, everybody wants to have the best toys. This leads to all sorts of nonsense – it leads to a cult-like devotion to certain products and technologies. It also leads some developers to devalue those who choose to employ ot promote a different set of tools. In many cases, there is a near hatred of competing technologies and the people associated with them.
This has many causes, I am sure, but I just want to say this: my favorite toy is my brain. I have been both challenged and excited by every technology I have been exposed to. I do not “hate†Windows, nor do I fear Linux. When I told my friend Joel, who works at Microsoft, that I had installed Linux on one of machines at home, he joke, “Oh no, Jim, don’t go over to the dark side.†That was a few years ago and I have never looked back.
I enjoy the challenge of working with most technologies, but I enjoy the process rather than the tool. I also believe that the quality of the code I write is determined by me, not by my choice of tool. I would rather hire someone like me than someone who uses a specific tool.
To use an analogy: it would be silly to believe that I could write better short stories with Microsoft Office than I could with OpenOffice. It would be just as silly to believe that one must use the word “because†in a novel or it is doomed to be second rate. It could be noted that every novel every write has used the word, but that does not mean that the word “because†is mandatory. Every carpenter has a favorite hammer, I am sure, but to each his own, I say.
I like smart people. I like good ideas. I like good workmanship. I like craft and discipline. I like to have fun. This influences the way I use various products more than it influences my choice of technologies. Let a thousand flowers bloom!
It’s the fanaticism I hate, not the tools. It’s negativity, criticism and narrow-mindedness I despise. It’s arrogance and ego that turns my stomach.