Database Requirements 101 (Recording)

In my last entry, I identified that information in the enterprise has a life cycle. To recap, this is the information life cycle:

  1. Data is recorded.
  2. Data is managed
  3. Data is used
  4. Data dies

It is possible to be a good “technical” DBA and to build inadequate databases. Many of the textbooks you will see concentrate on mind numbing theory, Codd’s rules etc. This theory is necessary, but it is not sufficient if you want to build good databases. I will assume that you know your basic theory, that you know about first, second, third and fourth normal form; and I will assume that you know how to read an ER diagram. My goal here is not to rehash what every database professional should already know, but to outline a commonsense approach to data that can be layered on top of this merely technical knowledge.

The Wrong Way To Start

I cannot count the number of times a client has presented me with a list of fields that was prepared by a business analyst. The list usually included data types for each field, and it may also have included rules that apply to each field. In each case, the client expected me to accept this list and to start modeling a database right away. This is wrong! Wrong! Wrong! And, I will tell you why. I cannot list all the reasons, but I will list the few that correspond to my own personal best practices and ways of working.

To be frank, I have never found the list of fields to be correct. I often find fields omitted, and I also find that some of the specified fields are unnecessary. Over time, I also find that some of the fields in the dictionary are poorly named – the rest of the enterprise is often in the habit of referring to some of the fields the business analyst provided by completely different names. Listing fields is the wrong way to start. It often turns out that the list was assembled by a committee – it is useful, but you owe it to your client to question it. All I can do at this stage is outline the practices I employ to determine the correctness of the so-called data dictionary.

Best Practice A: Paperwork

First, I ask to see the paperwork that corresponds to the fields in the dictionary. I have worked on a few HR applications, for example. Often, each new employee is required to fill out forms. In social work or in employee assistance, these might be called intake forms. I ask to see these forms for three data related reasons: 1) Field names in the paperwork should correspond to the field names in the data dictionary. 2) All fields in the paperwork should appear in the data dictionary. 3) All fields in the dictionary should appear in a forms – if they do not, find out where this data originates. If the field in the dictionary does not have an origin, then perhaps it does not belong in the database.

Seeing the forms will also help you understand the work that the database is expected to enable. Work takes place over time, and time is an important dimension of the data.

Seeing the paperwork will also help you plan the data input screens – fields should appear in the same order on the screen as they do in the paperwork. Also, the application should make it easy for a business user to redesign the data entry fields if the paperwork changes. (Nobody has ever consented to pay me to build such a feature, but I still think it is a good idea.)

Best Practice B: Understanding Each Field

You would be surprised to find out how much data is collected and never used. The rule of thumb is, each field should have a use. There may be statutory or regulatory reasons, or there may be reporting reasons. Why collect information if it is never used in a report, or if it is never used to inform a decision? At this stage, many of my clients have agreed to drop a field or three from the data dictionary.

In brief, each field must have a use or a place in ever stage of the information life cycle. All data must be collected, managed, used and expired. If not, why is it there?

Best Practice C: Time Is A dimension of the Data

Almost every system I have worked on, except for databases I have designed, has been designed to statically represent the existing enterprise. In these databases, departments never merge, business units are never renamed, and business rules are absolute.

I once worked on a billing application where the rules changed each time a contract was renegotiated. Services that were billed as a lump sum in one period were billed by the hour in the next. What made it even more difficult was that a client could renegotiate a billing rule at any time during the life of the contract, and the application was required to generate reports across arbitrary periods of time during which the billing rules could change a few times.

The application I am currently working on contains a table of government departments. Over time, departments may merge, split or be eliminated altogether. A good database will reflect this reality. Departments exist from a certain date to a certain date, at which point they may be renamed, split or merge with another department. The data has to be stored such that reports for previous fiscal years can still be generated using the previous department names, just as reports for this fiscal year must reflect the current department names.

I worked on another billing application where the costs of a project in the database were being shared by different lines of business. The cost share was represented as a percentage in a cost share table. This was all fine and good. The reports were correct every month, even though the cost share was changed a few times over the first quarter. The accountants noticed a problem as soon as they generated the first quarterly report. The amounts billed to each line of business in the quarter did not equal the sum of the amount they were billed each month in the previous monthly reports. (If you can follow that, you are not doing badly.) The solution was to alter the cost share table to reflect the fact that a cost share was in effect from a specified date to another specified date, and the reports had to be changed to correctly charge for services across these periods of time.

It is a good practice to ask your client about the impact time has on each field in the database. You may be surprised by the answers you get. Basically, billing codes and work codes change over time. If there is a requirement to be able to generate historical reports, you have to think about how to handle this.

Best Practice D: Be A Pain

I advise you to be a pain with my tongue planted firmly in my cheek. People skills count for a lot. There is no point in pissing someone off, especially if you need his or her cooperation for the duration of your work. However, you should be committed to doing the job you were hired to do, and you should have a mantra of sorts. Mine is: You cannot pay me to do a bad job. I want to do my best, and I stand up for quality.

A few times, I have been in the position of building a web application on top of a database that was being designed by a DBA. Oracle DBAs seem to feel that their superior grasp of the features and capabilities of Oracle qualify them to build the database. Every product or technology has a culture, and Oracle has a culture of arrogance and control, in my opinion. To the arrogant DBA I would say: Being able to spell does not make you Shakespeare.

You owe it to your client to challenge the DBA to address the important issues above. Do not argue for the sake of arguing, but show him or her that you are creating requirement documents and you need his or her help. You are accountable, especially in these days of one-stop-shopping where the developer and the analyst are often the same person. Hold the DBA to the same high standards you try to hold yourself.

The other issue DBAs, especially old-time Oracle die hards, do not understand is that reports must run fast. Those are the expectations people bring to the web, but, also, web applications will timeout if a report takes too long. Parts of the application may also be stored in the database – these must be retrieved quickly, too.

That’s it for now. I could write more, but this covers some of the broad issues and describes some of the important commonsense approaches I use when working with a client to build a database. In my next entry, I will describe the second stage of the information management life cycle, and how understanding it is important if you want to build good databases.



Leave a Reply