I have read all sorts of crap that makes claims about the superiority of one database product over another. In my experience, I have seen some of the worst databases developed using the some of the best tools. Modeling is important, and so many of the databases I see are poorly conceived.
Lately, I have been thinking about data – particularly something I have been calling smart data, but I digress. The so-called relational database is based on set theory, which is a precise mathematical way of thinking about things. I say so-called not because I want to criticize the relational database or the theory that supports it. I want to criticize the way the tool has been used.
One of the things that strikes me is that it is hard to build a new relationship in most databases. You may have to build an association table or add a new field, but you also have to write SQL statements, stored procedures, and code to use the new relationship. There should be an easy way to say that a relationship AB exists between entity A and entity B; and, it should be easy to begin using this information without writing new code.
If I were to come up with a new way to define and use data, this would be a requirement. The relational database can store and define relationships, but it requires too much thought and too much work. In the world we live in, things are related to each other in a multiplicity of ways. There are all sorts of relationships – for example, some of the possible relationships between a book (entity A) and a person (entity B) are:
The relational database, as it is being used, makes it hard or impossible to create relationships on the fly. The work of creating new association tables, or adding fields, to express these relationships is prohibitive. Even if one were to build a database with a relationship table that allowed for multiple relationships between books and people, it would still be necessary to build a new relationship table to store the multiple relationships that could exist between the other entities in the database.
Basically, if I were to go out on a limb, I would say that the relational database is not relational. Each entity table in a relational database represents a class of entities, defined by a set of attributes, some combination of which uniquely defines the entity. If I have twenty entity types in my database, and I add a twenty-first, this could create twenty new relationship associations – my latest entity type could be related to any or all of the other twenty entities.
In practice, we usually only accommodate the relationships that are useful to us, and we live with the lie that the other entity types are not or could not be related to each other. It is too much work and coding to represent the data as it is. Data modelers are good at creating a physical model that can store and retrieve data is ways that are useful within the confines of some narrowly defined requirements.
For example, a database may provide the means to store the fact that employee X works at warehouse Y, but not the fact that warehouse Y is wheelchair accessible. To store this information, we can add a boolean attribute to the warehouse table to identify it as wheelchair accessible or not. Or, we can create an association of wheelchair accessible types with warehouses – this has the advantage of allowing us to associate wheelchair accessibility with any other facility that is not a warehouse. However, this all requires work by a DBA and some developers. Basically, as it is, the model is too rigid and brittle to accommodate changing requirements.
Relationships are important, but the relational database seems to focus more on attributes. Object databases, as they are being build today – to the best of my knowledge, that is – mirror the thinking that goes into building relational databases. Object stores are often used as a substitute for a relational database, and they are often compared to relational databases. I think this will change as people see the new possibilities that open up in an object-based repository.
Moving to an object store should mean moving to new ways of thinking about data. It should be easy to define a new relationship, to specify that it applies to the domain of entities of type A and type B, and to begin using the relationship without writing new code.
Using db4o, I have created an object that serves the purpose of storing information about the relationships that exist between entities. Any entity can be related to any other entity, but some relationships are only possible between certain types of entities. For example, I may allow books to be owned by people, but I may not allow people to be owned by people. I’ll see how well I can get this construct to perform – first, it has to be correct, but later, I want to it be fast. I have provide the means to associate any entity with any other entity in a multiplicity of ways.
Please note: E.F. Codd was smarter than I am – at the very least, he knew more about set theory. The relational model is not dead – nor should it be killed, but thinking about data is fun. Mistakes and errors are instructional. Correct me when I seem to be wrong.