Thinking About Data

I have read all sorts of crap that makes claims about the superiority of one database product over another. In my experience, I have seen some of the worst databases developed using some of the best tools. Modeling is important, and so many of the databases I see are poorly conceived.

Lately, I have been thinking about data – particularly something I have been calling smart data, but I digress. The so-called relational database is based on set theory, which is a precise mathematical way of thinking about things. I say so-called not because I want to criticize the relational database or the theory that supports it. I want to criticize the way the tool has been used.

One of the things that strikes me is that it is hard to build a new relationship in most databases. You may have to build an association table or add a new field, but you also have to write SQL statements, stored procedures, and code to use the new relationship. There should be an easy way to say that a relationship AB exists between entity A and entity B, and it should be easy to begin using this information without writing new code.

If I were to come up with a new way to define and use data, this would be a requirement. The relational database can store and define relationships, but it requires too much thought and too much work. In the world we live in, things are related to each other in a multiplicity of ways. There are all sorts of relationships – for example, some of the possible relationships between a book (entity A) and a person (entity B) are:

  1. Was read by
  2. Is being read by
  3. Is owned by
  4. Was borrowed by
  5. Was returned by
  6. Was lost by
  7. Is loved by
  8. Was given to
  9. Is recommended by
  10. Is being used by
  11. Was sold to
  12. Was stolen from
  13. Was stolen by
  14. Was found by
  15. Was burned by

The relational database, as it is being used, makes it hard or impossible to create relationships on the fly. The work of creating new association tables, or adding fields, to express these relationships is prohibitive. Even if one were to build a database with a relationship table that allowed for multiple relationships between books and people, it would still be necessary to build a new relationship table to store the multiple relationships that could exist between the other entities in the database.

Basically, if I were to go out on a limb, I would say that the relational database is not relational. Each entity table in a relational database represents a class of entities, defined by a set of attributes, some combination of which uniquely defines the entity. If I have twenty entity types in my database, and I add a twenty-first, this could create twenty new relationship associations – my latest entity type could be related to any or all of the other twenty entities.
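The count above can be checked with a little arithmetic. As a rough sketch, assume we count only unordered pairs of distinct entity types and ignore self-pairings such as person to person. Then n entity types give the following number of possible pairings:

```latex
\[
\binom{n}{2} = \frac{n(n-1)}{2},
\qquad
\binom{21}{2} - \binom{20}{2} = 210 - 190 = 20 .
\]
```

So the twenty-first type adds twenty possible pairings, and each pairing can carry many named relationships, as the book and person list shows.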

In practice, we usually only accommodate the relationships that are useful to us, and we live with the lie that the other entity types are not or could not be related to each other. It is too much work and coding to represent the data as it is. Data modelers are good at creating a physical model that can store and retrieve data in ways that are useful within the confines of some narrowly defined requirements.

For example, a database may provide the means to store the fact that employee X works at warehouse Y, but not the fact that warehouse Y is wheelchair accessible. To store this information, we can add a boolean attribute to the warehouse table to identify it as wheelchair accessible or not. Or, we can create an association of wheelchair accessible types with warehouses – this has the advantage of allowing us to associate wheelchair accessibility with any other facility that is not a warehouse. However, this all requires work by a DBA and some developers. Basically, as it is, the model is too rigid and brittle to accommodate changing requirements.
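The second option above, an association rather than a boolean column, might look something like this in plain Java. This is only a sketch: FeatureRegistry, Warehouse, Office, and the feature labels are names I made up for illustration, and nothing here touches a real database.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Demo first so the file can be run directly; all names are hypothetical.
class FeatureDemo {
    public static void main(String[] args) {
        FeatureRegistry registry = new FeatureRegistry();
        Warehouse warehouseY = new Warehouse("Y");
        Office headOffice = new Office("HQ");

        // The same feature label applies to a warehouse and to a non-warehouse.
        registry.tag(warehouseY, "wheelchair accessible");
        registry.tag(headOffice, "wheelchair accessible");

        System.out.println(registry.has(warehouseY, "wheelchair accessible"));
        System.out.println(registry.has(headOffice, "has loading dock"));
    }
}

class Warehouse {
    final String name;
    Warehouse(String name) { this.name = name; }
}

class Office {
    final String name;
    Office(String name) { this.name = name; }
}

// Associates feature labels with any kind of entity, instead of adding
// one boolean attribute per feature to each entity class.
class FeatureRegistry {
    // Entities are compared by identity here because the sample classes
    // do not override equals() or hashCode().
    private final Map<Object, Set<String>> featuresByEntity = new HashMap<>();

    // Attach a feature to an entity; a new label needs no new field.
    void tag(Object entity, String feature) {
        featuresByEntity.computeIfAbsent(entity, key -> new HashSet<>()).add(feature);
    }

    // True only if this exact feature was attached to this entity.
    boolean has(Object entity, String feature) {
        Set<String> features = featuresByEntity.getOrDefault(entity, Collections.<String>emptySet());
        return features.contains(feature);
    }
}
```

The trade-off is the usual one: the boolean attribute is simpler to query, while the association lets a new feature or a new kind of facility appear without touching the existing classes.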

Relationships are important, but the relational database seems to focus more on attributes. Object databases, as they are being built today – to the best of my knowledge, that is – mirror the thinking that goes into building relational databases. Object stores are often used as a substitute for a relational database, and they are often compared to relational databases. I think this will change as people see the new possibilities that open up in an object-based repository.

Moving to an object store should mean moving to new ways of thinking about data. It should be easy to define a new relationship, to specify that it applies to the domain of entities of type A and type B, and to begin using the relationship without writing new code.

Using db4o, I have created an object that serves the purpose of storing information about the relationships that exist between entities. Any entity can be related to any other entity, but some relationships are only possible between certain types of entities. For example, I may allow books to be owned by people, but I may not allow people to be owned by people. I’ll see how well I can get this construct to perform – first, it has to be correct, but later, I want it to be fast. I have provided the means to associate any entity with any other entity in a multiplicity of ways.
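To make the idea concrete, here is a minimal in-memory sketch of a relationship object with type restrictions. It is not the db4o class described above and it does not use db4o at all. RelationshipStore, RelationshipType, define, relate, and related are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Demo first so the file can be run directly; all names are hypothetical.
class RelationshipDemo {
    public static void main(String[] args) {
        RelationshipStore store = new RelationshipStore();
        // Defining a relationship is one call, not a new table.
        store.define("is owned by", Book.class, Person.class);
        store.define("was borrowed by", Book.class, Person.class);

        Book book = new Book("Some Book");
        Person alice = new Person("Alice");
        store.relate(book, "is owned by", alice);
        store.relate(book, "was borrowed by", alice);
        System.out.println(store.related(book, "is owned by").size());

        try {
            store.relate(alice, "is owned by", alice); // people may not own people
        } catch (IllegalArgumentException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}

class Book {
    final String title;
    Book(String title) { this.title = title; }
}

class Person {
    final String name;
    Person(String name) { this.name = name; }
}

// A named relationship plus the entity types it is allowed to connect.
class RelationshipType {
    final String name;
    final Class<?> fromType;
    final Class<?> toType;

    RelationshipType(String name, Class<?> fromType, Class<?> toType) {
        this.name = name;
        this.fromType = fromType;
        this.toType = toType;
    }

    boolean allows(Object from, Object to) {
        return fromType.isInstance(from) && toType.isInstance(to);
    }
}

// One stored fact, read as "from <relationship> to".
class Relationship {
    final RelationshipType type;
    final Object from;
    final Object to;

    Relationship(RelationshipType type, Object from, Object to) {
        this.type = type;
        this.from = from;
        this.to = to;
    }
}

class RelationshipStore {
    private final Map<String, RelationshipType> types = new HashMap<>();
    private final List<Relationship> facts = new ArrayList<>();

    void define(String name, Class<?> fromType, Class<?> toType) {
        types.put(name, new RelationshipType(name, fromType, toType));
    }

    // Stores the fact only if the relationship is defined for these entity types.
    void relate(Object from, String name, Object to) {
        RelationshipType type = types.get(name);
        if (type == null) {
            throw new IllegalArgumentException("unknown relationship: " + name);
        }
        if (!type.allows(from, to)) {
            throw new IllegalArgumentException("'" + name + "' does not apply to these entity types");
        }
        facts.add(new Relationship(type, from, to));
    }

    // Every entity that 'from' reaches through the named relationship (linear scan).
    List<Object> related(Object from, String name) {
        List<Object> result = new ArrayList<>();
        for (Relationship fact : facts) {
            if (fact.from == from && fact.type.name.equals(name)) {
                result.add(fact.to);
            }
        }
        return result;
    }
}
```

The lookup is a plain scan over every stored fact, so this sketch aims at correct first and fast later, in the same order as above.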

Please note: E.F. Codd was smarter than I am – at the very least, he knew more about set theory. The relational model is not dead – nor should it be killed, but thinking about data is fun. Mistakes and errors are instructional. Correct me when I seem to be wrong.

DB4O Problem Fixed!

As I reported a couple of days ago, I was having a problem with a DB4O experiment I was working on. Basically, after about a million records, my application would raise an error saying that I had run out of heap space. I could increase the heap space, but I was unhappy with that solution – it produced only marginal improvements, and then I ran out of heap space again.

German Viscuso, who seems to read my blog, suggested that I post my problem to the db4o user forum. I visited the forum and found several possible solutions, one of which worked.

Early on in my code experiment, I was using an index, but I commented that line out to see how that changed performance – then, I am embarrassed to admit, I forgot to uncomment the line. Users of db4o, please note, the memory requirements and performance of db4o change dramatically if you do not use indexes. Use an index. My problems went away the moment I uncommented the line that maintains an index.

I am sorry to have written a post that suggested that there may be a problem with db4o – the problem was mine. Everybody who uses db4o needs to know: use an index. In a relational database, a missing index affects performance only. With a db4o object cache, it can result in a fatal error.

In my last post, I mentioned that I had some code to ensure that my objects were unique, but it tended to be slow. I was looking forward to getting a chance to use version 6.2, which supports unique constraints. As it turns out, using an index speeds up my inserts as well – the repository is now fast, and it does not allow duplicates.

In closing, conserve resources, use an index. That’s the ticket. The code to use an index looks like this:
Db4o.configure().objectClass(Person.class).objectField("firstName").indexed(true);

You Were Wrong!

Six years ago, I was using the Microsoft remote scripting toolkit. The technique is now called AJAX. I envisaged applications that behaved more like desktop applications – I even built a demo to show online chat in a browser. There was nothing difficult or sophisticated about what I was doing, but everybody seemed to think it was a bad idea.

One of the objections was that these remote calls I was making would suck up all of the available bandwidth on the server. I thought this was silly. Why reload a whole page when you can simply fetch the part of the page you need – I thought the “page” metaphor was great for web sites, but it was not useful for applications.

I was so very pleased to see AJAX take off. I was also pleased to find this article that says what I said: AJAX-style applications decrease bandwidth. Check it out. Today, as a gift to myself, I want to say to those people: You were wrong! I was right! (Here’s a big friendly old raspberry for you! FFTHHAAAP!)
