On the importance of Data Integrity

Not long ago, I attended a seminar by Chris Date at Warwick Uni DCS on the topic on the advantages of the Closed World Assumption as opposed to the Open World Assumption. I wont be talking about these here, but I’ll just give a quick introduction. In the closed world assumption, if a set of data exists in a database, then it is assumed to be true (for example, Employee Steve has ID 1 and earns 20K). If a set of data could exist in the database, but doesn’t, then it is assumed to be false. In the open world assumption, however, this data can be either true or false – the assumption is that it doesn’t exist in the database because we don’t know if it’s true or not. It’s a very interesting topic, perhaps I’ll talk about it some other time.

The talk was incredibly interesting, but there was one thing he said which really struck a chord, and that was:

The most important property a database can have is that its data is correct.

This struck me in particular as a result of a number of issues I’ve had with recent projects whereby the entire website was crippled because of a couple of bit of incorrect data in the database. Could this have been avoided? Definitely. I could have added lots of constraints to the database, and I could have performed more unit tests on the code itself (I’d only just gotten into doing unit tests at this time, so they weren’t quite as robust as they could have ¬†been). It got me wondering, however, whether it would be useful to unit test that database design itself. It’s all good having “correct” code, but if the underlying database will allow inconsistencies, then there’s still a risk that the system can break. I’m not sure if unit testing the database design at a level below the code would be worthwhile, but it’s something to think about!

Comments ( 0 )

    Leave A Comment

    Your email address will not be published. Required fields are marked *