SQL and Normalisation

I was just having a discussion with a co-worker about some SQL they were doing. They had 3 tables, which looked something like this:

user( user_id, ... );
pages( page_id, ... );
comments( comment_id, user_id, page_id, ... );

So that people can comment on both users and pages. The user_id and page_id in the comments table both defaulted to NULL, and only one would be non-null at any one time. He argued that this was a fine solution to the problem since it minimised the number of tables you had, and that he would gain no actual data storage by separating it out and having two join tables between user-comments and page-comments. I disagree.

I believe that since a comment is an entity which can be applied to numerous things, the table for it should be self-contained, i.e. not have any foreign keys for the things it is applied to. This changes, of course, when the comments apply to only one thing, when it stops being “comments” and starts being “user_comments”, for example. Although the actual relation between user-comments and page-comments is one-to-many, because there are multiple entities I believe it is different case, and thus deserves join tables as per a many-to-many relation. In terms of code, having the join tables makes things a lot simpler (although he argued that you lose a bit of efficieny since you need two queries to save an entity instead of just one for his design). Also, flexibility is greatley increased with join tables – what if the client wants to add a “media” table, which can also have comments? Are you going to have to go back and modify all your queries so that it caters for having 3 possible null fields instead of 2?

My collegue was asking me because he wanted to do a query which did some joins, and then did some other joins based on the value of the non-null field in comments. I wasn’t sure of the actual query, so don’t know if the join-tables solution would have helped (though it proably would…), but he was going down the route of adding SQL conditional statements into his query. Again, here I believe that as little logic processing as possible should be done in everything except the language you are building in (be it PHP or whatever).

What do you think? Am I over-normalising my tables?

Comments ( 2 )
  • Chicken Salad says:

    No you are not over-normalising. Everything should be in 6NF. Also you should be avoiding null fields obviously. That and why are you even putting up with SQL? As we all know it is a ridiculous a useless language.

  • Dave says:

    Have to agree, this should definitely have been normalized. MySQL IF conditions are nowhere near as efficient as using a normalized schema and some joins because the server can cache those much more effectively.

    There are certain situations I’ve encountered where normalizing would be an absolute pain in the ass and queries on the tables would be horrific; mostly, that’s when I’ve had to store some irritating data structure, and it’s been a lot simpler (and quicker) to just store this as serialized data.

Leave A Comment

Your email address will not be published. Required fields are marked *