• subscribe
January 25, 2006 12:00 AM

Designing for Performance: Lookup Tables

When it comes to lookup tables, are more really better?
SQL Server Pro
InstantDoc ID #48811

If you run SQL Server 2000, the best approach might be to cluster the all-in-one lookup table, but what criteria will you use to determine the optimal clustering index? You can cluster traditional lookup tables by the code name, by the long name, or by some third column of data, such as a "sort by" column. But if all lookup values for all domains are together in one table, how will you choose the optimal cluster key?

You obfuscate the meaning of the data. Combining all domain values into one table makes data management more difficult. Over time, the set of values within most lookup tables tends to change. You also find the occasional odd case in which one code value in the lookup table is used for more than one category in the database. So how do you handle a situation in which the code becomes obsolete for one category of data but not for the others? If you remove the code from the lookup table, it's not available to other active data sets. If you indicate that a code is obsolete for table A, there's confusion in the other data sets in which the code is still valid. But if you don't invalidate the code for data set A, you run the risk of an invalid code creeping into the database. This is what's called a no-win situation.

Designing for Flexibility
Rather than combining all domain categories and codes into one table, I prefer to give each domain its own table defining the valid set of values. I might have lookup tables with names such as State, Region, DocumentType, CustomerType, and ProductCategory. Here is where I deviate from a rule I usually follow, which is to use identity columns as the primary key. Instead, I use alpha codes as the primary key to designate the unique values of the data in each lookup table. The alpha codes are derived from the long names of the domain values, in much the same way as CO is derived from Colorado, KS from Kansas, and AK from Alaska. Lookup tables are good candidates for using a natural or pseudo-natural primary-key value. For each lookup table, I create short, fixed-length alpha codes to represent each record in the table.

Typically, the set of values in a lookup table is highly stable. You might add to the set of values over time, as when the United States added Alaska and Hawaii as states. Some values might become obsolete, but you typically don't want to delete them because they're part of history. If you use the Delete constraint, Restrict option, as Figure 3 shows, to enforce the relationships, you can't delete obsolete values. Instead, you mark them as obsolete so they won't be used anymore.

The Choice Is Yours
Lookup tables visibly enforce domain integrity in a database. When used properly, they help ensure that the data coming into the database is clean. It's not unusual for lookup tables to comprise 30 percent of the tables in your database, so you want to design them in a way that enhances performance and scalability. For that reason, the single-lookup-table scheme isn't one I would willingly use.

Now you know my opinion. Which direction are you leaning?

Michelle A. Poolet (mapoolet@sqlmag.com) is cofounder of Mount Vernon Data Systems and teaches database design and programming. She has a Masters degree in Computer Information Systems and is an MCP.



ARTICLE TOOLS

Comments
  • kobus
    6 years ago
    Sep 19, 2006

    Disagree with "hotspot" theory. A small lookup table is most likely to be memory cached.
    Disagree with Denormalization hype. Its not as if 100% of all datbase tables are fully normalised in practice. There is always some denormalization going on, and it hurts the least in the relatively smaller lookup tables.
    Disagree that a single lookup table is non-scalable. You are still free to duplicate the set of values for different categories, so changes to one don't affect each other. Agree with alpha/natural primary keys for lookups. Agree that it causes obfuscation, but thats a minor tradeoff for not having 100's of 3-row lookup tables which is another maintenance problem.

    Note that I am talking about lookup tables of small size which store things like form dropdowns , gender etc. Lets say things that have less than 5 options each. I will still keep things like state and country in their own tables.

  • PHILIP
    6 years ago
    Mar 29, 2006

    Well thought out and explained.

  • Jean
    6 years ago
    Mar 20, 2006

    Michelle, this timely article is important for me because I have a colleague touting his old company used the single-reference table approach and is suggesting we do the same. I'd never heard of the single-reference table approach. The point made is that with so many tables and lookup tables, the single-reference table would cut down the clutter. Your point on the "hot spot" is spot-on and I appreciate the comments by Rhys (rmjcsltd,02/20/06) as well. I will arm myself with these pieces of knowledge and test the waters.

  • Rhys
    6 years ago
    Feb 20, 2006

    Michelle,

    I agree completely that individual tables are nearly always the right thing to do, from a data integrity and design clarity point of view.

    However, I don't think that the single table approach has to lead to a performance bottleneck. A single table will be cached in RAM more efficiently than multiple very small tables. If the data really is read-only, then it can be put on a read-only filegroup and SQL Server won't need to use locks when accessing it.

    Rhys

You must log on before posting a comment.

Are you a new visitor? Register Here