If you run SQL Server 2000, the best approach might be to cluster the all-in-one lookup table, but what criteria will you use to determine the optimal clustering index? You can cluster traditional lookup tables by the code name, by the long name, or by some third column of data, such as a "sort by" column. But if all lookup values for all domains are together in one table, how will you choose the optimal cluster key?
You obfuscate the meaning of the data. Combining all domain values into one table makes data management more difficult. Over time, the set of values within most lookup tables tends to change. You also find the occasional odd case in which one code value in the lookup table is used for more than one category in the database. So how do you handle a situation in which the code becomes obsolete for one category of data but not for the others? If you remove the code from the lookup table, it's not available to other active data sets. If you indicate that a code is obsolete for table A, there's confusion in the other data sets in which the code is still valid. But if you don't invalidate the code for data set A, you run the risk of an invalid code creeping into the database. This is what's called a no-win situation.
Designing for Flexibility
Rather than combining all domain categories and codes into one table, I prefer to give each domain its own table defining the valid set of values. I might have lookup tables with names such as State, Region, DocumentType, CustomerType, and ProductCategory. Here is where I deviate from a rule I usually follow, which is to use identity columns as the primary key. Instead, I use alpha codes as the primary key to designate the unique values of the data in each lookup table. The alpha codes are derived from the long names of the domain values, in much the same way as CO is derived from Colorado, KS from Kansas, and AK from Alaska. Lookup tables are good candidates for using a natural or pseudo-natural primary-key value. For each lookup table, I create short, fixed-length alpha codes to represent each record in the table.
Typically, the set of values in a lookup table is highly stable. You might add to the set of values over time, as when the United States added Alaska and Hawaii as states. Some values might become obsolete, but you typically don't want to delete them because they're part of history. If you use the Delete constraint, Restrict option, as Figure 3 shows, to enforce the relationships, you can't delete obsolete values. Instead, you mark them as obsolete so they won't be used anymore.
The Choice Is Yours
Lookup tables visibly enforce domain integrity in a database. When used properly, they help ensure that the data coming into the database is clean. It's not unusual for lookup tables to comprise 30 percent of the tables in your database, so you want to design them in a way that enhances performance and scalability. For that reason, the single-lookup-table scheme isn't one I would willingly use.
Now you know my opinion. Which direction are you leaning?
Michelle A. Poolet (mapoolet@sqlmag.com) is cofounder of Mount Vernon Data Systems and teaches database design and programming. She has a Masters degree in Computer Information Systems and is an MCP.