Often, the reference table contains only one column that carries a descriptor value. In contrast, the PRODUCT_CATEGORY table consists of two columns. The CategoryLongName column contains the long name of the category, which is the descriptor value. The CategoryCode column is a derivative of the first column; CategoryCode contains a manufactured code that represents the corresponding long name. For instance, a category long name might be vegetables, but the category code would be VEG. DBAs or data stewards create these category codes. A data steward knows and understands the data stored in the database and constantly monitors the data integrity. Because each code is a derivative of its descriptor, the code inherits the definition and meaning of its associated descriptor value.
In the PRODUCT_CATEGORY table, CategoryLongName is the natural key and CategoryCode is a derivative of the category long name. Although you might hesitate to use the category long name as a primary key (and understandably so, because the category long name is a variable character data type with a maximum length of 50 characters), the category code would be an acceptable candidate for the job. It satisfies all the requirements for a primary key. Because the category code is the only candidate, it becomes the primary key.
Consider the criterion that a primary key value must never change. If one day you decided to differentiate between fresh and frozen vegetables, you could create two new codes that would distinguish fresh veggies, VFR, from frozen veggies, VFZ, while leaving the old code, VEG, intact. You wouldn't have to change any historical records because before you named the two new categories of vegetables, all veggies were classified the same. Fortunately, SQL Server enforces the criterion that a primary key must be a unique identifier for each row in a table by automatically creating a unique index on the primary key column. Without this support from SQL Server, you'd have to confirm in your programs that this code hadn't been used before. What about the criterion that a primary key has to apply to all rows in a table? In this case, because the columns in this table are a descriptor and a derivative of the descriptor, each natural key will always have a value. As for the criterion that a primary key must be minimal, the derivative, CategoryCode, is a character(3) data type. You can't get much more minimal than that.
Apply the Criteria
When people ask me which makes a better primary keya natural value or a surrogate identifierI answer that the choice depends on the situation. In all the modeling work that I've done, I've rarely been able to identify a suitable natural key as a candidate for the job. I've been able to identify natural search keys, both unique and nonunique. But when I apply the prerequisite testsis it unique? does it apply to all rows? is it minimal? is it stable over time?to these natural keys, most natural keys fail. My personal preference is to use a surrogate key unless I can identify an appropriate natural key that meets the four criteria for the primary key.
End of Article
Prev. page
1
2
[3]
next page -->