Often, the reference table contains only one column that carries a descriptor value. In contrast, the PRODUCT_CATEGORY table consists of two columns. The CategoryLongName column contains the long name of the category, which is the descriptor value. The CategoryCode column is a derivative of the first column; CategoryCode contains a manufactured code that represents the corresponding long name. For instance, a category long name might be vegetables, but the category code would be VEG. DBAs or data stewards create these category codes. A data steward knows and understands the data stored in the database and constantly monitors the data integrity. Because each code is a derivative of its descriptor, the code inherits the definition and meaning of its associated descriptor value.

In the PRODUCT_CATEGORY table, CategoryLongName is the natural key and CategoryCode is a derivative of the category long name. Although you might hesitate to use the category long name as a primary key (and understandably so, because the category long name is a variable character data type with a maximum length of 50 characters), the category code would be an acceptable candidate for the job. It satisfies all the requirements for a primary key. Because the category code is the only candidate, it becomes the primary key.

Consider the criterion that a primary key value must never change. If one day you decided to differentiate between fresh and frozen vegetables, you could create two new codes that would distinguish fresh veggies, VFR, from frozen veggies, VFZ, while leaving the old code, VEG, intact. You wouldn't have to change any historical records because before you named the two new categories of vegetables, all veggies were classified the same. Fortunately, SQL Server enforces the criterion that a primary key must be a unique identifier for each row in a table by automatically creating a unique index on the primary key column. Without this support from SQL Server, you'd have to confirm in your programs that this code hadn't been used before. What about the criterion that a primary key has to apply to all rows in a table? In this case, because the columns in this table are a descriptor and a derivative of the descriptor, each natural key will always have a value. As for the criterion that a primary key must be minimal, the derivative, CategoryCode, is a character(3) data type. You can't get much more minimal than that.

Apply the Criteria
When people ask me which makes a better primary key—a natural value or a surrogate identifier—I answer that the choice depends on the situation. In all the modeling work that I've done, I've rarely been able to identify a suitable natural key as a candidate for the job. I've been able to identify natural search keys, both unique and nonunique. But when I apply the prerequisite tests—is it unique? does it apply to all rows? is it minimal? is it stable over time?—to these natural keys, most natural keys fail. My personal preference is to use a surrogate key unless I can identify an appropriate natural key that meets the four criteria for the primary key.

End of Article

Prev. page     1 2 [3]     next page -->



You must log on before posting a comment.

If you don't have a username & password, please register now.

Reader Comments

Your font size is too small. My eyesight is good -- corrected to 20/20 with glasses, but I still have to squint.

Also, using IE's "View->Text size->Larger" menu item has no affect. You ignore this value.

Jon Kyle

Very well stated. easy to follow and understand the explanations.

molly arriaga

It was a great help, very descriptive, good examples. Thanks

Aslam Pervez

Almost make no sense to real developer. The whole article explained some thing every developer knows, and not touch real diffrence between Surrogate Key and Natural Key, pros and cons of each,etc.

Anonymous User

'Almost make no sense to real developer. The whole article explained some thing every developer knows, and not touch real diffrence between Surrogate Key and Natural Key, pros and cons of each,etc. '

'Anonymous User- December 03, 2004 '

Did you (Anonymous User) even read the article or did you just skim over the first paragraph or 2? There's no way you can consider yourself a 'real developer' if you read the entire article and still make the statement that you did. Michelle (the articles author) was very clear on what the differences are between the 2 types of keys as well as provide real world examples/descriptions of each. You would of course have to read the whole article to get those details and so if you did not, which I believe is the case, then you should not be criticizing the author in the way you did.

It's also apparent that not all developers 'knows' this material because of they did there would not be numerous postings on the internet groups/forumns about this issue.

Bottom line, if you are going to be critcial then make sure you've reveiwed in full what you are criticizing else you will look foolish to others reading your comments.

EdCarden_SQL

Article Rating 5 out of 5

 
 

ADS BY GOOGLE