The Database Platform
The kind of database platform you work from can help determine whether you need to use a natural value or a surrogate identifier for your primary key. The criterion that a primary key must be minimal means the fewer the columns, the better. A single-column primary key simplifies data storage, retrieval, and coding. SQL Server automatically creates a unique index (and, by default, a clustered index) on any column designated as a primary key. In Figure 1's Purchase table example, if the primary key were CustID plus PurchaseDate, SQL Server would create a 12-byte index key. If you created a surrogate key with an integer data type for the Purchase table's primary key, the index key would be one-third the size. Eight bytes here and 8 bytes there don't sound like much until you start calculating the additional I/O required for reading and writing hundreds of thousands of records. If the records are short, you can fit more records on a page that's being read into memory or written to disk. Here's a general rule: The more records that can fit on a page, the better the performance.

SQL Server 2000 and 7.0 provide additional support for a brief primary key in the way they build indexes. In SQL Server 6.5 and earlier, the nonclustered indexes use row pointers to point to the actual data record. So, the leaf level in a SQL Server 6.5 nonclustered index contains index keys and associated data-page or row IDs that point to the referenced data row in the user table. Each entry in a nonclustered index page is approximately the length of the nonclustered index key plus 6 bytes for the page or row pointer.

The nonclustered indexing model changed in SQL Server 7.0. Now, in SQL Server 2000 and 7.0, for each clustered table that also has nonclustered indexes, the nonclustered index key and the relevant clustered index key reside at the leaf level of the nonclustered index. As Figure 2 shows, the nonclustered index points to the clustered index. As a result, if the clustered index key is long, the nonclustered indexes in SQL Server 2000 and 7.0 can be very large, thereby increasing your database storage requirements and possibly decreasing database performance. This negative effect occurs if you decide to cluster on the primary key and you choose a long character or variable-length character field (or worse yet, a concatenation of several fields) for the primary key. In the SQL Server world, if you're planning to cluster by the primary key, you're better off with short, single-column primary keys. For more information about choosing a primary key's data type, see the sidebar "Make It Short and Sweet."

The Applications Development Group
In organizations in which database development is driven by a group whose primary concern is application development, a natural key is considered the only key to use for a table's primary key. Natural keys are meaningful; hence, they're easy to remember and easy to write code around. This group is making one of the most common mistakes that data modelers make when selecting a primary key—confusing a search key with a primary key. The primary key's purpose is to be used internally for distinguishing one row from another. You can use the primary key as a search key, but using a natural key for searching is usually easier. For example, a Customer table's primary key might be called CustID, which is a 16-digit surrogate key. Although CustID uniquely identifies each row in the Customer table, you don't expect customers to be able to identify themselves by their customer ID. Instead, customers give you their name, phone number, and perhaps a residence or mailing address. This combination of attributes is a search key.

So what's the answer? Is the surrogate key the better choice for primary key? By definition, the surrogate key meets all the criteria for a good primary key. Can you ever use a natural key—or a derivative of the natural key—for a table's primary key? If so, under what circumstances can you use it? Figure 3 shows a scenario in which a natural key fits comfortably as a primary key. This scenario showcases a Purchase model that contains an interesting variation: The PRODUCT_CATEGORY table has a primary key called CategoryCode, which is a 3-byte character data type. PRODUCT_CATEGORY is a reference table (also called a lookup table or a list-of-values table). You use reference tables to modify and describe other database tables. You can enforce referential integrity between the reference table and the table it's modifying, and in so doing, you help maintain domain integrity.

Prev. page     1 [2] 3     next page



You must log on before posting a comment.

If you don't have a username & password, please register now.

Reader Comments

Your font size is too small. My eyesight is good -- corrected to 20/20 with glasses, but I still have to squint.

Also, using IE's "View->Text size->Larger" menu item has no affect. You ignore this value.

Jon Kyle

Very well stated. easy to follow and understand the explanations.

molly arriaga

It was a great help, very descriptive, good examples. Thanks

Aslam Pervez

Almost make no sense to real developer. The whole article explained some thing every developer knows, and not touch real diffrence between Surrogate Key and Natural Key, pros and cons of each,etc.

Anonymous User

'Almost make no sense to real developer. The whole article explained some thing every developer knows, and not touch real diffrence between Surrogate Key and Natural Key, pros and cons of each,etc. '

'Anonymous User- December 03, 2004 '

Did you (Anonymous User) even read the article or did you just skim over the first paragraph or 2? There's no way you can consider yourself a 'real developer' if you read the entire article and still make the statement that you did. Michelle (the articles author) was very clear on what the differences are between the 2 types of keys as well as provide real world examples/descriptions of each. You would of course have to read the whole article to get those details and so if you did not, which I believe is the case, then you should not be criticizing the author in the way you did.

It's also apparent that not all developers 'knows' this material because of they did there would not be numerous postings on the internet groups/forumns about this issue.

Bottom line, if you are going to be critcial then make sure you've reveiwed in full what you are criticizing else you will look foolish to others reading your comments.

EdCarden_SQL

Article Rating 5 out of 5

 
 

ADS BY GOOGLE