The Database Platform
The kind of database platform you work from can help determine whether you need to use a natural value or a surrogate identifier for your primary key. The criterion that a primary key must be minimal means the fewer the columns, the better. A single-column primary key simplifies data storage, retrieval, and coding. SQL Server automatically creates a unique index (and, by default, a clustered index) on any column designated as a primary key. In Figure 1's Purchase table example, if the primary key were CustID plus PurchaseDate, SQL Server would create a 12-byte index key. If you created a surrogate key with an integer data type for the Purchase table's primary key, the index key would be one-third the size. Eight bytes here and 8 bytes there don't sound like much until you start calculating the additional I/O required for reading and writing hundreds of thousands of records. If the records are short, you can fit more records on a page that's being read into memory or written to disk. Here's a general rule: The more records that can fit on a page, the better the performance.
SQL Server 2000 and 7.0 provide additional support for a brief primary key in the way they build indexes. In SQL Server 6.5 and earlier, the nonclustered indexes use row pointers to point to the actual data record. So, the leaf level in a SQL Server 6.5 nonclustered index contains index keys and associated data-page or row IDs that point to the referenced data row in the user table. Each entry in a nonclustered index page is approximately the length of the nonclustered index key plus 6 bytes for the page or row pointer.
The nonclustered indexing model changed in SQL Server 7.0. Now, in SQL Server 2000 and 7.0, for each clustered table that also has nonclustered indexes, the nonclustered index key and the relevant clustered index key reside at the leaf level of the nonclustered index. As Figure 2 shows, the nonclustered index points to the clustered index. As a result, if the clustered index key is long, the nonclustered indexes in SQL Server 2000 and 7.0 can be very large, thereby increasing your database storage requirements and possibly decreasing database performance. This negative effect occurs if you decide to cluster on the primary key and you choose a long character or variable-length character field (or worse yet, a concatenation of several fields) for the primary key. In the SQL Server world, if you're planning to cluster by the primary key, you're better off with short, single-column primary keys. For more information about choosing a primary key's data type, see the sidebar "Make It Short and Sweet."
The Applications Development Group
In organizations in which database development is driven by a group whose primary concern is application development, a natural key is considered the only key to use for a table's primary key. Natural keys are meaningful; hence, they're easy to remember and easy to write code around. This group is making one of the most common mistakes that data modelers make when selecting a primary keyconfusing a search key with a primary key. The primary key's purpose is to be used internally for distinguishing one row from another. You can use the primary key as a search key, but using a natural key for searching is usually easier. For example, a Customer table's primary key might be called CustID, which is a 16-digit surrogate key. Although CustID uniquely identifies each row in the Customer table, you don't expect customers to be able to identify themselves by their customer ID. Instead, customers give you their name, phone number, and perhaps a residence or mailing address. This combination of attributes is a search key.
So what's the answer? Is the surrogate key the better choice for primary key? By definition, the surrogate key meets all the criteria for a good primary key. Can you ever use a natural keyor a derivative of the natural keyfor a table's primary key? If so, under what circumstances can you use it? Figure 3 shows a scenario in which a natural key fits comfortably as a primary key. This scenario showcases a Purchase model that contains an interesting variation: The PRODUCT_CATEGORY table has a primary key called CategoryCode, which is a 3-byte character data type. PRODUCT_CATEGORY is a reference table (also called a lookup table or a list-of-values table). You use reference tables to modify and describe other database tables. You can enforce referential integrity between the reference table and the table it's modifying, and in so doing, you help maintain domain integrity.
Prev. page
1
[2]
3
next page