Surrogate vs. Natural Primary Keys
I'm not quite happy with Michelle Poolet's answer to John Wells ("A Primary Key for Every Table?" October 2006, InstantDoc ID 93001), who asked whether a table must have a primary key in an example in which the natural key was five columns, some of them null. Michelle said that the primary key would have the virtue of preventing duplicates, but John had said he already had a unique index on the five. Michelle suggested an identity (surrogate key) "internally" and the natural key "for the user interfaces." But many "internal operations" (what isn't an internal operation?) will be generated by the interface—delete, insert, update—and these will create exactly the kind of multi-clause SQL that she deprecates—unless you show the user the surrogate key (under some such heading as "item number") and let him or her pick it directly. Only then will the advantage of faster index access counterbalance the extra work of maintaining the primary key index as well as the existing unique index.

I would say that if the usage of the table involves heavy updating, then the current unique index will be more efficient. It is really functioning as a primary key—it just can't be called that. But if the usage of the table is mostly retrieval, then the surrogate key is a good idea. And if the five-column table is itself a parent to any child tables that point back to it by foreign key, then of course you must have a primary key. In such cases, even if the five fields didn't include nulls and could be a real primary key, the 4-byte identifier would be much more efficient. One might note that any table that has nullable "key" elements might not be optimally designed. Looks like a case for normalization into some separate tables.

Where I find some writers flat-out wrong is when they recommend that every table should have a surrogate key, especially when they say just "to ensure uniqueness." That can too easily be a cop-out, avoiding the design work of normalization. Every table in a well-designed transactional database has a natural key, possibly selected at design time from a small set of candidate keys, after normalization to at least third normal form, and the structure is clearer and simpler (and usually more efficient) when that natural key is used as the primary key. The only reason to use a surrogate key is when the natural key doesn't work (as above) or is operationally inefficient, or when there is no convenient natural key—for example, an audit log where the nearest thing to a natural key is a date and time down to the second, but you could legitimately get two records in the same second.

Take a simple table such as a typical "Order Detail" where the natural key is Order Number plus Line Number. This will work perfectly well. If you add an identity column, all you've done is slowed down insert\update\delete operations because you now have an additional index. You need to keep an order number/line number index to guard against duplicates, but you have added the new primary key index on the identity, quite unnecessarily. Effectively, you've added a data element out of nowhere with no real role. So in summary, surrogate keys are useful but should never be one's first choice.
—John Bonavia

SQL Server 2005 Hardware Guidelines
I have a question for Michael Otey: Has SQL Server Magazine ever published hardware guidelines for SQL Server 2005 x64? My company is in the process of upgrading from SQL Server 2000, and we've spoken to many hardware vendors. Some indicate that we need two dual-core processors with 32MB of RAM, and others state that we need four dual-core processors with 64MB of RAM. We're trying to determine what best fits our needs. Can Michael help?
Michael Watson

Unfortunately, I don't know of any hardware upgrading sizing guides, and we haven't published guides like this in SQL Server Magazine. Proper sizing is tough without specific information about the workload and response times needed. In addition, you need to consider future processing requirements. You should be able to get an idea about your current CPU and memory requirements by checking out your current SQL Server system's PerfMon performance counters at various points during the day. As a general rule of thumb, if in doubt, it's better to go bigger rather than smaller. This may result in unused capacity, but better that than inadequate performance from a new system.
—Michael Otey

For further information about 64-bit SQL Server, check out these articles: Select Top(X): "64-Bit Advantages," InstantDoc ID 49937; "Riding the x64 Wave," InstantDoc ID 49713; "64-Bit Vs. 32-Bit Memory Management," InstantDoc ID 44557.
—Dianne Russell




You must log on before posting a comment.

If you don't have a username & password, please register now.

Reader Comments

Surrogate vs. Natural Primary Keys I don't want to start a yes/no comment session on this article, so I'll safely remain in the middle with the opinion, 'it depends on your situation'. We should consider this a strange answer for an exact science of ones and zeros.

I want to bring to your attention a risk related solely to using a natural primary key.

Scenario 1 The business operations changes and your natural primary key stops being a primary key. If you're trusting that a field will be a primary key forever you're probably looking towards a massive application change. With surrogate PKs chances are you will not have to change anything.

Scenario 2 Assume you're using product codes as primary keys. Your company decides to rename the product codes to standardize worldwide. If you're trusting on the product code as a primary key you will have to change every table containing that product code. You will also run in relational trouble since PK-FK relations will not allow you to just change one table at a time. With surrogate PKs you have to change only the main product table. (disclaimer: Not considering historical data and the need to keep old product codes)

My conclusion In this ever changing world, you're never sure that a field will remain a primary key for ever. For this reason, by default, I use surrogate identity fields. If you really want to play safe you should even use guid fields as a surrogate PK to be prepared for decentralization scenarios.

ndg@denco.be

Article Rating 3 out of 5