Surrogate vs. Natural Primary Keys
I'm not quite happy with Michelle Poolet's
answer to John Wells ("A Primary Key for
Every Table?" October 2006, InstantDoc
ID 93001), who asked whether a table must
have a primary key in an example in which
the natural key was five columns, some of
them null. Michelle said that the primary
key would have the virtue of preventing
duplicates, but John had said he already had a
unique index on the five. Michelle suggested
an identity (surrogate key) "internally" and
the natural key "for the user interfaces." But
many "internal operations" (what isn't an
internal operation?) will be generated by the
interface—delete, insert, update—and these
will create exactly the kind of multi-clause
SQL that she deprecates—unless you show
the user the surrogate key (under some such
heading as "item number") and let him
or her pick it directly. Only then will the
advantage of faster index access counterbalance the extra work of maintaining the
primary key index as well as the existing
unique index.
I would say that if the usage of the table
involves heavy updating, then the current
unique index will be more efficient. It is
really functioning as a primary key—it just
can't be called that. But if the usage of the
table is mostly retrieval, then the surrogate
key is a good idea. And if the five-column
table is itself a parent to any child tables
that point back to it by foreign key, then
of course you must have a primary key.
In such cases, even if the five fields didn't
include nulls and could be a real primary
key, the 4-byte identifier would be much
more efficient. One might note that any
table that has nullable "key" elements might
not be optimally designed. Looks like a case
for normalization into some separate tables.
Where I find some writers flat-out wrong
is when they recommend that every table
should have a surrogate key, especially when
they say just "to ensure uniqueness." That
can too easily be a cop-out, avoiding the design work of normalization. Every table
in a well-designed transactional database
has a natural key, possibly selected at design
time from a small set of candidate keys, after
normalization to at least third normal form,
and the structure is clearer and simpler (and
usually more efficient) when that natural
key is used as the primary key. The only
reason to use a surrogate key is when the
natural key doesn't work (as above) or is
operationally inefficient, or when there is
no convenient natural key—for example, an
audit log where the nearest thing to a natural
key is a date and time down to the second,
but you could legitimately get two records
in the same second.
Take a simple table such as a typical
"Order Detail" where the natural key is
Order Number plus Line Number. This will work perfectly well. If you add an identity
column, all you've done is slowed down
insert\update\delete operations because
you now have an additional index. You
need to keep an order number/line number
index to guard against duplicates, but you
have added the new primary key index on
the identity, quite unnecessarily. Effectively,
you've added a data element out of nowhere
with no real role. So in summary, surrogate
keys are useful but should never be one's first
choice.
—John Bonavia
SQL Server 2005 Hardware Guidelines
I have a question for Michael Otey: Has SQL Server Magazine ever published hardware
guidelines for SQL Server 2005 x64? My company is in the process of upgrading from
SQL Server 2000, and we've spoken to many hardware vendors. Some indicate that we
need two dual-core processors with 32MB of RAM, and others state that we need four
dual-core processors with 64MB of RAM. We're trying to determine what best fits our
needs. Can Michael help?
—Michael Watson
Unfortunately, I don't know of any hardware upgrading sizing guides, and we haven't published
guides like this in SQL Server Magazine. Proper sizing is tough without specific information
about the workload and response times needed. In addition, you need to consider future processing
requirements. You should be able to get an idea about your current CPU and memory requirements
by checking out your current SQL Server system's PerfMon performance counters at various points
during the day. As a general rule of thumb, if in doubt, it's better to go bigger rather than smaller. This
may result in unused capacity, but better that than inadequate performance from a new system.
—Michael Otey
For further information about 64-bit SQL Server, check out these articles: Select Top(X): "64-Bit
Advantages," InstantDoc ID 49937; "Riding the x64 Wave," InstantDoc ID 49713; "64-Bit
Vs. 32-Bit Memory Management," InstantDoc ID 44557.
—Dianne Russell