Simplicity
When choosing a primary key, look for candidates with no embedded spaces, special characters, or differential capitalization. If you installed SQL Server to recognize the difference between uppercase and lowercase letters (whether in dictionary order, case-sensitive order, or binary sort order) in a query, avoid primary key candidates that contain mixed-case values because these entries are hard to work with in SQL queries and join statements. Writing "Big Al's Sports Emporium" in dozens of queries each day is more difficult and error-prone than writing "2".

Data Type
Integer (number) data types are the best choice for primary key, followed by fixed-length character data types. SQL Server processes number data type values faster than character data type values because it converts characters to ASCII equivalent values before processing, which is an extra step.

Fixed-length character data types are better than variable-length character data types because SQL Server must decompress variable-length character data before processing the data. This extra step consumes valuable processing power.

Nonidentifying Value
A common mistake among database designers is trying to build intelligence, or meaning, in to the primary key. The most compelling reason not to create a primary key with meaning is that the primary key column you create to be descriptive might become obsolete. For example, you can build an intelligent primary key in the CUSTOMER table in Screen 1.

YYMMSSCCCnnn, where
YY = the last two digits of the year this customer placed its first order,
MM = the two-number month designator which indicates in which month the customer placed its first order,
SS = the two-character code for state where the customer resides,
CCC = the three-character code for city, based on airport codes,
nnn = numbers, from 0001 to 999, allowing for a maximum of 999 customers for each YYSSCCC combination.

A customer from Denver who first ordered from your company in June 1990 initially would have a customer number 9006CODEN010, assuming that this was the tenth customer from Denver. You can then use this intelligent primary key to scan and parse the key value so you can tell when customers started doing business with your company and where the customers are located. You could sort and filter by year or year and month, by state, or by city. But if the customer opens a second office location in a different city or state, should you treat the second office as a separate customer? Or should you continue to use the original customer number for both offices, although this approach will prevent access to the embedded information about the second office? And what happens when you get the 1000th customer in Los Angeles? How much database redesigning and rewriting will you need to expand the nnn portion of the customer number to nnnn? To avoid these difficulties, choose a primary key with a nonidentifying value.

Never Change
After you assign a primary key value, never change it. Returning to the previous example, what happens when a customer moves its operation from New Jersey to Colorado? Do you change the primary key value? How do you handle the primary key when a city builds a new airport, such as happened in Denver? Now, to be accurate, the old customer 9006CODEN010 should be 9006CODIA010. Do you create a new customer number, thus disrupting the history established by this customer, or do you try to change all instances of 9006CODEN010 to 9006CODIA010 in the database? Note that the primary key values repeat from table to table in Screen 1. The CustID column appears in the CUSTOMER table and in the ORDER table. In the CUSTOMER table, CustID is the primary key column. In the ORDER table, CustID is not the primary key but a foreign key. The foreign key sets up an implied link from one table to another. The foreign key CustID of the ORDER table implies a one-to-many (1:M) relationship between the CUSTOMER and ORDER tables. Primary key values in a relational database of any size cascade from one table to another, propelled by the 1:M relationships that are a necessary part of the database. If you change a primary key value, you'll find that you have to change the associated foreign key values in many tables, or you'll lose the informational links built into the database.

Surrogate Keys
Here's one method to ensure that the primary key value is never null, brief, a simple data type, and a nonidentifying value: Create the primary key column as a surrogate key. Each surrogate value represents, or stands in place of, its associated row in a table. The surrogate value is meaningless. The numbering scheme usually starts with 1 for the first row of the table, and increments with each row added to that table. SQL Server has an identity property that you can specify for number data types. When you create a primary key column with the identity property, the primary key column is a surrogate primary key. The surrogate column's values are system-generated, and each value is unique within a table. You can use this code to create a surrogate primary key in T-SQL:

CREATE TABLE dbo.Customer
( CustID  integer
IDENTITY(1,1)
PRIMARY KEY NONCLUSTERED,
   CompanyName  varchar (25)  iNOT NULL )
GO

In this statement, you don't have to specify not null when you create the primary key column. The identity property ensures that each row inserted into the table has a unique, system-generated number for CustID. The numbers start at 1 and increment by 1, as specified. For large-volume, high-transaction databases, you may want to assign a different range of numbers for each table to avoid a potential bottleneck on insert row operations to the database. And, certainly, if you have a horizontally partitioned table in your database, you might want to assign a different, non-overlapping range of numbers for each portion of the table.

May the Best Primary Key Win
You can use the criteria in Table 1 to evaluate each primary key candidate. The candidate that matches all the best answers is the best choice as the table's primary key. If no candidate meets all the best answer criteria, consider creating a surrogate primary key for the table.

End of Article

Prev. page     1 [2]     next page -->



You must log on before posting a comment.

If you don't have a username & password, please register now.

Reader Comments

I want to refer to the next piece in the article: "Nonidentifying Value A common mistake among database designers is trying to build intelligence, or meaning, in to the primary key."

I then want to refer you to a document written by Shajan Miah where he deals with the Identity problem and I quote: "Most of the published literature on database systems, when dealing with some of the problems associated with the use of PKs, only adopts a narrow symbol level perspective, i.e. difficulty in maintaining updates, changes, and etc. Very little, if any, consideration is given to the sometimes important role of the PK at the knowledge level (Newell 82, 93). The most important function of the PK is as an interaction element between the real-world and the database. It is thorough the primary key that we usually query the database. The primary key is of fundamental importance if we are to "usefully" relate the concepts of the database to the real world. Here we are going beyond the symbol level (S/L) to the knowledge level (K/L) use of the primary key concept. "An object retains its identity through arbitrary changes to its own state. This is in contrast to the relational model where the properties of an entity must be sufficient for it to distinguish it from all other entities. That is, there must be an attribute or attributes whose values uniquely and immutably identify that entity. The properties of uniqueness and immutability are not always present in the real-world and it is often necessary in relational databases to introduce artificial identifiers for entities."(Hughes, 1993, p.117) (This appears to be a paradoxical statement, as the author also admits that sometimes unique values may not exist!) Hughes, implies in the above statement that this is a limitation of the Relational model, i.e. the introduction of an artificial value to identify some real-world concept. According to Hughes, the OO model does not have this problem as it uses system created OIDs (a bit level concept), some programming pointers to identify the real-world concept. However what Hughes does not consider is that, we do this in everyday life in the real-world we create artificial keys to identify real-world objects. For example the UK registration scheme on cars, the Ministry of Transport deliberately employs an artificial mechanism to identify the cars in the real world (in the UK). This is not done for efficiency purposes (bit & symbol level), but rather for identifying the item (in this case a car) in the real world, and hence is at the knowledge level. Consider another simple example - names are attached (artificially) to new-born children, usually by the family head and are generally (in most cases) unique within the family. These values identify a person within the context of the family. The deliberate creation of keys, also has another importance, from a HCI (Human-Computer Interaction) perspective, in that control of key values is retained by the user.

Date(1995), also notes two similar reasons why PKs are needed. 1. The concept of uniqueness does not apply to some objects (that is unless one goes to the extreme of including details of the physical matter of the object). This as we showed earlier, is applicable to the UK motor industry, where registration number is created to differentiate between cars with similar properties. 2. Standard way of linking the object to the real-world - the control is with the user, in contrast to the OO model, where the system decides. The authors of the Third-Generation Database Manifesto (1990), also argue that "An immutable primary key has an extra advantage over a system-assigned unique identifier because it has natural, human readable meaning. Consequently, in data interchange or debugging this may be an advantage."(p.12). Although, the OID approach to identification used in the OOPar, does overcome some of the symbol level problems associated with PKs, it ignores the sometimes important role of the PK at the K/L and therefore can result in fundamental problems at the K/L. Our claim is not that the OO model cannot cope with the PK concept, but rather that it takes a narrow bit & symbol level perspective, and as a result the concept of PKs is not actively supported by the OO community. Thus, although the OO model claims to eliminate the need for keys (Hughes, 1993) the preceding examples illustrate the importance of the PK concept at the K/L." http://members.aol.com/shaz7862/object/critique.txt

Jeff Deacon

"Consider another simple example - names are attached (artificially) to new-born children, usually by the family head and are generally (in most cases) unique within the family." two words - George Foreman (he named all his sons "George")

Anonymous User

"Now, to be accurate, the old customer 9006CODEN010 should be 9006CODIA010. "

No, check your baggage tags!

As much as one might wish it to be DIA, which how most people refer to the airport facilities, it is still DEN.

JPZonca

Article Rating 1 out of 5

In considering Michelle's opinion in the identity vs. meaningful value debate, I think it's important to remember that Michelle wrote this in 1999, before SQL Server finally got around to supporting DRI (Declarative Referential Integrity) which allows for automatic Cascading Updates and Deletes. As I understand it, all enforcement of Referential Integrity was left up to the database designer to handle them through triggers. If I had to write a trigger just to enforce the integrity of a foreign key relationship, I'd most likely be picking keys that would not need to change too!

My official, bottom-line, opinion of which primary key method to ALWAYS use is, "Use whatever works best in each situation".

Dan W

Article Rating 4 out of 5

 
 

ADS BY GOOGLE