Good database design starts with the right primary key

Choosing a primary key is one of the most important steps in good database design. A primary key is a table column that serves a special purpose. Each database table needs a primary key because it ensures row-level accessibility. If you choose an appropriate primary key, you can specify a primary key value, which lets you query each table row individually and modify each row without altering other rows in the same table. The values that compose a primary key column are unique; no two values are the same.

Each table has one and only one primary key, which can consist of one or many columns. A concatenated primary key comprises two or more columns. In a single table, you might find several columns, or groups of columns, that might serve as a primary key and are called candidate keys. A table can have more than one candidate key, but only one candidate key can become the primary key for that table.

Screen 1, page 64, shows seven tables, each with one or two far-left columns underlined. Listing the primary key column(s) first in the table is not necessary, but this is a design and programming standard. The CUSTOMER table has a single-column primary key, CustID. The ADDRESS table has a concatenated primary key, CustID plus AddrType. As you can see from the example data, the set of values is unique for each primary key. Each customer in the CUSTOMER table has a unique CustID that is different from the CustID of any other customer. The ADDRESS table primary key values are also unique, but to see the uniqueness you have to append or concatenate the values from the CustID and AddrType columns into one string. The ADDRESS table has only one billing value and one shipping value.

You can promote a candidate key to primary key when you create the table. In Transact-SQL (T-SQL), the create table command is

CREATE TABLE dbo.SalesPerson 
(   EmpNo  smallint  NOT NULL,
    EmpName  varchar  (25)  NOT NULL 
	 CONSTRAINT  pkeySalesPerson  PRIMARY KEY
	    CLUSTERED (EmpNo) )
GO

You use the CONSTRAINT clause at the end of the CREATE TABLE statement to promote the EmpNo column to primary key. SQL Server creates a unique index on column EmpNo and clusters the table, in accord with the specification PRIMARY KEY CLUSTERED. Similar to a book index, a database table index is a copy of data values arranged in ascending order, and SQL Server uses it for quick look-up and direct access when users query the database. The clustering option ensures that the data in the table is part of the index, which enables quick search and retrieval.

You can also create tables without constraints, then add constraints in a second step:

ALTER TABLE SalesPerson
ADD CONSTRAINT pkeySalesPerson PRIMARY KEY
CLUSTERED (EmpNo)
GO

Each candidate key has a certain set of characteristics that recommends it for the title of primary key. These characteristics are never null, brevity, simplicity, preferred data type, nonidentifying value, and never change.

Never Null
No primary key value can be null, nor can you do anything to render the primary key null. This is an inviolate rule of the relational model as supported by ANSI, of relational database management system (RDBMS) design, and of SQL Server. When you designate a primary key, SQL Server flags as NOT NULL all columns that make up the pkey. Null is an unknown condition. When a table value is null, it means one of several things. It can mean that the value is unknown, that the value isn't relevant, or that you don't know whether the value is relevant.

Note that in the previous CREATE TABLE statement, I created the primary key (the EmpNo column) with a NOT NULL property. A unique index—such as the one SQL Server creates as a result of the primary key CONSTRAINT clause—allows only one instance of null, unless you create the column the index is built on as NOT NULL. If a primary key value were allowed a null value, then you couldn't easily retrieve the row associated with the primary key.

Brevity
Because the SQL Server query processor uses the primary key index for lookups and comparisons, choose a brief primary key—one column, if possible. You use primary key columns for joins (combining data from two or more tables based on common values in join columns), for query retrieval, and for grouping or sorting a query result set. The briefer the index entries are, the faster SQL Server can perform the lookups and comparisons. For example, you can use the primary key column for query retrieval (SELECT * FROM SalesPerson WHERE EmpNo = 101) and for data modification (UPDATE SalesPerson SET EmpName = "Joe Buchanan" WHERE empno = 101). In addition, you can use the primary key column to group or sort a query result set (SELECT * FROM Customer ORDER BY CustID).

   Prev. page   [1] 2     next page



You must log on before posting a comment.

If you don't have a username & password, please register now.

Reader Comments

I want to refer to the next piece in the article: "Nonidentifying Value A common mistake among database designers is trying to build intelligence, or meaning, in to the primary key."

I then want to refer you to a document written by Shajan Miah where he deals with the Identity problem and I quote: "Most of the published literature on database systems, when dealing with some of the problems associated with the use of PKs, only adopts a narrow symbol level perspective, i.e. difficulty in maintaining updates, changes, and etc. Very little, if any, consideration is given to the sometimes important role of the PK at the knowledge level (Newell 82, 93). The most important function of the PK is as an interaction element between the real-world and the database. It is thorough the primary key that we usually query the database. The primary key is of fundamental importance if we are to "usefully" relate the concepts of the database to the real world. Here we are going beyond the symbol level (S/L) to the knowledge level (K/L) use of the primary key concept. "An object retains its identity through arbitrary changes to its own state. This is in contrast to the relational model where the properties of an entity must be sufficient for it to distinguish it from all other entities. That is, there must be an attribute or attributes whose values uniquely and immutably identify that entity. The properties of uniqueness and immutability are not always present in the real-world and it is often necessary in relational databases to introduce artificial identifiers for entities."(Hughes, 1993, p.117) (This appears to be a paradoxical statement, as the author also admits that sometimes unique values may not exist!) Hughes, implies in the above statement that this is a limitation of the Relational model, i.e. the introduction of an artificial value to identify some real-world concept. According to Hughes, the OO model does not have this problem as it uses system created OIDs (a bit level concept), some programming pointers to identify the real-world concept. However what Hughes does not consider is that, we do this in everyday life in the real-world we create artificial keys to identify real-world objects. For example the UK registration scheme on cars, the Ministry of Transport deliberately employs an artificial mechanism to identify the cars in the real world (in the UK). This is not done for efficiency purposes (bit & symbol level), but rather for identifying the item (in this case a car) in the real world, and hence is at the knowledge level. Consider another simple example - names are attached (artificially) to new-born children, usually by the family head and are generally (in most cases) unique within the family. These values identify a person within the context of the family. The deliberate creation of keys, also has another importance, from a HCI (Human-Computer Interaction) perspective, in that control of key values is retained by the user.

Date(1995), also notes two similar reasons why PKs are needed. 1. The concept of uniqueness does not apply to some objects (that is unless one goes to the extreme of including details of the physical matter of the object). This as we showed earlier, is applicable to the UK motor industry, where registration number is created to differentiate between cars with similar properties. 2. Standard way of linking the object to the real-world - the control is with the user, in contrast to the OO model, where the system decides. The authors of the Third-Generation Database Manifesto (1990), also argue that "An immutable primary key has an extra advantage over a system-assigned unique identifier because it has natural, human readable meaning. Consequently, in data interchange or debugging this may be an advantage."(p.12). Although, the OID approach to identification used in the OOPar, does overcome some of the symbol level problems associated with PKs, it ignores the sometimes important role of the PK at the K/L and therefore can result in fundamental problems at the K/L. Our claim is not that the OO model cannot cope with the PK concept, but rather that it takes a narrow bit & symbol level perspective, and as a result the concept of PKs is not actively supported by the OO community. Thus, although the OO model claims to eliminate the need for keys (Hughes, 1993) the preceding examples illustrate the importance of the PK concept at the K/L." http://members.aol.com/shaz7862/object/critique.txt

Jeff Deacon

"Consider another simple example - names are attached (artificially) to new-born children, usually by the family head and are generally (in most cases) unique within the family." two words - George Foreman (he named all his sons "George")

Anonymous User

"Now, to be accurate, the old customer 9006CODEN010 should be 9006CODIA010. "

No, check your baggage tags!

As much as one might wish it to be DIA, which how most people refer to the airport facilities, it is still DEN.

JPZonca

Article Rating 1 out of 5

In considering Michelle's opinion in the identity vs. meaningful value debate, I think it's important to remember that Michelle wrote this in 1999, before SQL Server finally got around to supporting DRI (Declarative Referential Integrity) which allows for automatic Cascading Updates and Deletes. As I understand it, all enforcement of Referential Integrity was left up to the database designer to handle them through triggers. If I had to write a trigger just to enforce the integrity of a foreign key relationship, I'd most likely be picking keys that would not need to change too!

My official, bottom-line, opinion of which primary key method to ALWAYS use is, "Use whatever works best in each situation".

Dan W

Article Rating 4 out of 5

 
 

ADS BY GOOGLE