Use it to ensure table integrity

In a previous column, I talked about primary keys ("How to Choose a Primary Key," April 1999). This month, I cover the foreign key. A foreign key is an integral part of a relational database design. It establishes relationships between tables, and it makes possible the procedures that cross-reference data stored in separate tables in the database. It enforces data integrity rules and prohibits modifications that might compromise data integrity. The foreign key is as important to the database design as the primary key, and together they control updates to the data. So let's look at what a foreign key is, what you use it for, and what value it brings to your database design.

What Is a Foreign Key?
A foreign key is a table column that establishes a link from the table it resides in to the primary key or a candidate key in another, related table. The foreign key is the anchor on the many side of a one-to-many (1:M) relationship, much as the primary or candidate key is the anchor on the one side of this relationship.

A foreign key is the linchpin that makes sure you don't enter invalid data into a table. It also prohibits you from doing a delete or update operation that might leave orphan rows. The database development community refers to this feature as declarative referential integrity, and it can't function without foreign keys.

Figure 1 is an entity relationship diagram (ERD) representing three tables (Project, Employee, and Paycheck) and two 1:M relationships. Project contains attributes describing the kinds of workplace projects an employee might be assigned to, Employee contains attributes describing individual employees, and Paycheck contains attributes describing employee compensation. To demonstrate a point, I've simplified the relationships and restricted the business rules. Each Project can have many Employees assigned to it, but an Employee is associated with only one Project at a time. Each Employee gets many Paychecks, but each Paycheck goes to only one Employee.

In Paycheck, the attribute EmpNo is a foreign key (FK). EmpNo in Paycheck has the same meaning and is the same data type and length as EmpNo in Employee. In fact, the set of real data values for EmpNo must be the same in both Employee and Paycheck to ensure that you never cut a paycheck for a nonexistent employee. This relationship creates a referencing link between Paycheck and Employee that you can enforce and exploit when you write SQL queries, as you'll see later in this article.

Figure 2, page 62, is an implementation of the Project-Employee-Paycheck ERD, in which I've converted each entity to a table, and each attribute has become a column in a table. I populated the tables with test data to demonstrate the concept of foreign keys.

Column ProjNo in the Employee table anchors the 1:M relationship between Project and Employee on the many side. In this model, the arrow points from the foreign key to the primary key, implying a foreign key reference from Employee.ProjNo to Project.ProjNo. This arrangement means that the project number in the Employee table references the project number in the Project table. So in Figure 2, both Abby and Don are assigned to work on the Y2K Remediation Project.

Dependent Relationships
Not all 1:M relationships are created equal. Notice that in both Figures 1 and 2, some relationship lines are dotted and some are solid. The solid connector line represents the dependent, or identifying, 1:M relationship. A dependent relationship means each record on the many side of the relationship depends for its existence on a related record on the one side.

The dotted connector line represents the independent, or nonidentifying, relationship, which is the condition between Project and Employee. An independent 1:M relationship means that a record may exist on the many side of the relationship without a related record on the one side. An Employee doesn't need to be working on a Project in order to be an employee. In Figure 2, Ethyl Etheridge is not connected to a project, but she still gets a paycheck.

The terms identifying relationship and non-identifying relationship refer to how the primary key of the table on the many side is supposed to be constructed. The theory is that for identifying relationships, the primary key of the one table cascades down to the many table, and is concatenated to the identifier of the many table to form the primary key. Many CASE tools that you can use to draw ERDs implement this theory as a rule that you can't easily circumvent.

You can enforce the identifying relationship two ways. You can declare the relationship when you initially create the table. Or you can declare this relationship after you create the table, with an ALTER TABLE statement, as in:

ALTER TABLE paycheck
  ADD CONSTRAINT fk_Paycheck2Emp FOREIGN
    KEY (EmpNo) 
    REFERENCES Employee(EmpNo)
GO

The nonidentifying relationship doesn't have a restriction about cascading the primary key. Each table has its own unique identifier; neither table inherits part of its primary key from the other. The Project table primary key, ProjNo, is a foreign key in the Employee table. ProjNo in the Employee table can be null, as in the case of Ethyl Etheridge, who is not assigned to a project.

   Prev. page   [1] 2     next page



You must log on before posting a comment.

If you don't have a username & password, please register now.

Reader Comments

MySQL doesn't support FOREIGN KEY yet (except InnoDB). They plan to in version 5.1.

Paul Lammertsma

Denormalizing data to make it more accessible to users is not required. Just create views instead.

bubbagump

Article Rating 2 out of 5