Repeating groups are cells that have more than one occurrence. In a programming language, this concept is an array. For instance, if this database supported repeating groups (which it does not because it is a relational database), you would see a single row for this order with a repeating group for ProductName and QuantityPurchased. The set of these two columns would occur five times for this one order, once for each product purchased, thereby minimizing the redundancy in the new version of table AllData. This minimalism might work for some nonrelational database architectures and file-processing schemes, but the relational model precludes having repeating groups.
After you designate a pkey, table AllData will be in 1NF. In the description of pkeys, I suggested that a pkey for the AllData table is CustID + OrderID + ProductName. Let's designate that combination as the concatenated pkey.
Toward 2NF
2NF is a condition of full functional dependency on the whole pkey; the pkey must determine each non-pkey attribute. 1NF requires that a table have a pkey, and we have designated the combination of CustID + OrderID + ProductName for that role. To test for functional dependency, let's see whether the pkey determines each non-pkey attribute.
For each non-key attribute, you proceed as follows. What determines the CompanyName? One of our business rules says that each company has a Customer ID (CustID), and the CustID represents the company and each of its related attributes (CompanyName, CustomerContact, ContactPhone). However, in table AllData, does CustID + OrderID + ProductName determine CompanyName? Does CompanyName depend on what it bought and when? No. Therefore, CompanyName is not fully functionally dependent on the whole pkey.
As you test each non-key attribute against the known business rules, you can see that CustID defines some non-key attributes, OrderID defines other non-pkey attributes, and ProductName defines still other non-pkey attributes. 2NF says that all non-pkey attributes must be fully functionally dependent on the whole pkey. You must modify table AllData to make it 2NF.
If you created three tables, each of which had as its pkey a single attribute of the AllData concatenated pkey, you would have at least part of AllData in 2NF. The solution would look like Screen 3.
The new Customer table has greatly reduced the redundant data present in table AllData. The new Order table still has a high level of redundancy, which we can correct by further decomposition. We have completely normalized the new Product table.
Determining 3NF
You achieve 3NF when you have resolved all transitive dependencies. Once again, you'll have to test the attributes in each table, but this time you test to see whether, within a table, any non-key attribute determines the value of another non-key attribute. Such a determination defines transitive dependency. A transitive dependency causes additional redundancy, which Screen 3 illustrates in the Order table.
Let's start with the Order table to analyze transitive dependencies. One business rule states that each order will have a unique order identifier. An order occurs when a customer purchases one or many products on a given day, at a given time. Therefore, attribute OrderDate is fully functionally dependent on OrderID. But what determines ShippingDate or ShippingMethod? Does OrderID determine the product and the shipping destination? The business rules will have to answer all these questions. For instance, OrderDate might affect ShippingDate. Having the ordered product (ProductName) in stock might also affect ShippingDate. A combination of OrderID and ProductName affect QuantityPurchased. OrderID and CustID affect the shipping address attributes (ShipAddr, ShipCity, ShipState, ShipZip, and ShipCountry).
The Customer table includes some transitive dependencies. The table recognizes a business rule that determines whether the customer is the company (CompanyName attribute). But, does CustID determine the CustomerContact? What if this company has more than one CustomerContact on file? If so, do you need to repeat all the billing address data for the second and third contact? Your company can institute a rule that allows only one contact person per customer, but from a salesperson's perspective, this rule would be restrictive. The salesperson and the retailer want to sell product and services. Why would they want a rule that would hamper this goal?
Screen 4, page 70, is a 3NF version of AllData, because each of the tables in Screen 4 meets the criteria for 3NF:
- Each table is a flat file, or spreadsheet format, with all-atomic data items, no repeating groups, and a designated pkey.
- Each table has all non-pkey attributes fully functionally dependent on the whole pkey.
- All transitive dependencies are removed from each table.
You still have to cross-reference the data from one table to the data in another table. Using cross-referencing, adding the second order to the Order table will let you know what that order included (OrderDetail table).
Normalizing the Database
Now that you have decomposed the AllData table into seven smaller tables, you need to cross-reference the seven tables. You have reduced the level of data redundancy and can now fit more rows of any one table on a single block for physical reads and writes. However, what good is this organization if you have no way of relating one table to another?
In the process of reorganizing the data into the new set of tables, you reviewed the business rules. The business rules define data interrelationships:
- A CUSTOMER has many associated CUSTOMER_CONTACTs, but a CUSTOMER_CONTACT works for only one CUSTOMER at a time (1:M, CUSTOMER:CUSTOMER_CONTACT).
- A CUSTOMER can have as many ADDRESSes on file as necessary; each ADDRESS relates to one and only one CUSTOMER (1:M, CUSTOMER: ADDRESS).
- A CUSTOMER can place many ORDERs; each ORDER points back to one and only one CUSTOMER (1:M, CUSTOMER:ORDER).
- A SALESPERSON is responsible for many ORDERs; each ORDER is credited to one SALESPERSON (1:M, SALESPERSON:ORDER).
- Each ORDER can contain one or many ORDER_DETAILs (items ordered); each ORDER_ DETAIL relates back to one ORDER (1:M, ORDER: ORDER_DETAIL).
- A PRODUCT can be a participant in many ORDER_DETAILs; each ORDER_DETAIL points back to one and only one PRODUCT (1:M, PRODUCT:ORDER_DETAIL).
For each 1:M relationship, you take the pkey of the one and embed it as an fkey in the table of the many. Screen 5 shows the result. In some of these tables, I've concatenated the fkey to the existing pkey to enhance the design flexibility and functionality.
You can analyze each table independently of all others in the database, and then deduce a normal form for that table. However, the success of the database design and normalization hinges on what kind of relationship each table has to each other table, and on the correct expression of each relationship.
If you ensure that each table is in 3NF, you avoid problems that can arise when users update data. However, look at all data attributes across the database, and evaluate the normal form of the entire database. You must make sure that you've stored each non-pkey attribute only once to remove all redundant data and to remove the possibility of unsynchronized data, which can damage data recovery.
Accuracy and Performance
A poorly normalized database and poorly normalized tables can cause problems ranging from excessive disk I/O and subsequent poor system performance to inaccurate data. An improperly normalized condition can result in extensive data redundancy, which puts a burden on all programs that modify the data.
From a business perspective, the expense of bad normalization is poorly operating systems and inaccurate, incorrect, or missing data. Applying normalization techniques to OLTP database design helps create efficient systems that produce accurate data and reliable information.
End of Article
Prev. page
1
[2]
next page -->