A properly normalized database (at least third normal form3NF) won't produce these insertion, update, and deletion anomalies. (For information about normalization levels, see SQL by Design, "Why You Need Database Normalization," March 1999, InstantDoc ID 4887. For a brief discussion of when denormalizaton is the best choice, see the Web sidebar "When Not to Normalize Your Database," at http://www.sqlmag.com, InstantDoc ID 23893.) As you reduce the level of normalization from third to second or even to first normal form (1NF), you'll have to account for the data-integrity violations that happen as a result of the denormalization process. To retain the data integrity, you'll have to create compensating routines in every program that modifies the data in these tables.
Normalizing the Cellar Database
When I perform a user-schema evaluation, I use a checklist to guide me through the detailed steps that I have to remember. You might want to consider adapting this checklist for your own use. (For a copy of the checklist, see the sidebar "User-Schema Evaluation Checklist.") To look at the way I proceed through the checklist, let's use an example database called Cellar. Despite its small size and limited user base, this example is useful for demonstrating how to normalize a database. I explain the process I use to audit this database. You can follow along with me as I analyze Cellar by executing Web Listing 1 and Web Listing 2, which create and load the Cellar database. To download the Web listings in this article, enter InstantDoc ID 23745 at the SQL Server Magazine Web site (http://www.sqlmag.com) and click Download the code.
Set the context. Imagine that the client is a wealthy person who lives in a large home with a climate-controlled wine cellar in the basement. As an aficionado of fine wines, the client has friends around the world with whom he shares his love of wine, and when he travels, he meets with some of those friends to sample a wine. The client uses the Cellar database to keep track of these wine tastings and to track how to get in touch with his friends. He records in the Cellar database the wines served at each wine tasting and who came to each wine tasting. The client is the only person who uses the Cellar database.
This brief description tells me what the Cellar database is, who uses it, and why it exists. This context helps me move on to the next step in the analysis: diagramming the database's physical design.
Create a physical design. Figure 3, page 34, shows a physical diagram that I created in Enterprise Manager. The diagram shows five tablesCELLAR, TASTEWINE, TASTING, TASTEGROUP, and VISITORthe table relationships, the fields and data types, and each table's primary key. I can infer from the physical design that the CELLAR and TASTING tables are in a many-to-many (M:N) relationship (you can serve a wine that you store in Cellar at more than one tasting, and a tasting can feature more than one wine from Cellar). I can also infer that TASTING and VISITOR are involved in a M:N relationship (a tasting can include one or more visitors, and a visitor can attend one or more tastings). TASTEWINE and TASTEGROUP are associative tables. At first glance, this database looks well normalized.
List ambiguous tables. The physical diagram and the client's description of the use and purpose of the Cellar database help me understand most of the tables. The TASTING table contains a list of dates and places where the wine tastings occurred (or will occur). The VISITOR table is a list of wine-tasting group members and their phone numbers. TASTEWINE and TASTEGROUP are associative tables. The only table I don't quite understand is the CELLAR table; it contains a list of wine types, names, producers, years, and costs. But how do these fields relate to the primary key, Bin? After questioning the client further, I find out that the numbers in Bin represent storage containers in the wine cellar. Each container holds one type of wine. A type of wine is defined as one wine made by one producer in one year. Figure 4, page 34, which is part of the CELLAR table, shows this organization (one type of wine listed in each row of the table).
I add the CELLAR table to the list of objects that I need to review more closely. Whenever I see a table that includes more than one type of data, I get suspicious. The CELLAR table lists wine types as well as where the wines are stored. I think that CELLAR is a candidate for further decomposition.
Find hidden arrays. A hidden array is a list or a collection of values that's stored as a series of simple attributes in a single table. The members of the list or collection are separate columns in the table, each with its own column name. Examples of hidden arrays are days of the week (i.e., Sunday through Saturday), different phone numbers (e.g., work, home, other), and different addresses (e.g., billing, shipping, other). Another term for a hidden array is multivalued attribute. (For a full explanation of multi-valued attributes, see Solutions by Design, "Multivalued Attributes," page 51.)
List ambiguous field names. In the CELLAR table, YearProduced is the four-digit year in which each wine was made. To fully understand this field, you have to know a few facts about wine making. Wine is pressed in the fall of the year, immediately following the grape harvest. In the Northern Hemisphere, the grape harvest might run from September to November, depending on the vineyard's location and the type of grape. In the Southern Hemisphere, the grape harvest is in March, April, or May. Knowing this information helps you calculate the age and drinkability of a winea Northern Hemisphere wine produced in 2000 will be 1.5 years old by the time you read this article, but a Southern Hemisphere wine will be 2 years old.
Prev. page
1
[2]
3
4
next page