5. Failing to Use Supertypes and Subtypes
Data modelers use supertype and subtype entities to distinguish the different categories of an entity, such as the PERSON entity and its subtypes EMPLOYEE and AUTHOR, as Web Figure 1 (available online) shows. When you're modeling, you need to demonstrate the differences between an employee and an author, and you have to show which attributes modify each of the three entities (PERSON, EMPLOYEE, AUTHOR). The supertype/subtype paradigm forces you to identify the attributes and relationships that interact with both the supertype and the subtypes. The common attributeswhich in Web Figure 1 are PersonID, FirstName, MInit, LastName, Address, Phone, and Emailmodify the supertype entity PERSON. You need to record these attribute values for all peoplewhether the person is an employee or an author. Some relationships involve only the supertype, rather than the subtypes. For example, the PERSON entity is involved in the relationship PERSON_PUBLISHER. The relationship shows you that all peopleemployees and authors alikework for a publisher.
For each subtype entity, you have to identify the attributes or relationships that are specific to that one subtype. In Web Figure 1, an employee is involved in publication activities and also has specific skill sets. The author writes articles and receives royalties.
If you need to represent categories of an entity in your design but you fail to use the supertype/subtype structure to help analyze the data requirements for your database, you risk incomplete understanding of your data. You also risk building modification anomalies into your database. For instance, in Web Figure 1, a person can be an employee of a publishing house and an author for that firm at the same time. If you didn't use the supertype/subtype structure in this design (i.e., if you included only an EMPLOYEE entity and an AUTHOR entity), duplicate data about one person who is both employee and author would be stored in the EMPLOYEE table as well as the AUTHOR table. This duplication would lead to non-key data redundancy within the database, which can result in insert, update, and delete anomalies that cause unsynchronized data and loss of data integrity.
6. Failing to Recognize a Multivalued Attribute
Failing to recognize and resolve a multivalued attribute might be one of the most common mistakes a data modeler can make. As I've said before, a multivalued attribute contains a list or a collection of values for each instance of that attribute. Typically, neophyte designers treat these lists or collections as separate attributes of an entity or separate columns in a table, as the SERVER table (entity) in Web Figure 2 shows. This flattened design might initially be easy to program against (it's especially compatible with the detail report that displays the general system configuration), but this same design is difficult to manipulate for summary data and is counterproductive when you manage or update data in the database. A multivalued attribute needs to become a child table in which each of the multiple values becomes a row in the child table and each row of the child table contains a foreign key that links to the parent table's primary key. In Web Figure 3, the data model for the transitions from multivalued attributes to child tables, the multivalued attributes in Web Figure 2 are broken out into the DISK_DEVICES and INSTALLED_COMPONENT child tables.
7. Failing to Model to 3NF
Failing to model to third normal form (3NF) at the conceptual design stage might be a controversial point in this article, especially to working database professionals who are pulled into database design without background in data-modeling theory. To understand a data model's business requirements and the data items' relationships, you must model to 3NF at the conceptual stage. Then, when you implement the design, you can denormalize the design back to 2NF for production efficiencies or for programming convenience. When you denormalize the implementation of a 3NF data model, you know where to expect the synchronization problems caused by insert, update, and delete anomalies that the denormalizing operation introduces. And you can more easily create the compensatory programs that keep the redundant data in check and synchronized. For more information about data normalization, see "Why You Need Database Normalization," March 1999, InstantDoc ID 4887.
Avoidable Pitfalls
All new data modelers, including me, committed one or another of these seven deadly data-modeling sins when they were learning the skill. If you avoid making these mistakes, you'll discover that your projects go faster and more smoothly. You'll also find that your database designs will endure for years, sometimes outlasting the company for which they were created and providing good service to the querying public.