Several readers have asked me to list the typical pitfalls that beginning database designers encounter when they undertake a conceptual model for the first time. The conceptual model is so named because the entity-relationship model that you create is a concept of what the database will look like when you implement it. Even though novice database designers can avoid some pitfalls by reading about them, they're bound to make mistakes. Let's look at some of the most serious errors.
1. Misunderstanding Requirements
Before you begin to model a database, you need to understand the business and system requirements. Data modelers must interview people within the organization who understand what kind of data needs to be captured and how, where, and when it will be used. Without this information, data modelers can't create a plan for a database that will work for their organization. When modelers encounter conflicting requirements (e.g., to one department, a customer is a person who buys goods, but to another, a customer uses services), they must resolve those conflicts so that the project can proceed. Then, modelers have to match system specifications to the requirements. For more information about gathering business and system requirements, see "Data Modeling," April 2000, InstantDoc ID 8241.
2. Using Inconsistent Naming Conventions
Using inconsistent naming conventions (or using no naming convention at all) for database objects can lead to confusion and errors in data retrieval. Databases evolve over time, and the names of the database objects (tables, columns, keys, indexes) evolve with the database. Too often, naming inconsistencies become so prevalent that they cause confusion when users attempt to access data, and they eventually inhibit productivity. Here are some of the rules for creating useful names for database objects:
- Use descriptive names that are meaningful to the entire organization; don't incorporate jargon that's familiar to only a small section of the company.
- Use a name that clearly identifies the database object. Consider using a variation of the notation scheme that Microsoft introduced in Access 1.1, in which you prefix objects with three-letter identifiers (e.g., you designate the Employee table as tblEmployee).
- Use the minimum number of words necessary to convey the meaning of the database object. In SQL Server, you're limited to 32 bytes for any object namewhich is still too long to repeat in your SQL Server queries and stored procedures.
- Don't confuse the meaning of the name by adding redundant terms to it (e.g., tblRedundantTable).
- Don't use acronyms, and use abbreviations judiciously. You can't count on the availability of a business metadata repository to document and decode the meaning of acronyms and abbreviations.
- Don't use names that implicitly or explicitly identify more than one topic (for tables) or characteristic (for columns).
- Use the singular form of a name wherever possible, especially for entities and tables; it helps you correctly distinguish the entity relationships (1:1, 1:M, M:N).
- Don't include spaces in database object names (such as Employee ID). No other database management system (DBMS) supports these spaces, and you might have to integrate another DBMS with your SQL Server environment someday.
As you choose naming conventions, keep in mind that no convention is perfect and that any naming convention is better than no convention. Pick one convention and use it.
3. Duplicating Attributes
Creating the same attribute in more than one entity is a temptation that many novice data modelers can't resist. Neophytes are often slow to accept data normalization or to understand that SQL joins can successfully reunite data fields that need to be included together on a report or form. The only attributes you have to duplicate are primary and foreign keys.
4. Using a Printed Report as a Table Template
Novices often try to use a printed report as a template for a table by transferring the report's fields into a table or entity. Perhaps the most obvious example is the invoice. I've seen entities and tables named Invoice containing many different arrangements of attributes that belong in three separate tablesSale, Product, and SaleItem. A printed report is both a business requirement for and one of the outcomes of your database design. To produce the report, you need to store the attributes that compose the report in the database. A printed report is rarely a template for a table layout.