DOWNLOAD THE CODE:
Download the Code 93189.zip

Leveraging the Data Model
You can't ignore the effect a data model has on maintaining valid data. Poor modeling techniques and the failure to apply best practices to a model's design are the root cause of most validation and integrity problems. In her SQL by Design column, Michelle Poolet explores many concepts that are central to leveraging the data model to maintain data integrity and validity. In "The Four Integrities" (November 1999, http://www.sqlmag.com, InstantDoc ID 6200), Poolet explains how to maintain entity, referential, domain, and business integrity.

Maintaining entity integrity means validating the data within a row. Primary keys and unique constraints let developers uniquely identify each row in a table, thereby eliminating duplicates.Take a look at Figure 1, which shows the data model for three tables: Account_Type, Customer, and Account. The complete DDL code that creates these tables is available in Web Listing 1. The code snippet in Listing 1 shows that the entity Account_Type contains a primary key, account_type_id, which I implemented by using a primary key constraint. I also defined an alternate key on the description column by using a unique constraint.This design ensures unique rows in two ways: Rows are differentiated by account_type_id and by description. This implementation-provides the advantage of ensuring that account types are unique by name and it improves performance and minimizes storage by using the account_type_id to enforce any potential foreign key relationships.

Another example of entity integrity involves using a table-level check constraint to validate or compare multiple columns in a row. For example, at callout C in Listing 2, the account_balance_ck constraint in the Account table ensures that for a row to be valid, the sum of opening_balance plus current_balance must be greater than or equal to zero.

Foreign key constraints aid developers in enforcing referential integrity.As you can see at callout D in Listing 2, a foreign key constraint enforces referential integrity on the Account.account_type_id column. This constraint ensures that all values in the child column have a single corresponding value in the parent column.

Several options exist for enforcing domain integrity. Column-level check constraints enforce domain integrity by limiting a column's value to a certain range. For example, the account_opening_balance_ck constraint on the Account table at callout B enforces that the opening_balance is always greater than or equal to zero. Likewise, combining a User Defined Type (UDT) with check constraints limits any column defined by the UDT to the given domain of values. In addition, you can use foreign key constraints to enforce domain integrity by limiting a column to the defined set of values in the parent table.

Choosing between a check constraint and a foreign key constraint to enforce domain integrity can be tough. Although both move you closer to smart data validation, both impose performance and maintenance overhead related to their implementations. Carefully weigh which might be best for your particular situation. As a general rule of thumb, use a check constraint when the number of values is small and static, and use foreign key constraints through referential integrity when the number of values is larger and the values tend to be dynamic.

Within the realm of smart data validation, triggers can offer a means of enforcing business integrity (the idea of enforcing complex rules across rows and tables) and any of the other three integrity types already discussed. Although poorly written triggers are often at the heart of many performance problems, triggers can be a good tool for validating data. Just use them with care.

Normalization
In considering the data model's effects on smart data validation, we can't overlook the role of normalization—or more accurately, the affect of an incorrect normalization level on maintaining data integrity. For insight into this idea, we again look to Michelle Poolet for advice. Poolet's article "WhyYou Need Data Normalization" (March 1999, InstantDoc ID 4887) explains four goals of normalization, two of which relate directly to our concept of smart data validation.The concepts Poolet covers include "... minimizing the amount of duplicate data stored in a database" and "organizing the data such that, when you modify it, you make the change only in one place."

Similar to the importance of enforcing data integrity through modeling, correctly normalizing data also has a dramatic effect on data quality. For example, data duplication increases the cost of maintenance and the corresponding risk of introducing inconsistencies. Not all data models must be in the third or fifth normal forms, but as with the other concepts in data validation, you need to carefully consider the affect of a particular normalization level on maintaining data integrity and performance.

Nulls and Data Types
A data model assists in ensuring smart data validation in two additional ways. First, a data model can help you decide whether an attribute is required. If the attribute is required, then you can force the schema to reject NULL values by setting the attribute to NOT NULL. Likewise, you can improve data validation by using constrained data types, which force data to be within a set range of values.For example,instead of using an integer to represent a two or three-state value (e.g., "null", "yes", "no") consider using a bit because by definition, a bit will allow only up to three possible values: 1, 0, and NULL. As another example, given that SQL Server represents all date values as date + time by means of the datetime data type, in situations in which you need to guarantee users are accessing "date-only" values, you can expose a standardized date, rounded to 12:00 A.M., by using computed columns instead of requiring users to remember that each query referencing the date value must manually account for the time portion as well. The opening_date and opening_datetime attributes in the Account table illustrate this technique, as the code at callout A in Listing 2 shows. Using calculated columns lets you capture exactly when an account was opened (including the time of day) while exposing a constrained version of the same value, opening_date, that lets users write simple queries without having to worry about inadvertently excluding accounts that don't happen to have been opened at exact day boundaries.

Sometimes, even with valid data, you can use small data model changes to increase reporting consistency and accuracy. This strategy focuses on incorporating a "single version of the truth" into our data models by moving complex, frequently used calculations out of our report's code and into a persisted state within the data model. Leveraging a "single version of truth" is most often associated with business intelligence solutions such as data warehouses and data marts. However, the technique is also applicable in transactional systems that use calculations such as the derivation of age from a date of birth or a more complex calculation such as the quarter-to-date average net assets within a financial system. Applying this technique often comes at a high price because of the increased performance cost of maintaining the calculated value. However, avoiding the frustration that occurs when multiple people arrive at different results for a calculation involving the same set of values often far outweighs any added performance affect.

Prev. page     1 [2] 3     next page



You must log on before posting a comment.

If you don't have a username & password, please register now.

 
 

ADS BY GOOGLE