Integrity Is Integral
The next step in configuring the database for performance is to let the SQL Server enforce referential integrity. For example, the Store and Sale tables in Listing 2, which are adapted from the pubs database, have a dependent relationship. A Sale can't happen without being associated with a Store. Referential integrity means you enforce this business relationship in one of two ways. You can assign the enforcement task to an application outside the SQL Server, or you can let the SQL Server take care of enforcing the rule. In my opinion, enforcing static rules such as the referential integrity between Store and Sale is best left to the database. You code the rule one time, as the code at callout B in Listing 2 shows. Then, SQL Server enforces the rule for all the database's users. If you try to enforce a rule with an application, the rule might be left out of subsequent versions of the same application, or the rule might be overlooked by other applications that access the same data, leading to violations of referential integrity and possibly corrupted data.
In addition to enforcing referential integrity within SQL Server, I suggest that you use declarative referential integrity (DRI) rather than triggers or stored procedures. DRI is a constraint, and constraints execute more efficiently than triggers or stored procedures, especially if you'll be using bulk-loading operations to transfer data into the database.
Listing 2 shows how to establish DRI between the Store and Sale tables. As I mentioned earlier, these two tables are adapted from the pubs database. I've modified columns and included parameters that I feel are necessary in a good production environment, plus I've added some columns to better illustrate data-model performance-tuning concepts. The CREATE TABLE statement for the dbo.Store table adds a new column, StorePhoto, which is an image data type. Notice that in the last line of this CREATE TABLE statement I've directed the data to be stored in the MyDatabase_data filegroup, but the associated store images will be stored in the MyDatabase_image filegroup.
Foreign keys. I always create primary key and foreign key constraints separately from the CREATE TABLE statement because I want to be in control of the constraint names. You can designate a column as a primary key or a foreign key when you create the table, but I don't do that. I prefer separate ALTER TABLE statements for the primary and foreign key constraints. When you're designing and developing a data model, change is your constant companion. Therefore, you must be able to quickly and readily identify and reference the various constraints.
The default constraint names that SQL Server assigns aren't too difficult to interpret. For instance, FK__sales__stor_id__0AD2A005, the name of the original foreign key constraint between Stores and Sales in my copy of the pubs database, is obviously a foreign key in the Sales table that uses the stor_id column. However, the name of the constraint doesn't tell me what table it's referencing. My naming convention, FK_Sale2Store, is more succinct, and although my name doesn't contain column information, it clearly tells me that Sale is dependent on Store. In addition, I don't have to execute a query to get the full constraint name. I need only to follow the standard that I've established for foreign key constraint names. Any foreign key constraint will start with FK_, followed by the name of the dependent table, the number 2, and lastly the name of the independent table.
Indexing. SQL Server doesn't automatically create an index for foreign key columns (as it does for the primary key of a table). In production, you can count on using the primary and foreign key columns for join operations. Therefore, when you create a new database, you should create an index for each foreign key column in any dependent table. Typically, these indexes are nonclustered. I'll talk more about clustering in future articles, but suffice it to say that you cluster on the column or columns that will be used most for data retrieval. You might even wait until your database is near or in production before you make decisions regarding clustering.
For an example of this clustering philosophy, take a look at the clustered index at callout C in Listing 2, which I built on the Sale table using StoreCode plus SaleID. Sales reports will run daily or hourly and will be sorted first by store (represented by StoreCode), then by SaleID. The SaleID values increase throughout the day (SaleID is an identity), so they are a defacto entry sequence number; both they and the SaleDate values increase with each new sale. Most of these reports will be summary reports with totals calculated by store. StoreCode plus SaleID is an excellent candidate for clustering. The clustered index will speed up the data returned because the data is already organized and orderedin this case, ordered first by the Store Code and then, within each value of Store Code, by SaleID. StoreCode by itself joins the Sale table back to the Store table. Having two indexesone clustered, one nonclustered, both starting with StoreCodemight be a redundant design, but once the database is in production, you'll have a chance to see how the SQL Server will use (or not use) these indexes. If the SQL Server doesn't use the nonclustered index, you can safely drop it. During the design phase, however, I suggest you create a separate index for each foreign key column, even if it means some initial indexing redundancy.
If possible, define default and check constraints in the database rather than at the application level for the same reasons I suggested you let the database manage referential integrity. Because constraints are database objects, they execute quickly and efficiently, more so than code stored in external application logic. If the rules that can be expressed by default and check constraints are staticthat is, they're not changing dynamically or every week or even every monthyou can define these rules as column or table constraints. In Listing 2's CREATE TABLE dbo.Sale statement, the column SaleDate has a default value of CURRENT_TIMESTAMP. This rule probably won't change for the life of the database, so it's a good candidate for defining as a column constraint. The next rule, SalePayTerms, is expressed as a check constraint. SalePayTerms can't be NULL, and initially, the values defined for SalePayTerms are Net 30, Net 60, and On Invoice. An application can't override this set of values. If you need to add a value to the set, you'll need to modify this column constraint. If you suspect that this set of values will be frequently modified, a more appropriate way to implement the constraint would be to create a lookup table of sale payment terms, give each row in the table a unique value as an identifier, and relate the lookup table to dbo.Sale, in a one-to-many (1:M) relationship. The relationship between the lookup table and dbo.Sale would be a dependent relationship, enforced in much the same way as the relationship between dbo.Store and dbo.Sale.
Database performance tuning is a huge, vital topic requiring understanding of the database environment and a lot of hands-on experience to fully comprehend how to optimize your database. A good place to start with performance tuning is at the beginning, by understanding your data model and the business rules and requirements. Only then can you maximize the data model for high performance.