Learn how to balance data modification against retrieval
The indexes that work best for data retrieval aren't always the best ones for data modification. To know which indexes are the best for data-modification operations, you need to know how SQL Server uses indexes during those operations. Last month, I told you how SQL Server uses indexes and how INSERT operations affect those indexes, and I gave some basic recommendations for indexing tables with heavy insert activity. Now let's look at the indexing effects and recommendations for the two other data-modification operations, DELETE and UPDATE.
One of my main points last time was that although indexes can help SQL Server find and retrieve data, they can degrade SQL Server's performance of data-modification operations because SQL Server has to maintain the indexes so that they reflect the new data. Because every nonclustered index has a leaf-level index row for every row in the table, each time a new row is inserted into the table, SQL Server must insert a new index row in every nonclustered index. And each time a row is deleted from the table, SQL Server must remove a leaf-level row from every nonclustered index.
Using DELETE and UPDATE can be a two-edged sword. Frequently, you have to perform DELETE and UPDATE operations on only one or a few rows that meet some search condition. So, how do you find the rows you need to modify? Indexes can help SQL Server find the rows of interest, but as I noted earlier, they add overhead during the modification. The trick is to have on your tables only the indexes that help SQL Server find the rows to modify but no extra indexes that add overhead to your data-modification operations.
Determining which indexes can be useful for finding the desired rows is just like indexing for SELECT operations, which I've discussed in several previous columns, so I won't cover that again. Instead, let's look at what happens during the DELETE and UPDATE operations.
DELETE
When you delete rows from a table, you need to consider what happens to the data pages and the index pages. Remember that the data is the clustered index's leaf level, and the deletion of rows from a table that has a clustered index happens in the same way as deletion from the leaf level of a nonclustered index. However, SQL Server manages the deletion of rows from a heap (a table with no clustered index) and from an index's non-leaf pages in a different way.
First, note that SQL Server 2000 and 7.0 don't automatically compress space on a page when a row is deleted from a heap. Compaction doesn't occur until a page needs additional contiguous space for inserting a new row.
Second, in the index's leaf level, when rows are deleted, SQL Server marks them as ghost records. The ghost row stays on the page, but a bit in the row header changes to indicate the row's ghost status, and search operations will never access the row. (The purpose of maintaining the deleted row as a ghost record is mainly to allow greater concurrency optimization during key-range locking, but the details are beyond the scope of this article.) Ghosting happens at the leaf level of both the clustered index (the data) and every nonclustered index on the table. Because SQL Server deletes a row from all nonclustered indexes whenever you delete a row from a table, one deleted row can result in many ghost records. SQL Server doesn't "ghost" rows in non-leaf index pages when they're deleted, but as with heap pages, SQL Server doesn't compress the space until new index rows need space on the page.
When the last row is deleted from a data page, SQL Server deallocates the entire page. (Unless the page is the only one remaining in the table—a table always contains at least one page, even if it's empty.) This deallocation also results in the deletion from the index page of the row that pointed to the old data page. If the deletion of an index row (which might occur as part of a delete-plus-insert update strategy) leaves only one entry in the index page, SQL Server moves that entry to a neighboring page and deallocates the empty page.
UPDATE
SQL Server can update rows in multiple ways, automatically and invisibly choosing the fastest update strategy for the specific operation. In determining the strategy, SQL Server evaluates the number of rows affected, how it will access the rows (by a scan or an index retrieval, and through which index), and whether changes to the index keys will occur. Updates can happen either in place or as a delete followed by an insert.
What happens if an update causes a data row to move to a new location? In SQL Server 2000 and 7.0, a move can happen when an update to a row with variable-length columns makes the row large enough that it no longer fits on its original page. Or, because SQL Server stores rows in order of the clustering key, it can happen when a clustered index column value changes. For example, if you have a clustered index on the lastname column, SQL Server will store a row with a lastname value of Abbot near the beginning of the table. If you then update the lastname value to Zappa, SQL Server has to move the row to near the end of the table. When a row moves because of a change in the clustered-index key value, SQL Server must update every nonclustered index on the table to reflect the new value. When you're deciding which columns to build your clustered index on, remember that the nonclustered index pointers use the clustered-index key as a bookmark for locating the data row. So, it's best to cluster on a nonvolatile column.