• subscribe
May 22, 2002 12:00 AM

Tune In to Data Modification

SQL Server Pro
InstantDoc ID #24700
Downloads
24700.zip

Make sure you have the right indexes for your needs

You know by now that you have several options when tuning indexes for SELECT queries, but the pertinent factors to consider are different when you're tuning indexes for data modification. When determining the best indexes to use for SELECT queries, the only reason not to create more indexes than you might need is that the extra indexes take up disk space. Having more indexes than you need doesn't degrade performance unless your tables are being modified regularly.

Every time you modify a table's data, you affect the indexes on that table because SQL Server must update the indexes. SQL Server performs this index maintenance automatically, but you need to be aware of the cost. Data modification mostly affects nonclustered indexes because the leaf level of every nonclustered index contains a pointer to every data row. So for every row you insert into a table, a nonclustered index leaf level gets a new pointer; for every row you delete, the leaf level loses a pointer. Remember that a nonclustered index's leaf level keeps the index pointers sorted, so the new row must be inserted in the proper location.

Adding to the complication, these changes can propagate upward in the index. If the new index row fills an index page, the page must split into two pages. Then the level above, which contains pointers to every page, gets a new page pointer. If an index row is removed and is the only row on the page, SQL Server has to remove the leaf-level index page and adjust the index level above.

Pointers About Indexes
I won't cover here all the details about how SQL Server maintains indexes internally. For the purposes of query tuning, the additional details are irrelevant. But keep in mind that SQL Server must make these index adjustments for every nonclustered index, and you can have up to 249 nonclustered indexes on one table. For example, say you have a contacts table with nonclustered indexes on last name, first name, phone number, and city. Every time you insert a new row into the table, SQL Server must add pointers for the new last name, first name, phone number, and city in the right locations in the indexes.

For a clustered index, fewer changes are necessary because the index pages contain pointers only to pages instead of to individual records. SQL Server adds a new index row only when it has to add a whole new page to a table. When someone inserts a new row into a table that has a clustered index, SQL Server must put the row into the proper sorted location, like when you need to add a new pointer to a nonclustered index's leaf level. If the data page has no room for the new row, a page split occurs. (I talk more about page splits later.) When a row is removed, if it's the last one remaining on the page, SQL Server has to adjust the index rows accordingly. So clustered indexes have less overhead, but you can have only one per table. Any additional indexes must be nonclustered.

It might seem that you should completely eliminate indexes if you're going to be modifying data, but you need to consider another aspect of data modification. In particular, for many UPDATE and DELETE operations, you want to modify only a small set of rows, so the UPDATE or DELETE query will contain a WHERE clause, perhaps including search arguments. Having indexes on one or more search arguments in your data-modification queries lets SQL Server find the relevant rows more quickly.

You need to figure out how to build the indexes that SQL Server needs to locate the data without building so many indexes that they'll significantly affect the data-modification operation's performance. Let's look at an example in the Northwind database. First, we'll create a copy of the Order Details table and add multiple indexes. After each new index, we'll delete one row of data. The Order Details table has only nine pages, but even with this small table, you should be able to see the effect indexes can have. I'll leave it to your imagination, or your testing, to determine what the effect would be on tables that are hundreds or thousands of times as large as Order Details.

Here's the code to copy the table:

USE Northwind
SELECT * INTO od
FROM [order details]

Now, delete one row to determine baseline statistics before building any indexes; turn on STATISTICS IO to determine how many page accesses are required to find and delete the row:

SET STATISTICS IO ON
GO
DELETE od
WHERE OrderId = 11077 AND ProductId = 66
GO
SET STATISTICS IO OFF
GO

The results show that SQL Server performed 11 page reads. Now, use the following code to build a clustered index on the search argument columns and delete another row:

CREATE UNIQUE CLUSTERED INDEX OD_key ON od(OrderId, ProductID)
GO
SET STATISTICS IO ON
GO
DELETE od
WHERE OrderId = 11077 AND ProductId = 75
GO
SET STATISTICS IO OFF
GO

Because the index is useful in finding the right row to delete, the number of page reads decreases by more than 80 percent to only two pages.

The code in Listing 1 builds additional indexes, deleting a row after building each index to demonstrate how modifications to tables with indexes affect the amount of I/O required. Running this script shows that each index adds more work for SQL Server. By the time you build the fourth nonclustered index, the number of page reads is greater than the number required when no indexes existed. Keep in mind that reading pages isn't the only work that SQL Server is doing. If SQL Server needs to split pages, it must do a lot of work to move rows from one page to another, and that work doesn't show up in the STATISTICS IO values.



ARTICLE TOOLS

Comments
  • Anonymous User
    8 years ago
    Dec 06, 2004

    GOOD

You must log on before posting a comment.

Are you a new visitor? Register Here