Test to find the best index-creation order
Modifying only one row through one INSERT, UPDATE, or DELETE statement is fairly straightforward. However, if the table has multiple indexes and you're modifying multiple rows through one statement or you're using bulk copy program (bcp) or the BULK INSERT command, you have to make some tough decisions. The most important decision is whether to drop one or more indexes before performing the modification or to leave all the indexes on the table. To help you make this decision, let's look at what goes on when SQL Server manipulates multiple rows by insertion or update. Then, I'll show you some example tests you can run to find the best method of using your indexes to support those transactions.
Operations that modify all or most of the rows in a table or that add a large number of rows to a table are called batch operations. Although you can delete multiple rows at the same time, the details aren't as complex as with inserts or updates because removing a value from a table or index is much easier than inserting a new value. Remember that insertions can lead to page splits or forwarding pointers. SQL Server often implements an update as a DELETE followed by an INSERT, so for the INSERT phase of the update, split pages or forwarding pointers might result. Remember also that for each new value in a table, whether the new value arrived from an INSERT or an UPDATE statement, every nonclustered index must have a new row inserted that references the new value. So if you're inserting or updating thousands or millions of rows in a table and the table has multiple nonclustered indexes, each needing to be modified for each new row, SQL Server must do a huge amount of maintenance.
Strategies for Batch Operations
SQL Server 2000 and 7.0 offer two strategiestable-level modification and index-level modificationfor maintaining all the indexes that belong to a table during a batch operation. The query optimizer chooses between them based on its estimate of the likely execution costs for each strategy.
In table-level modificationsometimes called row-at-a-timeSQL Server maintains all indexes for each row as that row is modified. In most cases, SQL Server has to perform many random index accesses; at least one page per index per updated row needs modification. Also, you might have additional modifications at the upper (non-leaf) index level. For INSERT operations, if the rows being inserted are already in a sorted sequence, SQL Server can more efficiently modify any index that's sorted in the same sequence. However, because the input rows can be sorted in only one order, other indexes still require a lot of random access. SQL Server can use nonrandom index accesses for at most one index.
In index-level modificationssometimes called index-at-a-timeSQL Server uses a work table to store all the rows to be modified and within this table sorts each index in order of the index keys. In other words, you have as many sort operations as indexes. Then, for each index, SQL Server merges the inserts or updates into the index and never accesses each index page more than once, even if multiple modified rows pertain to one index leaf page.
If the update involves a tiny percentage of the table's rows and the table and its indexes are large, the query optimizer usually considers table-level modification the best choice. Most online transaction processing (OLTP) operations, which affect only one or a few rows, use table-level modification. Conversely, if the update is relatively large, table-level modifications require a lot of random I/O operations and might read and write each leaf page in each index multiple times. In that case, index-level modification offers much better performance. The amount of logging required is the same for both strategies.
Pre-7.0 releases of SQL Server recommend dropping all indexes before using a bulk copy operation to import a lot of rows into a table because these releases offer no index-level strategy for maintaining all the indexes. Inserting large numbers of rows when the table has no indexes for SQL Server to maintain and then building (or rebuilding) the indexes after loading the data is always more efficient. However, if you're using SQL Server 2000 and 7.0, don't assume that you should drop all indexes before loading large amounts of data. When testing your applications' performance and index strategies, make sure to test your batch operations before and after building indexes. In cases where SQL Server can effectively perform index-at-a-time batch operations, you can often substantially improve performance by building the indexes before loading or updating the data.
Testing Batch Operations
To demonstrate how you might organize index performance tests for your bulk operations, I've included two scripts. Listing 1 creates a copy of the Northwind database's Order Details table. By changing the number of times the script copies the table into itself, you can control the number of rows in the table. Listing 1 creates a table with more than half a million rows. Listing 2 uses the T-SQL command BULK INSERT to perform three bulk load tests, each of which can be performed in two different ways. The script first copies the od table (which the script in Listing 1 created) to a text file. Each test creates a new table called load_od to hold the inserted rows. The first test loads the data, then builds one clustered index and four nonclustered indexes on load_od. The second test builds the clustered index first, loads the data, then builds four nonclustered indexes. The third test builds all five indexes before loading the data.
Prev. page  
[1]
2
next page