Your results will look something like this:
DBCC results for 'bigrows'.
There are 5 rows in 2 pages for object 'bigrows'.
Forwarded Record count = 1
DBCC execution completed. If DBCC printed error messages, contact your system administrator.
Updating: In Place or Not
Updating a row in place is the rule rather than the exception in SQL Server 7.0, unlike with earlier releases. Updating in place means that the row stays in the same location on the same page, and only the affected bytes change. In most cases of updating in place, the log will contain one record for each such updated row. The exceptions occur when the table has an update trigger on it or when the table is marked for replication. In these cases, the update still happens in place, but the log contains a delete record followed by an insert record. Updates happen in place if you update a heap or if you update a table with a clustered index without changing any of the clustering keys.
In cases where a row can't be updated in place, the cost of a not-in-place update is minimal because of the way SQL Server stores nonclustered indexes and because of the use of forwarding pointers. No nonclustered indexes need changing, which makes the update quite inexpensive.
If your update can't happen in place because you're updating clustering keys, it will occur as a delete followed by an insert. Sometimes when you're updating a variable-length column in many rows, you get a hybrid update: some of the rows are updated in place and some aren't. If you're updating index keys, SQL Server builds a list of rows that need to change as a delete plus an insert operation. If the list is small enough, SQL Server stores it in memory; if the list would take too much memory, SQL Server writes it to tempdb. SQL Server then sorts this list by key value and operator (delete or insert). If the index whose keys are changing isn't unique, SQL Server then applies the delete and insert steps to the table. If the index is unique, it collapses the delete and insert operations on the same key into a single update operation. Let's look at a simple example.
The code in Listing 2 builds a table with a unique clustered index on column X, then updates that column in both rows. Table 1 shows the operations that SQL Server generates internally for that update statement. This list of operations, the input stream, consists of the old and new values of every column and an identifier for each row to be changed. In this case, the updates must be split into delete and insert operations. If they weren't, the update to the first row, changing X from 1 to 2, would fail with a duplicate-key violation. So SQL Server generates a converted input stream that looks like Table 2. Note that for the insert half of the update, the RID isn't relevant in the input stream of operations to be performed. Before a row is inserted, it has no RID. SQL Server then sorts this input stream by index key and operation, as in Table 3.
Finally, if one key value has both a delete and an insert, the two rows in the input stream collapse into an update operation. So the final input stream looks like Table 4. You can see that, although the original query was an update to column X, after the split, sort, and collapse of the input stream, the final set of actions looks as if you're updating column Y! This method of carrying out the update prevents intermediate violations of the index's unique key. Screen 1 contains part of the graphical query plan for this update, showing the split, sort, and collapse phases.
Updates to nonclustered index keys also affect the index's leaf level by splitting the operation into deletes and inserts. Think of the nonclustered index's leaf level as a miniature clustered index; any modification of the key could potentially affect the sequence of values in the leaf level. As with the data in a clustered index, the index's uniqueness determines the update type. If the nonclustered index is nonunique, the update splits into delete and insert operators. If the nonclustered index is unique, the split is followed by a sort and an attempt to collapse any deletes and inserts on the same key back into an update operation.
Table-Level vs. Index-Level Data Modification
We've just considered the placement and index manipulation necessary for modifying one or a few rows with no more than one index. If you're modifying multiple rows in a single operation (insert, update, or delete) or by using bulk copy program (bcp) or the BULK INSERT command and the table has multiple indexes, you need to be aware of some other facts. SQL Server 7.0 offers two strategies, table-level and index-level modification, for maintaining all of a table's indexes. The query optimizer chooses between them based on its estimate of the anticipated execution costs for each strategy. In a table-level modification, sometimes called row-at-a-time, SQL Server maintains all indexes for each row as it modifies that row. If the update stream isn't sorted, SQL Server needs to perform many random index accesses, one access per index per update row. Even if the rows that will be modified are supplied as an ordered stream, the update stream can't be sorted in more than one order; therefore, there might be nonrandom index accesses for at most one index.
In index-level modifications, sometimes called index-at-a-time, SQL Server gathers all the rows to be modified and sorts them for each index. In other words, as many sort operations as indexes occur. Then, for each index, SQL Server merges the updates into the index. SQL Server will access no index page more than once, even if multiple updates pertain to one index leaf page.
Usually, if the update is small and the table and its indexes are large, the query optimizer considers table-level modification the best choice. Most online transaction processing (OLTP) operations use table-level modifications. But if the update is relatively large, table-level modifications require many random I/O operations and might read and write each leaf page in each index multiple times. In such cases, index-level modification performs much better. The amount of logging required is the same for both strategies.
Let's look at a specific example. Although this example deals with inserting new rows, you could perform a similar analysis if you were updating a large percentage of the rows in a table. Suppose you use BULK INSERT to increase the size of a table by 1 percent (which could correspond to 1 week in a 2-year sales history table). The table is stored in a heap (no clustered index), and the rows are simply appended to the end because the table's other pages don't have much available space. There's nothing to sort on for insertion into the heap. Assume also that you have two nonclustered indexes on columns a and b. SQL Server sorts the insert stream on column a and merges it into the index on a.
If an index entry is 20 bytes long, an 8KB page filled at 70 percent (which is the natural, self-stabilizing value in a B-tree after many random insertions and deletions) would contain 8KB * 70 percent ÷ 20 bytes = 280 entries. Eight pages (one extent) would then contain 2240 entries, presuming leaf pages are laid out in extents. A 1 percent table growth implies, on average, 2.8 new entries per page, or 22.4 new entries per extent.
Table-level insertion would touch each page, on average, 2.8 times. Unless the buffer pool is large, table-level insertion reads, updates, and writes each page 2.8 times, which means 5.6 I/O operations per page, or 44.8 I/O operations per extent. Index-level reads, updates, and writes each page (and extent, again assuming a contiguous layout) exactly once. In the best case, about 45 page I/O operations are replaced by two extent I/O operations for each extent. Of course, the cost of using multiple sorts to do the insert also includes the cost of sorting the inputs for each index. But the much-reduced cost of the index-level strategy can easily outweigh the cost of sorting.
Releases of SQL Server before 7.0 recommend dropping all indexes if you're going to use bcp to import many rows into a table or if you're going to do mass updates. This recommendation is good for earlier releases because they have no index-level strategy for maintaining all the indexes. With the new index-at-a-time strategy in SQL Server 7.0, this recommendation is no longer valid.
You can find out whether SQL Server made your updates at the table level or the index level by inspecting the SHOWPLAN output. If SQL Server performs the update at the index level, you'll see a plan that contains an update operator for each of the affected indexes. If SQL Server performs an update at the table level, you'll see only one update operator in the plan.
Recommendations
Updates in earlier releases of SQL Server can be problematic. They can happen quickly if certain conditions are met, but in other cases, when you have deferred operations, updates take many times longer. In addition, more than a few nonclustered indexes on a table can greatly slow the update speed. In SQL Server 7.0, all updates can happen quickly. The new indexing structures significantly reduce the need for index maintenance during updates. And the new index-at-a-time update strategy greatly optimizes the performance of index maintenance, in cases where it's still necessary. Indexes in SQL Server are a great thing; don't let concerns about maintenance during updates keep you from adding all the indexes you need to your important tables.
(This article was adapted from Inside SQL Server 7.0, Chapter 8, "Modifying Data.")
End of Article
Prev. page
1
[2]
next page -->