In the year since SQL Server 2005's release, the number of technical sites containing
SQL Server information has burgeoned. Microsoft's SQL Server development
team members host blogs describing little-known details of SQL Server operations and
best practices. But even with all this information available, some aspects of SQL Server
have changed subtly—changes that you might have missed. You might not realize that
what you think you know about SQL Server behavior
is actually no longer true. I'll give you the straight story
about three such misconceptions.
Choosing to Be a Victim—or Not
One SQL Server 2005 change that surprised me was an enhancement to the deadlock-resolution algorithm. In SQL Server 2000 and earlier, the SET option SET
DEADLOCK_PRIORITY had two values, LOW and NORMAL. LOW didn't
mean that you had lower priority for being selected the victim; it only controlled the
connection in which the value was set. Setting this option to LOW indicated that if
this connection were involved in a deadlock, it should be chosen the victim. There
was no way to set your priority to HIGH, to indicate that this connection shouldn't
be chosen as the victim.
I'd already written and delivered my updated SQL Server Internals course, which
included a description of SET DEADLOCK_PRIORITY and stated that only two
possible values could be used. By chance, I came across a reference to setting DEADLOCK_PRIORITY to HIGH and at first assumed it was a mistake. But when I checked
SQL Server 2005 Books Online (BOL), I discovered a change I hadn't known about.
In SQL Server 2005, SET DEADLOCK_PRIORITY lets a process determine its
priority for being chosen as the victim by using one of 21 different priority levels, from
-10 to 10. You can still use the value LOW (equivalent to -5) or NORMAL (which is
0 and the default); you can also specify HIGH (which is 5).
Which session is chosen as the deadlock victim depends on each session's deadlock
priority. If the sessions have different priorities, SQL Server chooses the lowest-priority
session as the deadlock victim. If both sessions have the same deadlock priority, SQL
Server selects as the victim the session that's less expensive to roll back.
Unnecessary Database Backups
Another old "fact" that's no longer true, though still commonly believed, concerns
database backups. A longtime SQL Server user posted a question on a public newsgroup
about a behavior change he'd noticed in SQL Server 2005. In SQL Server 2000, if he truncated a table, he then had to take
a full database backup for his log-backup
chain to be unbroken. He claimed that
if he tried to take a log backup after the
TRUNCATE, SQL Server complained.
I don't know exactly why he generated
an error, but there's no reason for SQL
Server 2000 TRUNCATE operations to
invalidate log backups. This was true in a
much older version of SQL Server, which
is probably where he first encountered this
behavior; he just kept believing it until he
tried it in SQL Server 2005.
In SQL Server 2005, 2000, and 7.0, a
TRUNCATE is a logged operation. Log
backups containing transactions involving
a TRUNCATE operation are fully functional and can be used to restore the
database as easily as log backups not containing transactions that perform truncations. Because of his outdated belief, the
user probably spent more time than necessary performing full database backups.
I imagine the person who posted this
question wasn't the only one using this
outdated information.
Maintaining Indexes During
Bulk Modifications
Here's another behavior change you
might not be aware of. Before SQL
Server 7.0, it was usually considered a best
practice to drop all your indexes before
doing any kind of bulk load operation.
Bulk loads include the BCP command,
the T-SQL BULK INSERT command,
and simple load operators in SQL Server
2000's DTS or SQL Server 2005's SQL
Server Integration Services (SSIS, DTS's
replacement). Before SQL Server 2000,
it was usually considered faster to drop
all indexes, load the new data into an
unindexed table, then rebuild the indexes.
Although rebuilding indexes is expensive, this process was less time-consuming—if a
large percentage of new rows were being
added—than trying to add those rows while
maintaining indexes.
SQL Server 2000 introduced an internal
technique for processing bulk inserts—index-at-a-time modifications—that makes
inserting data much more efficient, even
while maintaining multiple indexes. This
technique is applicable to large update
operations as well as bulk inserts.
The alternative to index-at-a-time modifications is row-at-a-time modifications,
which was the only possibility before SQL
Server 2000. For every new or updated row,
SQL Server maintains each index individually, inserting or updating a pointer at the
appropriate location in each nonclustered
index. As an example, let's use the Sales.Sales
OrderHeader table from the AdventureWorks database. I'll make a copy of that
table and build the clustered index on
SalesOrderID and nonclustered indexes on
CustomerID, SalesPersonID, and ContactID.
If I insert 100,000 new rows, the values of
CustomerID, SalesPersonID, and ContactID
will be relatively randomly spread across the
index leaf levels. For each new row, SQL
Server would have to find where in the
CustomerID index to insert or update the
new CustomerID value, find where in the
SalesPersonID index to insert or update
the new SalesPersonID value, and in the
ContactID index, insert or update the new
ContactID value. This process could result in
300,000 random-access data modifications,
excluding any modifications at the indexes'
upper levels as more rows are added to leaf.
Index-at-a-time modifications are possible because SQL Server sorts data efficiently. If sufficient rows are being inserted
or updated, SQL Server can perform
repeated sort operations on the data. The
data can be sorted using the appropriate
order for each index; then only a single pass
through that index's leaf level is needed to
merge all the new index values into the
existing index records.
Run the code in Listings 1 and 2 to create
a copy of the Sales.SalesOrderHeader table
and build four indexes on the table. Listing 2 recopies the original table into the new table
to double the number of rows to 62,930.
When I look at the plan for the UPDATE
query in Listing 3, I see a single clustered-index–scan operation (which is the same as
a table scan) at the far right of the graphical
plan and a clustered-index update as the final
step on the left, which represents 91 percent
of the query's cost. This operator performs all
the modifications to all the indexes by using
the row-at-a-time technique.
If I run the code in Listing 2 again, I'll
increase the table's size to almost 100,000
rows. When I now look at the graphical plan for the UPDATE in Listing 3, it looks
different. I see several icons indicating index-update operations and several spool operators that indicate SQL Server is collecting
the index key values in a work table before
sorting them. Web Figure 1 shows a
portion of the plan containing the spooling,
sorting, and index-at-a-time updates. I can't
fully explain each operator in your plans here,
but I want to show plans using each of the
types of bulk operators (i.e., row-at-a-time
and index-at-a-time). Depending on the
resources available on your system, you might
find that you need to run Listing 2 a different
number of times to get the index-at-a-time
plan. My technical editor got the index-at-a-time plan the first time he ran Listing 2.
But when I retested this code on another
SQL Server machine, I had to run Listing 2 four times.
Whether SQL Server chooses to use
row-at-a-time or index-at-a-time operations for insert and update operations
depends on various factors, including the
total number of rows modified, percentage
of existing rows affected by the change, and
available system resources. I don't mean to
suggest that you should never drop your
indexes prior to bulk loads or bulk updates,
just that you shouldn't assume it's always
better to do so. You need to test your operations on your SQL Server systems with your
data. (For more information and examples
of testing to determine at what point
dropping the indexes before modifying
data might be cheaper, see the case study
at http://www.microsoft.com/technet/prodtechnol/sql/2000/maintain/incbulkload.mspx.)
Little Surprises
I've talked about three changes in SQL
Server behavior that haven't been well
advertised. As you can imagine, these examples aren't the only areas in which SQL
Server has changed the way it behaves from
version to version. The moral: Be prepared
to be "pleasantly" surprised whenever you
investigate behavior after any SQL Server
upgrade, whether to an entirely new version
or even to a service pack.
End of Article