DBAs often have to update rows in tables. When
the table is small, you might be able to update
the entire table with a simple statement that looks
something like
UPDATE dbo.SomeSmallTable
SET SomeColumn = 'something'
For large tables,
however, such
a simplistic approach
is impractical
because the
transaction size
will lock out
users.
When you are
faced with updating
most or all
rows in a large table
and you want
to avoid locking out users, you can use the TOP clause
to limit the transaction size. However, there’s a trick to
using this clause: You need to keep track of which rows
have already been updated so that you don’t update
them a second time.
The OUTPUT clause in SQL Server 2005 and
later provides a way to expose which rows are affected by a Data Manipulation Language (DML) statement. In SQL Server 2005, you can use the OUTPUT
clause with the UPDATE, INSERT, and DELETE
statements. In SQL Server 2008, you can also use
the OUTPUT clause with MERGE statements. You
declare a local table to hold the primary key values of
the table being updated, then use the OUTPUT clause
to capture the primary key values for the rows that are
updated. Listing 1 shows what this code looks like.
For the WHILE loop to start, the @@ROWCOUNT
function must return a value greater than 0. As callout
A in Listing 1 shows, you can trick the WHILE loop
into initially executing by including a meaningless SET
statement right before the WHILE condition. After that,
the WHILE condition is dependent on the UPDATE
statement’s row count. The WHERE clauses in callout
B prevent the same row from being updated twice.
The declared table has a clustered unique index (by
virtue of the primary key declaration), so the UPDATE
operation’s performance should be acceptable. You can
tweak its performance by changing the TOP value—
the larger the value, the quicker the update process. The
tradeoff is that you’re locking more rows for a longer
period of time with a larger TOP value.
—Lawrence Rogers, senior consultant,
Daugherty Business Solutions