Last month, in SQL Server Read-Consistency
Problems, Part 1” (InstantDoc ID 97278), I began
a discussion about index scans and read-consistency
problems that can develop when one process is
reading data under the Read Uncommitted isolation while
another process is inserting data. This month, I’ll provide
tangible examples that demonstrate those problems and
also suggest workarounds. Be sure to read last month’s
article as a prerequisite to this one.
Seeing Is Believing
I’ll first demonstrate how an allocation-order scan can read
the same row multiple times. Run the code in Listing 1 to
create a database called testdb and within it a table called
T1. The T1 table has two columns: cl_col is a UNIQUEIDENTIFER
column with a default value NEWID(),
and filler is a CHAR(2000) column with a default value
of a. The table has
a unique clustered
index on cl_col.
With this row size,
a page will contain
four rows at most. Each insert will enter a GUID in cl_col,
and because of the random nature of GUIDs, frequent
insertions will cause frequent splits in the clustered index.
Open two new connections (call them connection1 and
connection 2). Run the code in Listing 2, page 30, in connection
1, then the code in Listing 3, page 30, in connection
2. The code running in connection 1 simply inserts new
rows into T1, frequently causing splits. The code running
in connection 2 has a loop that selects all T1’s data into a
temporary table and checks whether any rows were read
more than once. The SELECT statement against T1 uses
the NOLOCK hint, and if you examine the plan for this
SELECT INTO statement, you’ll see a clustered index scan
with the Ordered: False property. Remember, this is the
scenario in which the storage engine might decide to use an
allocation-order scan. As soon as the IF EXISTS statement
finds multiple occurrences of rows, the code breaks from the
loop and returns the GUIDs that SQL Server read multiple
times.
Table 1, page 31, shows sample output I got in connection
2 after a few seconds. Remember that the clustered
index on cl_col is unique, so there’s no doubt that SQL
Server read the same rows multiple times. Once you get
output from connection 2, you can stop the code running
in connection 1 or run the code in connection 2 again if
you want to make sure that the first test wasn’t a fluke.
The second test will demonstrate an allocation-order
scan that misses existing rows. Run the code in Listing 4,
page 30, to recreate T1 with an additional column that will
hold an incrementing sequence value, and the Sequence
table that will be used to produce the sequence values. Run
the code in Listing 5, page 30, in connection 1, then the
code in Listing 6, page 31, in connection 2.
The code running in connection 1 has a loop that produces,
in each iteration, a new sequence value (i.e., greater
than the previous by 1) and inserts a new row into T1. As
in the previous test, the inserts will cause frequent splits.
The code running in connection 2 has a loop that selects
all T1’s data into a temporary table and checks whether
any sequence values are missing. Again, the SELECT INTO statement runs with the NOLOCK hint and will
show up in the plan as an index scan with the Ordered:
False property. Namely, the storage engine might use an
allocation-order scan in this case. Once the IF EXISTS
statement finds missing sequence values, the code returns
the pairs of adjacent sequence values with a gap between
them, proving that the scan skipped sequence values in the
table. Table 2 shows sample output I got in connection 2
after a few seconds. Once you get output from connection
2, you can stop the code running in connection 1.
On Performance of Index Scans
Allocation-order scans are virtually always preferable to
index-order scans in terms of performance. However, read
consistency might very well be higher in priority. With zero
logical fragmentation of an index, both should perform
similarly; however, with high levels of fragmentation,
index-order scans can get much slower. I discussed the
cases in which the storage engine uses unsafe allocationorder
scans. There are cases, however, when the storage
engine can utilize the faster allocation-order scans when
the plan shows Clustered Index Scan or Index Scan with
Ordered: False without sacrificing read consistency:
- TABLOCK—When you query data with the TABLOCK
hint, no one can insert data while you read. The storage
engine can safely use an allocation-order scan in this
case. Of course, you have to consider the implications on processes trying to modify the table data while you read.
- READ_ONLY—When the index being scanned resides
in a READ_ONLY file group or database, again, since
the storage engine realizes that no one can modify data
during the read, it can safely use an allocation-order
scan. This is something that you have control over. If you
have tables that are modified periodically, you can place
them in a READ_ONLY file group and temporarily
set the file group to READ_WRITE allow the periodic
modifications. This way, you’ll get the faster allocation-
order scans. Similarly, a database holding a data
warehouse or data mart can be set to READ_ONLY
during the day and changed to READ_WRITE temporarily
during the Extract, Transform, and Load (ETL)
processes loading the daily data.
A Read Uncommitted/NOLOCK
Alternative
A common Read Uncommitted scenario involves systems
that need to support very frequent inserts, and readers
keep getting blocked when trying to acquire shared
locks. And if readers manage to obtain shared locks,
they interfere with writers. You might have thought that
individual inserts—each run as an independent transaction—
wouldn’t cause consistency problems under Read
Uncommitted, but now you know the truth. So, is your
environment doomed to either suffer from the read-consistency
problems described in this article or suffer from frequent
blocking problems and slow running index scans?
There might be a reasonable alternative in SQL Server 2005 using
the Read Committed Snapshot isolation. This isolation is similar in
concept to Read Committed, except that it’s based on row versioning
and not shared locks for readers. If your environment mainly has
frequent reads and inserts with infrequent updates and deletes, it
might be a good candidate for Read Committed Snapshot. Deletes
and updates get versioned, but inserts don’t, so in such a system there
will be little overhead on the version store in the tempdb database.
The storage engine will utilize index order scans for clustered index
scans and index scans with the Ordered: False property, so you won’t
face the consistency problems that this series of articles describes. You
can mitigate the performance problems with such scans by keeping
your indexes defragmented, and when you rebuild indexes as part of
your maintenance process, you can specify a fill factor to reduce the
occurrences of splits. In this isolation, readers won’t get blocked by
writers, and writers won’t get blocked by readers. This is an option
that’s definitely worthwhile to test in your system as an alternative to
Read Uncommitted.
On to the Next Step
You
now know about read-consistency problems that can occur when reading
data under the Read Uncommitted isolation when processes insert data while
an allocation order scan of an index is in progress. Next month, I’ll describe
other read-consistency problems that can happen even with index order scans
in both Read Uncommitted and Read Committed isolations.