Subscribe to SQL Server Magazine | See More SQL Server and Database Articles Here | Reprints | Or get the Monthly Online Pass—only $5.95 a month!
back to blog index

Why does SQL Server tend to use allocation order scans when
NOLOCK or TABLOCK is specified?

With the TABLOCK hint it’s obvious—when the table is fully
locked, no data changes are allowed so no data movement is
possible, so it’s safe to use allocation order scans and still guarantee
consistency.
There might be cases where based on cost estimations SQL Server
proactively decides to acquire a table or index lock (aka rowset
lock) without a hint. In those cases, the implications are the same as
when specifying the TABLOCK hint. You can use a trace to figure
out what kinds of locks were taken.

With the NOLOCK hint (or setting the isolation level of the session
to READ UNCOMMITTED) you tell SQL Server that you don’t
expect consistency, so there are no guarantees.
Bear in mind though that “inconsistent data” does not only mean that
you might see uncommitted changes that were later rolled back, or
data changes in an intermediate state of the transaction. It also means
that in a simple query that scans all table/index data SQL Server may
lose the scan position, or you might end up getting the same row
twice.

I’d like to make a short side note before demonstrating inconsistency
issues.
In SQL Server 2000, my tests showed that allocation order scans
were used when NOLOCK or TABLOCK were specified
regardless of table size. However, in SQL Server 2005, even with
the hints, index order scans were used up to a table size of 64 pages;
from this point and beyond allocation order scans were used when
one of the hints was specified. The reasoning behind the change in
SQL Server 2005 is that the cost of initiating an unordered scan is
quite high. Unordered scans are performed on big indexes where the
hope is that the benefit outweighs the cost. To see this yourself, you
can play with the table size by revising the number of iterations of the
loop in the script I provided to create and populate T1. With 300
iterations the table size is in the area of 100 pages. If you change the
number of iteration to a small enough value (e.g., 100), the table size
will be smaller than 64 pages and then SQL Server 2005 will
perform an index order scan even when you specify the NOLOCK
or TABLOCK hints.

Next, I’ll demonstrate that you can get the same row twice when
using the NOLOCK hint. I’ll recreate the table T1 such that the
clustered index on col1 will be defined as a unique index with the
option IGNORE_DUP_KEY. This means that duplicate values
cannot exist in col1, and also that an attempt to insert a duplicate key
will not fail the transaction and generate an error rather just generate
a warning. I’ll insert rows with random values in col1 in an infinite
loop that breaks as soon as it gets a signal from another session. The
signal is in the form of non-existence of a global temporary table
called ##DupsNotFound created earlier. From another session, in an
infinite loop, I’ll query T1 with the NOLOCK hint, copying the data
aside to a temporary table called #T. I’ll check if there’s any col1
value that appears more than once in the temp table (meaning that the
same row was read more than once from T1), and if there is, I’ll
drop the global temporary table ##DupsNotFound. The
non-existence of the global temporary table ##DupsNotFound is a
signal to both sessions to break from the infinite loop since we
confirmed our suspicions.

From one session run the following code to recreate T1 and insert
rows:

SET NOCOUNT ON;
USE testdb;
GO
IF OBJECT_ID('dbo.T1', 'U') IS NOT NULL
DROP TABLE dbo.T1;
GO
CREATE TABLE dbo.T1
(
col1 INT NOT NULL,
filler CHAR(2000) NOT NULL DEFAULT('a')
);
CREATE UNIQUE CLUSTERED INDEX idx_cl_col1 ON dbo.T1(col1)
WITH IGNORE_DUP_KEY;
GO

IF OBJECT_ID('tempdb..##DupsNotFound', 'U') IS NOT NULL
DROP TABLE ##DupsNotFound;
GO
CREATE TABLE ##DupsNotFound(col1 INT);
GO

WHILE OBJECT_ID('tempdb..##DupsNotFound', 'U') IS NOT NULL
INSERT INTO dbo.T1(col1)
SELECT 1 + ABS(CHECKSUM(NEWID()) % 1000000000);

From another session run the following code in text output mode to
read the data from T1 and check whether the same row was read
more than once:

SET NOCOUNT ON;
USE testdb;
GO
WAITFOR DELAY '00:00:05';

WHILE OBJECT_ID('tempdb..##DupsNotFound', 'U') IS NOT NULL
BEGIN
IF OBJECT_ID('tempdb..#T', 'U') IS NOT NULL
DROP TABLE #T;

SELECT col1 INTO #T
FROM dbo.T1 WITH (NOLOCK);

SELECT col1, COUNT(*) AS cnt
FROM #T
GROUP BY col1
HAVING COUNT(*) > 1;

IF @@ROWCOUNT > 0
DROP TABLE ##DupsNotFound;
END

You may have guessed that the reason I entered a delay of 5
seconds is to allow T1 to grow big enough so that SQL Server will
consider an allocation order scan.
After a few seconds I got the following result from the last iteration of
the loop:

col1        cnt
----------- -----------
782866256 2
783744406 2

As you can see, there are two rows that were read twice due to
page splits. This is proof that a query with the NOLOCK hint may
return the same row more than once, and brings a whole new
perspective to the meaning of inconsistent reads.

Is there a difference in performance between allocation order
scans and index order scans?

This depends on the level of logical fragmentation of the index. The
higher is the fragmentation, the faster is an allocation order scan than
an index order scan.

To demonstrate this, first run the following code to recreate and
populate T1 with 50,000 rows. It should take a few minutes for the
script to run. The script inserts a row at a time (after creating the
index) with random col1 values in the range 1 through 1,000,000,
introducing a high level of fragmentation:

SET NOCOUNT ON;
USE master;
GO
IF DB_ID('testdb') IS NULL
CREATE DATABASE testdb;
GO
USE testdb;
GO
IF OBJECT_ID('dbo.T1', 'U') IS NOT NULL
DROP TABLE dbo.T1;
GO
CREATE TABLE dbo.T1
(
col1 INT NOT NULL,
filler CHAR(2000) NOT NULL DEFAULT('a')
);
CREATE CLUSTERED INDEX idx_cl_col1 ON dbo.T1(col1);
GO

DECLARE @i AS INT;
SET @i = 1;
WHILE @i <= 50000
BEGIN
INSERT INTO dbo.T1(col1)
VALUES(1 + ABS(CHECKSUM(NEWID()) % 1000000));
SET @i = @i + 1;
END

Check the level of fragmentation:

SELECT avg_fragmentation_in_percent FROM 
sys.dm_db_index_physical_stats
(
DB_ID('testdb'),
OBJECT_ID('dbo.T1'),
1,
NULL,
NULL
);

I got over 99 percents of fragmentation.

Turn on the “Discard results after execution” in SSMS so that the
time it takes to generate the output will not be taken into
consideration.
Run the following code to perform an index order scan after clearing
the data cache:

DBCC DROPCLEANBUFFERS;
SELECT * FROM dbo.T1;

This code ran on my system for 33 seconds.
Next, run the following code to perform an allocation order scan
using the NOLOCK hint:

DBCC DROPCLEANBUFFERS;
SELECT * FROM dbo.T1 WITH (NOLOCK);

This code ran on my system for 11 seconds; 3 times faster than an
index order scan!
Run the following code to perform an allocation order scan using the
TABLOCK hint:

DBCC DROPCLEANBUFFERS;
SELECT * FROM dbo.T1 WITH (TABLOCK);

This code ran on my system for 11 seconds—same as with the
NOLOCK hint.

Next, run the following code to rebuild the index, and minimize the
level of logical fragmentation:

ALTER INDEX idx_cl_col1 ON dbo.T1 REBUILD
WITH (SORT_IN_TEMPDB = ON, MAXDOP = 1);

Rerun the performance tests. Now I got 7 seconds for the index
order scan (no hint specified), and 7 seconds for the allocation order
scans (NOLOCK or TABLOCK hints).

Conclusions and Best Practices

The most important advice that I hope that you will carry after
reading this blog entry goes back to the standard. ANSI SQL says
that a query without an ORDER BY clause is not guaranteed to
return the data in any particular order. If you want to guarantee that
the data will be returned in a particular order, specify an ORDER BY
clause. Regardless of what you know or think you know about the
internals of SQL Server this should be your work-premise. There
may be access methods that you’re not aware of that are used in
special circumstances, and new access methods might be introduced
in future versions of SQL Server or even future service pack levels.

Another lesson I learned from this experience is that I should never
trust blindly something without testing it, and this includes things that
might seem simple and obvious.

As for more technical conclusions and best practices…

* Creating a clustered index on a table does not guarantee that the
data is stored in the file in index key order. Data movement caused
by page splits, changing index key values, and expanding dynamic
columns generate logical fragmentation. When you create or rebuild
an index on an existing table, SQL Server will make effort to create
it in a contiguous manner (least amount of fragmentation), but there
are no guarantees.

* Allocation order scans of data in the leaf level of an index are
considered when you specify NOLOCK or TABLOCK (or when
SQL Server decides to take a rowset lock proactively for cost-based
reasons); in all other cases SQL Server uses index order scans.

* Be aware of the implications of reading data with the NOLOCK
hint (or in a read uncommitted isolation). It’s not just a matter of
reading uncommitted changes, or data in an intermediate state of the
transaction; rather you might even get the same row twice, or SQL
Server might lose the scan position during the scan.

* In a clustered table, the higher is the level of logical fragmentation,
the faster is an allocation order scan than an index order scan.
Similarly a table scan against a clustered table can be slower than a
table scan against a heap; that’s the case when the index is
fragmented and SQL Server used an index order scan.
Hopefully you WILL NOT be hasty to conclude the following:
• That you should get rid of all of your clustered indexes and from
now on work with heaps; there are many advantages to using
clustered indexes but that’s a different discussion
• That from now on you should use NOLOCK in all your queries;
remember that NOLOCK has serious inconsistency issues
• That from now on you should use TABLOCK in all your queries;
though this guarantees consistent data and a faster scan, it can also
cause serious blocking issues

--
BG

End of Article



Post Your Comments Here

You must be a registered user or online subscriber to comment on this article. Please log on before posting a comment. Are you a new visitor? Register now

Reader Comments

Good Analysis of the indexing and sql reading techniques. A fine insite on the analysis made given in the conclusion. Thanks.

anand_rajappan

Article Rating 4 out of 5

 



  1 2 3 4 5 6
7 8 9 10 11 12 13
14 15 16 17 18 19 20
21 22 23 24 25 26 27
28       
or
More blogs about technology, databases, and SQL Server.
 

ADS BY GOOGLE