July 26, 2006 01:26 PM

Quaere Verum - Clustered Index Scans - Part I

Rating: (3)
SQL Server Magazine
InstantDoc ID #92886

Did it ever happen to you that you believed in something so strongly
without ever bothering to verify the truth about it because it seemed so
obvious, and one day realized that you were wrong? Well, it happened to me
recently…

Before I unveil the details of the case, I’d like to express my deepest
gratitude to Lubor Kollar and Srikumar Rangarajan who helped me get to
the bottom of things and for explaining the behavior I couldn’t explain myself.
Lubor also gave me great guidance and advice!

Tested on SQL Server versions: 2000 Dev/SP4, 2005 Dev/SP1

I’ll start by asking you a seemingly simple question. Given a table T1
with a clustered index on col1, is the following query guaranteed to
return the data in clustered index order?

SELECT * FROM T1;

Try to answer the question to yourself, and also to explain the
reasoning behind your answer.
Apparently, the answer is far from being simple, and the implications
of what really happens are profound.

Bear with me as I go through some crucial fundamentals that you
may already be familiar with before I even get to the main points I
want to make and before describing the recent findings.

I’d like to stress the ANSI SQL standard’s stand on the matter. As
far as the standard is concerned, such a query returns a set, and
since a set has no order to its rows, the result is considered valid
regardless of the order in which the rows are returned. If you want
to guarantee that the result would be returned in a particular order
you MUST specify an ORDER BY clause.
The implications in regards to SQL Server are that it can utilize
unordered or ordered access methods as it deems fit.

I’d like to make some observations in regards to the common beliefs
out there, and disprove some myths.

Clustered index physically organizes the data in order
(myth/false assumption)
Some people believe that creating a clustered index on a table
causes the data to be physically organized on disk by index key
order.
The term “physical” itself is problematic since table/index data
resides in pages, which reside in extents, which reside in files. Files
are not necessarily contiguous on disk due to file system
fragmentation, not to speak of a file being placed on a RAID array,
and/or having multiple files in a database. But for the sake of the
discussion, I’ll make things simple; I’ll work with a database that has
only a single data file placed on a single drive, and ignore file system
fragmentation; that is, I’ll assume that the file is a single contiguous
unit.
Note that even the records within a page are not necessarily sorted
physically. SQL Server organizes the records within the page in a
sorted fashion by means of a row-offset array that it maintains at the
end of each page.
I’ll demonstrate that the pages at the leaf level of the clustered index
are not necessarily organized within the file in the same order as the
logical index key order (maintained by the index leaf level’s linked
list). To do so, I’ll create a table called T1 with a clustered index on
col1; I’ll populate the table with a loop that inserts a row in each
iteration, with a random value in col1:

SET NOCOUNT ON;
USE master;
GO
IF DB_ID('testdb') IS NULL
CREATE DATABASE testdb;
GO
USE testdb;
GO
IF OBJECT_ID('dbo.T1', 'U') IS NOT NULL
DROP TABLE dbo.T1;
GO
CREATE TABLE dbo.T1
(
col1 INT NOT NULL,
filler CHAR(2000) NOT NULL DEFAULT('a')
);
CREATE CLUSTERED INDEX idx_cl_col1 ON dbo.T1(col1);
GO

DECLARE @i AS INT;
SET @i = 1;
WHILE @i BEGIN
INSERT INTO dbo.T1(col1)
VALUES(1 + ABS(CHECKSUM(NEWID()) % 1000));
SET @i = @i + 1;
END

I’m using random values so that the rows will be entered into the
clustered index in random order, causing page splits which in turn
cause logical fragmentation. Logical fragmentation is expressed as
the percentage of out-of-order pages; that is, the discrepancy
between the logical order dictated by the index linked list vs. the
order in which the pages appear in the file.
The premise of the myth that a clustered index physically organizes
the data in order is disproved by the fact that logical fragmentation
exists; if you think about it, another way to express the myth is that
there’s no such thing as logical fragmentation, which is of course
wrong.
The following code run in SQL Server 2005 returns the level of
logical fragmentation in the clustered index (in SQL Server 2000 use
DBCC SHOWCONTIG):

SELECT avg_fragmentation_in_percent FROM sys.dm_db_index_physical_stats
(
DB_ID('testdb'),
OBJECT_ID('dbo.T1'),
1,
NULL,
NULL
);

I got the result 96.5217391304348, meaning that there are over 96
percents of out-of-order pages. If you need further proof that the
pages do not necessarily reside in the file in the same order as in the
linked list, use DBCC IND to show the actual pointers between the
pages in the linked list:

DBCC IND('testdb', 'dbo.T1', 0);

To save you the trouble in deciphering the output, the following code
generates a string with the addresses of the pages in the order in
which they appear in the linked list:

CREATE TABLE #DBCCIND
(
PageFID INT,
PagePID INT,
IAMFID INT,
IAMPID INT,
ObjectID INT,
IndexID INT,
PartitionNumber INT,
PartitionID BIGINT,
iam_chain_type VARCHAR(100),
PageType INT,
IndexLevel INT,
NextPageFID INT,
NextPagePID INT,
PrevPageFID INT,
PrevPagePID INT
);

INSERT INTO #DBCCIND
EXEC ('DBCC IND(''testdb'', ''dbo.T1'', 0)');

WITH LinkedList
AS
(
SELECT 1 AS RowNum, PageFID, PagePID
FROM #DBCCIND
WHERE IndexID = 1
AND IndexLevel = 0
AND PrevPageFID = 0
AND PrevPagePID = 0

UNION ALL

SELECT PrevLevel.RowNum + 1,
CurLevel.PageFID, CurLevel.PagePID
FROM LinkedList AS PrevLevel
JOIN #DBCCIND AS CurLevel
ON CurLevel.PrevPageFID = PrevLevel.PageFID
AND CurLevel.PrevPagePID = PrevLevel.PagePID
)
SELECT
CAST(PageFID AS VARCHAR(MAX)) + ':'
+ CAST(PagePID AS VARCHAR(MAX)) + ' ' AS [text()]
FROM LinkedList
ORDER BY RowNum
FOR XML PATH('')
OPTION (MAXRECURSION 0);

DROP TABLE #DBCCIND;

Here’s the output that I got:

1:109 1:1567 1:1531 1:1488 1:1541 1:1508 1:1523 1:1558 
1:1529 1:1501 1:1570 1:1536 1:1569 1:1491 1:1525 1:1539 
1:1555 1:1486 1:1480 1:1571 1:1526 1:80 1:1527 1:1528 
1:1557 1:1502 1:1548 1:1587 1:1510 1:1498 1:1483 1:1568 
1:1554 1:1500 1:1519 1:1564 1:1530 1:1489 1:1575 1:1524 
1:1562 1:1550 1:41 1:1583 1:1534 1:1549 1:1492 1:1566 
1:1540 1:1515 1:1538 1:1551 1:1503 1:1490 1:1504 1:1559 
1:174 1:1560 1:1514 1:89 1:1537 1:1563 1:1495 1:1509 
1:1580 1:1505 1:1496 1:1542 1:1556 1:1573 1:1512 1:1577 
1:1533 1:1485 1:1494 1:1520 1:1521 1:1547 1:1543 1:1516 
1:1565 1:1497 1:1552 1:1578 1:1579 1:1581 1:1522 1:110 
1:1586 1:1532 1:1507 1:1576 1:1481 1:1518 1:1487 1:1582 
1:1506 1:1544 1:1511 1:1572 1:1553 1:77 1:1574 1:1546 
1:1585 1:1517 1:1482 1:1493 1:1513 1:1499 1:1561 1:1535 
1:1484 1:1584 1:1545

You can clearly see that there are many out of order pages (next
page in linked list appears earlier in the file).

Note that when you create or rebuild an index on an existing table,
SQL Server will make effort to create it in a contiguous manner
(least amount of fragmentation), but there are no
guarantees—especially when the SORT_IN_TEMPDB option is
not specified or the operation is processed in parallel. When you do
specify the SORT_IN_TEMPDB option and the index
creation/rebuild is processed with a single thread, this may improve
the continuity of index extents; but again, no guarantees that the
operation will end up with an index that has no fragmentation at all.
Here’s a snippet from Books Online that explains this:

“The SORT_IN_TEMPDB option may improve the contiguity of
index extents, especially if the CREATE INDEX operation is not
being processed in parallel. The sort work area extents are freed
on a somewhat random basis with regard to their location in the
database. If the sort work areas are contained in the destination
filegroup, as the sort work extents are freed, they can be
acquired by the requests for extents to hold the index structure
as it is built. This can randomize the locations of the index
extents to a degree. If the sort extents are held separately in
tempdb, the sequence in which they are freed has no affect on
the location of the index extents. Also, when the intermediate
sort runs are stored in tempdb instead of the destination
filegroup, there is more space available in the destination
filegroup. This increases the chances that index extents will be
contiguous.”

Conclusion: having a clustered index on a table does not guarantee
that the data is organized within the file by index key order.

Continues in Part II...

Add a Comment

Very good article

Faisal11/7/2006 9:52:09 AM


One of the Best article that I ever read.

Thanks & Regards
Rajesh Peddireddy

venkata7/31/2006 9:17:21 PM


You must log on before posting a comment.

Are you a new visitor? Register Here

Rename Virtual Server (Cluster)

I am going to rename a clustered instance (default).1. Change SQL Network Name2. Take offline3. Bring online4. Flush DNS, Cache5. Test failover6. Rebu...222-96209

GOOGLE LINKS
SPONSORED LINKS
FEATURED LINKS