Did it ever happen to you that you believed in something so strongly
without
ever bothering to verify the truth about it because it seemed so
obvious, and
one day realized that you were wrong? Well, it happened to me
recently…
Before I unveil the details of the case, I’d like to express my deepest
gratitude to Lubor Kollar and Srikumar Rangarajan who helped me get to
the
bottom of things and for explaining the behavior I couldn’t explain myself.
Lubor also gave me great guidance and advice!
Tested on SQL Server versions: 2000 Dev/SP4, 2005 Dev/SP1
I’ll start by asking you a seemingly simple question. Given a table T1
with a
clustered index on col1, is the following query guaranteed to
return the data in
clustered index order?
SELECT * FROM T1;
Try to answer the question to yourself, and also to explain the
reasoning
behind your answer.
Apparently, the answer is far from being simple, and the implications
of what
really happens are profound.
Bear with me as I go through some crucial fundamentals that you
may already
be familiar with before I even get to the main points I
want to make and before
describing the recent findings.
I’d like to stress the ANSI SQL standard’s stand on the matter. As
far as the
standard is concerned, such a query returns a set, and
since a set has no order
to its rows, the result is considered valid
regardless of the order in which the
rows are returned. If you want
to guarantee that the result would be returned in
a particular order
you MUST specify an ORDER BY clause.
The implications in regards to SQL Server are that it can utilize
unordered or
ordered access methods as it deems fit.
I’d like to make some observations in regards to the common beliefs
out
there, and disprove some myths.
Clustered index physically organizes the data in order
(myth/false
assumption)
Some people believe that creating a clustered index on a table
causes the
data to be physically organized on disk by index key
order.
The term “physical” itself is problematic since table/index data
resides in
pages, which reside in extents, which reside in files. Files
are not necessarily
contiguous on disk due to file system
fragmentation, not to speak of a file
being placed on a RAID array,
and/or having multiple files in a database. But
for the sake of the
discussion, I’ll make things simple; I’ll work with a
database that has
only a single data file placed on a single drive, and ignore
file system
fragmentation; that is, I’ll assume that the file is a single
contiguous
unit.
Note that even the records within a page are not necessarily sorted
physically.
SQL Server organizes the records within the page in a
sorted fashion by means of
a row-offset array that it maintains at the
end of each page.
I’ll demonstrate that the pages at the leaf level of the clustered index
are not
necessarily organized within the file in the same order as the
logical index key
order (maintained by the index leaf level’s linked
list). To do so, I’ll create
a table called T1 with a clustered index on
col1; I’ll populate the table with a
loop that inserts a row in each
iteration, with a random value in col1:
SET NOCOUNT ON;
USE master;
GO
IF DB_ID('testdb') IS NULL
CREATE DATABASE testdb;
GO
USE testdb;
GO
IF OBJECT_ID('dbo.T1', 'U') IS NOT NULL
DROP TABLE dbo.T1;
GO
CREATE TABLE dbo.T1
(
col1 INT NOT NULL,
filler CHAR(2000) NOT NULL DEFAULT('a')
);
CREATE CLUSTERED INDEX idx_cl_col1 ON dbo.T1(col1);
GO
DECLARE @i AS INT;
SET @i = 1;
WHILE @i <= 300
BEGIN
INSERT INTO dbo.T1(col1)
VALUES(1 + ABS(CHECKSUM(NEWID()) % 1000));
SET @i = @i + 1;
END
I’m using random values so that the rows will be entered into the
clustered
index in random order, causing page splits which in turn
cause logical
fragmentation. Logical fragmentation is expressed as
the percentage of
out-of-order pages; that is, the discrepancy
between the logical order dictated
by the index linked list vs. the
order in which the pages appear in the file.
The premise of the myth that a clustered index physically organizes
the data in
order is disproved by the fact that logical fragmentation
exists; if you think
about it, another way to express the myth is that
there’s no such thing as
logical fragmentation, which is of course
wrong.
The following code run in SQL Server 2005 returns the level of
logical
fragmentation in the clustered index (in SQL Server 2000 use
DBCC SHOWCONTIG):
SELECT avg_fragmentation_in_percent FROM sys.dm_db_index_physical_stats
(
DB_ID('testdb'),
OBJECT_ID('dbo.T1'),
1,
NULL,
NULL
);
I got the result 96.5217391304348, meaning that there are over 96
percents of
out-of-order pages. If you need further proof that the
pages do not necessarily
reside in the file in the same order as in the
linked list, use DBCC IND to show
the actual pointers between the
pages in the linked list:
DBCC IND('testdb', 'dbo.T1', 0);
To save you the trouble in deciphering the output, the following code
generates a string with the addresses of the pages in the order in
which they
appear in the linked list:
CREATE TABLE #DBCCIND
(
PageFID INT,
PagePID INT,
IAMFID INT,
IAMPID INT,
ObjectID INT,
IndexID INT,
PartitionNumber INT,
PartitionID BIGINT,
iam_chain_type VARCHAR(100),
PageType INT,
IndexLevel INT,
NextPageFID INT,
NextPagePID INT,
PrevPageFID INT,
PrevPagePID INT
);
INSERT INTO #DBCCIND
EXEC ('DBCC IND(''testdb'', ''dbo.T1'', 0)');
WITH LinkedList
AS
(
SELECT 1 AS RowNum, PageFID, PagePID
FROM #DBCCIND
WHERE IndexID = 1
AND IndexLevel = 0
AND PrevPageFID = 0
AND PrevPagePID = 0
UNION ALL
SELECT PrevLevel.RowNum + 1,
CurLevel.PageFID, CurLevel.PagePID
FROM LinkedList AS PrevLevel
JOIN #DBCCIND AS CurLevel
ON CurLevel.PrevPageFID = PrevLevel.PageFID
AND CurLevel.PrevPagePID = PrevLevel.PagePID
)
SELECT
CAST(PageFID AS VARCHAR(MAX)) + ':'
+ CAST(PagePID AS VARCHAR(MAX)) + ' ' AS [text()]
FROM LinkedList
ORDER BY RowNum
FOR XML PATH('')
OPTION (MAXRECURSION 0);
DROP TABLE #DBCCIND;
Here’s the output that I got:
1:109 1:1567 1:1531 1:1488 1:1541 1:1508 1:1523 1:1558
1:1529 1:1501 1:1570 1:1536 1:1569 1:1491 1:1525 1:1539
1:1555 1:1486 1:1480 1:1571 1:1526 1:80 1:1527 1:1528
1:1557 1:1502 1:1548 1:1587 1:1510 1:1498 1:1483 1:1568
1:1554 1:1500 1:1519 1:1564 1:1530 1:1489 1:1575 1:1524
1:1562 1:1550 1:41 1:1583 1:1534 1:1549 1:1492 1:1566
1:1540 1:1515 1:1538 1:1551 1:1503 1:1490 1:1504 1:1559
1:174 1:1560 1:1514 1:89 1:1537 1:1563 1:1495 1:1509
1:1580 1:1505 1:1496 1:1542 1:1556 1:1573 1:1512 1:1577
1:1533 1:1485 1:1494 1:1520 1:1521 1:1547 1:1543 1:1516
1:1565 1:1497 1:1552 1:1578 1:1579 1:1581 1:1522 1:110
1:1586 1:1532 1:1507 1:1576 1:1481 1:1518 1:1487 1:1582
1:1506 1:1544 1:1511 1:1572 1:1553 1:77 1:1574 1:1546
1:1585 1:1517 1:1482 1:1493 1:1513 1:1499 1:1561 1:1535
1:1484 1:1584 1:1545
You can clearly see that there are many out of order pages (next
page in
linked list appears earlier in the file).
Note that when you create or rebuild an index on an existing table,
SQL
Server will make effort to create it in a contiguous manner
(least amount of
fragmentation), but there are no
guarantees—especially when the SORT_IN_TEMPDB
option is
not specified or the operation is processed in parallel. When you do
specify the SORT_IN_TEMPDB option and the index
creation/rebuild is processed
with a single thread, this may improve
the continuity of index extents; but
again, no guarantees that the
operation will end up with an index that has no
fragmentation at all.
Here’s a snippet from Books Online that explains this:
“The SORT_IN_TEMPDB option may improve the contiguity of
index extents,
especially if the CREATE INDEX operation is not
being processed in parallel. The
sort work area extents are freed
on a somewhat random basis with regard to their
location in the
database. If the sort work areas are contained in the
destination
filegroup, as the sort work extents are freed, they can be
acquired
by the requests for extents to hold the index structure
as it is built. This can
randomize the locations of the index
extents to a degree. If the sort extents
are held separately in
tempdb, the sequence in which they are freed has no
affect on
the location of the index extents. Also, when the intermediate
sort
runs are stored in tempdb instead of the destination
filegroup, there is more
space available in the destination
filegroup. This increases the chances that
index extents will be
contiguous.”
Conclusion: having a clustered index on a table does not guarantee
that the
data is organized within the file by index key order.
Continues in Part II...
End of Article