SideBar    The Logical Puzzle

These read-consistency problems have to do with changes in the storage engine choices when instructed by the relational engine (specifically by the optimizer) to perform index scans (both clustered and nonclustered) with Ordered: False.

The optimizer will use an index scan whenever all the index leaf data is needed. When the data isn’t required to be scanned in order, the Index Scan operator would show Ordered: False in the tooltip box. When the data is required to be scanned in order (e.g., to satisfy an ORDER BY request, GROUP BY, DISTINCT, merge join), the Index Scan operator would show Ordered: True. As an example, examine the execution plans that Web Figure 1 shows for the following three queries:

USE AdventureWorks;
-- Query 1
SELECT * FROM Sales.SalesOrderHeader;
-- Query 2
SELECT SalesOrderNumber FROM Sales. SalesOrderHeader;
-- Query 3
SELECT * FROM Sales.SalesOrderHeader ORDER BY SalesOrderID;

Observe that the first two index scans (i.e., first clustered, second nonclustered) show Ordered: False, whereas the third shows Ordered: True. With the first two scans, as far as the optimizer is concerned, the data can be returned in any order, but with the third scan, it needs to be returned in index key order.

Ordered: False doesn’t necessarily mean that the storage engine in charge of the actual row operations (fetching the data) will utilize the more efficient allocation- order scans based on IAM pages. The storage engine needs to also take into account your consistency expectations based on the isolation level the query runs under. An allocation-order scan running while inserts are issued against the table/index might result in getting the same row multiple times or skipping rows.

As an example of getting the same row multiple times, suppose you have the following index keys in pages of an index shown in file order: Page#101(keys: 50, 60, 70, 80), Page#107(keys: 10, 20, 30, 40), Page#121(keys: 90, 100, 110, 120). The logical index order of the pages is Page#107(keys: 10, 20, 30, 40)-Page#101(keys: 50, 60, 70, 80)-Page#121(keys: 90, 100, 110, 120). Suppose an allocation-order scan starts and reads pages 101 and 107. So far, it has read the rows with the keys 50, 60, 70, 80, 10, 20, 30, 40. At this point (before the scan is finished), another process inserts a row with the key value 65. Suppose the target page 101 is full, so it undergoes a split.

It’s important to note that in a Read Committed isolation, shared locks aren’t kept until the statement is done but rather until SQL Server finishes reading the locked resource (row or page). So, even if SQL Server decides to lock pages when reading, once the page is read, the lock is released. The split process allocates a new page (can be either before or after the page that got split in the file)—say, page 135—and moves half the rows from the source page to the new one, and inserts the new row to one of the two pages based on its key value. The layout of the data in the file is now Page#101(keys: 50, 60, 65), Page#107(keys: 10, 20, 30, 40), Page#121(keys: 90, 100, 110, 120), Page#135(keys: 70, 80). The linked list will reflect the right logical index key order, so the layout in the index is Page#107(keys: 10, 20, 30, 40)-Page#101(keys: 50, 60, 65)- Page#135(keys: 70, 80)-Page#121(keys: 90, 100, 110, 120). The scan continues reading the rest of the pages—121 and 135, reading the rows with the keys 90, 100, 110, 120, 70, and 80. You realize that the rows with the keys 70 and 80 were read twice.

Similarly, if a page that wasn’t yet reached by the scan splits, and the new page is allocated before the scan position, the scan will end up skipping or not reading the rows that moved to the new page. For these potential consistency problems, in Read Committed isolation or higher, the storage engine would opt for an index-order scan when the optimizer requests an index scan with Ordered: False. There are exceptions, which I’ll discuss later.

The reason an index-order scan won’t allow these problems is that the pages are read in logical index key order following the linked list. If a page splits after the scan visited it, both pages (the one that got split and the new one) won’t be visited again because they’re behind in logical order. If a page splits before the scan reached it, the scan will read both pages in order.

An index-order scan won’t be as efficient as an allocation- order scan because fragmentation will negatively impact an index-order scan. However, these particular consistency problems won’t happen. Other consistency problems might happen, but I’ll discuss those in a later article.

What might surprise you is that when an index scan with Ordered: False is running under Read Uncommitted, the storage engine assumes that you have no consistency expectations whatsoever but rather that you’re aiming at the best possible performance. Therefore, the engine will use an allocation-order scan and these consistency problems might happen if rows are inserted while the scan is running. There’s an exception in SQL Server 2005: If the index is small (64 pages or fewer), the storage engine will use an index-order scan because it’s more efficient than an allocation-order scan with so little data. Unfortunately, when you see an index scan in the plan with Ordered: False, there’s no way to tell in the plan whether the storage engine will, in practice, use an allocation-order scan or an index-order scan.

It should be noted that table scans against a heap don’t incur the consistency problems that this article describes for the simple fact that heaps don’t incur page splits, as I explained earlier. However, keep in mind that clustered tables have some advantages over heaps—for example, they let you control the way inserts are distributed over disk drives. Also, even when the table is organized as a heap, if there are nonclustered indexes, those can incur splits, and scans of such indexes could incur the aforementioned read-consistency problems.

Ready for Examples?
Next month, I’ll provide tangible examples with code you can run to see the problems with your own eyes, and I’ll suggest alternatives to avoid these problems. Several people are due thanks for their part in shedding light on the read-consistency problems I’ve covered in this article: Lubor Kollar, Srikumar Rangarajan, Tony Rogerson, Davide Mauri, Andrea Benedetti, Gianluca Hotz, Maciej Pilecki, and Paul Randal.

End of Article

Prev. page     1 [2]     next page -->



You must log on before posting a comment.

If you don't have a username & password, please register now.

Reader Comments

Hi Itzik,

I am suprised with the fact you find, it sounds make sense. I would like to know whether page split might happen where the primary key is an identity column such as in sales.salesorderheader.

Thank you.

Kasim Wirama

[Reply: Hi Kasim,

As I mentioned in the article, when the new row has a higher index key value than the highest existing key, SQL Server won’t split the page. Still, the fact that there are no splits in indexes on ever-increasing keys, you might still get splits in other indexes.

Itzik]

wirama@cbn.net.id

Article Rating 5 out of 5

Hi Itzik,

One more thing, having read your article, I conclude that when Ordered is true, it indicates index-order scan, and when Ordered is false, it probably indicates either index-order scan or allocation-order scan, is my conclusion correct?

Thank you.

Kasim Wirama

[Reply:

Yes, you understood correctly.

Itzik]

wirama@cbn.net.id

Article Rating 5 out of 5

Hi Itzik,

Regarding the Logical Fragmentation : it is the pages in the leaf level are organized in a doubly linked list, maintaining logical index key ordering. Besides the linked list, SQL Server also maps the leaf level’s data with IAM pages in file order. The logical order of the pages maintained by the linked list (index key order) can be different from the order of the pages in the files. The discrepancy between index order of pages and file order of pages is called logical fragmentation.

Does SQL Server maintain altogether a separate file to Map the leaf level Page with IAM Page ? As per my understanding then it should refer the physical address of the linked Page from the referencial Page (Page which holds the address of the linked Page) and put it into the file. And there is all probability for the out of order Page appearance. In that scenario how the Index defragmentation facility of SQL Server comes into play ?

I am bit of confused. Need your help to carify.

Thanks & Warm Regards, Arindam Ganguly.

arindamg

Article Rating 4 out of 5

Check out this page with Firefox 2.0; I see, under Rate this Article, the scale (left to right) 1 1 3 1 5.

If this is intentional given the article's topic, it's hilarious. I rated it a 1, and by 1, I mean 4. At least I think that's what I rated.

rick.white

Article Rating 4 out of 5

Hi rick.white, Ah, you noticed our new article rating system ;-) Actually this is a bug, and I'll report it to our Web team ASAP. (I'll let you know if it's database related :-) Thanks for pointing it out--and for being good-humored about it. Anne Grubb, Web site strategic editor, SQL Server Magazine

AnneG_editor

Article Rating 5 out of 5