DOWNLOAD THE CODE:
Download the Code 22927.zip

Use indexes to retrieve sorted data more efficiently

Learning how SQL Server indexes are physically organized and how they can help speed up data access in a table is the most important step in understanding query tuning. By adding a good index to a large table that previously had no useful index for a query, you can obtain one of the largest gains possible for any type of SQL Server tuning. If a large table, with tens of thousands of pages or more, has no useful index, SQL Server must scan the entire table to find the data your query needs, even if you're interested in only one row.

Until SQL Server finishes searching the entire table, it doesn't know how many rows qualify. With a useful index, SQL Server might need to access only a handful of pages as it traverses each index level. The difference between accessing tens of thousands of pages and accessing only a handful of pages makes indexes a powerful tuning tool.

But even when you're concerned with data in just one table, you need to address queries other than those that search for rows meeting a certain condition. Sometimes, queries involve organizing the data in a table. Queries involving the GROUP BY, DISTINCT, ORDER BY, and TOP clauses fall into this category. Technically, ORDER BY and TOP apply to the resultset a query generates, not to the table data itself; however, for query-tuning purposes, that distinction isn't relevant.

Let's look first at ORDER BY, which requests data from SQL Server in a particular sorted order. One reason to look at ORDER BY first is that you can apply the same principles that let you tune queries that involve sorting to queries that involve the other organizational clauses I mentioned: GROUP BY, DISTINCT, and TOP. Although ORDER BY applies to a SELECT statement's resultset, SQL Server can sometimes use indexes on the data to help it retrieve the rows in the order you want them so that it doesn't need to further sort the resultset.

Presorting the Data
One obvious way that you can use indexes to make retrieving sorted data more efficient is to create a clustered index on the column that you're going to order by. When you create a clustered index, SQL Server reorganizes the data pages so that the rows are logically stored in clustered-index order. SQL Server doesn't necessarily store the data physically on the disk in clustered-index order, but while creating an index, SQL Server attempts to physically order the data as close to the logical order as possible. Each page in an index's leaf level has a pointer to the page that logically precedes the current page and to the page that logically follows the current page, thereby creating a doubly linked list. The sysindexes table contains the address of the first leaf-level page. Because the data is guaranteed to be logically in clustered-index order, SQL Server can just start at the first page and follow the index pointers from one page to the next to retrieve the data in order.

Now let's look at a query plan involving a sort on a clustered-index column. In the Northwind database, the Customers table has a clustered index on the CustomerID column. (SQL Server automatically created the index when the primary key was defined on the CustomerID column.) If I select all the rows from the Customers table and sort by the CustomerID column, I might expect SQL Server to use the clustered index to retrieve the rows already sorted in the correct order. After running SET SHOWPLAN_TEXT ON, run the following query:

--Query 1:
SELECT * FROM Customers
ORDER BY CustomerID

Here's the SHOWPLAN output, which tells you that SQL Server scanned the clustered index to retrieve the data, just as you'd expect:

StmtText
-----------------------------------------  
|--Clustered Index Scan(OBJECT:
([Northwind].[dbo].[Customers].
[PK_Customers]), ORDERED FORWARD)

For comparison, let's execute Query 2, which requests the data sorted by a column that doesn't have an index—Country:

--Query 2:
SELECT * FROM Customers
ORDER BY Country

The SHOWPLAN output includes the extra step of sorting the data, so you can see that the query is more efficient if a clustered index already exists on the sort column:

StmtText
--------------------------------------------
|--Sort(ORDER BY:([Customers].[Country] ASC))
    |--Clustered Index Scan(OBJECT:([Northwind]
.[dbo].[Customers].[PK_Customers]))

Note that the query plan for the first query includes the ORDERED FORWARD option, indicating that the index must be scanned in a particular logical order. The second query plan doesn't include that option; the order in which SQL Server scans the clustered index is irrelevant. The only important occurrence is that all the key values are accessed so that they can then be sorted.

Turn It Around
You're probably aware that T-SQL's ORDER BY clause lets you request that the data be returned sorted in descending order, as Query 3 illustrates:

--Query 3:
SELECT * FROM Customers
ORDER BY CustomerID DESC

SQL Server can use the same clustered index to retrieve the data in both ascending and descending order: Because the pages at the leaf level are stored in a doubly linked list, SQL Server can simply traverse the pages from last to first. The SHOWPLAN output for Query 3 is almost identical to the plan for Query 1:

StmtText
--------------------------------------------
  |--Clustered Index Scan(OBJECT:([Northwind]
.[dbo].[Customers].[PK_Customers]), ORDERED BACKWARD)

The only difference is that the clustered index scan for Query 3 is a BACKWARD scan.

   Prev. page   [1] 2 3     next page



You must log on before posting a comment.

If you don't have a username & password, please register now.

Reader Comments

Thanks to your article I have solved a problem which you didn't describe. The following query takes only a few ms : select * from table1 (NOLOCK) where c1>'20021115' and c2 like '9812814425722'+'%' AND NOT EXISTS(SELECT * FROM table2 (NOLOCK) WHERE table2.ID=table1.ID AND table2.c3 LIKE '[0-9]%')

When I replace the constant value with a variable, then it takes more than 40 seconds! Here's my query : declare @d varchar(50) select @d='9812814425722' select * from table1 (NOLOCK) where c1>'20021115' and c2 like @d+'%' AND NOT EXISTS(SELECT * FROM table2 (NOLOCK) WHERE table2.ID=table1.ID AND table2.c3 LIKE '[0-9]%')

I have added the hint FASTFIRSTROW into the 'EXISTS'-query, and now it runs within a few ms.

Is it possible the hint FASTFIRSTROW is also a great solution into select MAX and TOP 1 and EXISTS querys?

tnx,

Eric Dewilde.

Eric Dewilde