When Is a Nonclustered Index Like a Clustered Index?
In most cases, you can easily see where a clustered index would be useful, because a clustered index controls the order of storage of the data itself. Because SQL Server stores the data in the order of the clustered index's key columns, clustered indexes can be a big help if you frequently need to sort the data. If the data is already in the right order because of a clustered index, you don't need any additional sorting during query execution. And if you use a clustered index seek to find the data, SQL Server doesn't need to perform any additional bookmark lookup because when SQL Server finds the index keys it needs, it's found the entire row with all the data columns.

Clustered indexes can be useful when you're looking for a range of data—for example, you want to find all the customers whose postal code is between 80000 and 89999. If the postal code column has a clustered index, all the appropriate rows are stored close together in the table. The clustered index can be useful even if you're interested in a large percentage of rows in the table.

One of the biggest problems in choosing the best indexes is that you can have only one clustered index on a table. You can have up to 249 nonclustered indexes, but in general, those indexes are useful only if you're going to return a tiny percentage of the rows in the table.

However, SQL Server offers one additional indexing technique that makes a nonclustered index behave like a clustered index. I'm interested in two specific benefits of clustered indexes. First, after SQL Server finds the correct index key, it doesn't have to follow any pointers for additional lookups. Second, SQL Server contiguously stores all the data that meets a certain criteria. You can get both of these benefits with nonclustered indexes if you make the nonclustered index a covering index. With a covering index, all the data necessary to satisfy the query is part of the index keys and resides in the index's leaf level. A covering index should include not only the columns you reference in the WHERE clause but all columns you reference in the SELECT list, as well as any columns in the GROUP BY or ORDER BY clauses.

For example, say you want to find all the CustomerID values that start with the letter C, so you use the following query:

-- Query 6
USE Northwind
SET STATISTICS IO ON
SET STATISTICS PROFILE ON
SELECT DISTINCT customerID FROM orders
WHERE customerID LIKE 'C%'

All you're interested in is the CustomerID data, and every CustomerID value is stored in the CustomerID index's leaf level. The CustomerID index is thus a covering index for the query. SQL Server can perform an index seek operation to find the first value that starts with C, then proceed forward though the index leaf level because all the other C values are contiguous. So, SQL Server doesn't need to access the data at all.

Even though Query 6 returns more result rows, the logical reads value indicates that this query takes even fewer reads than Query 1 because SQL Server doesn't have to access the data pages. Also, because all the CustomerID values are in order in the nonclustered index's leaf level, SQL Server can perform the DISTINCT operation quite efficiently. Every time SQL Server encounters a new value while scanning the CustomerID values that start with C, it will be the first occurrence of that value. So SQL Server can return the distinct values by returning only the new values it encounters during its scan of the leaf level.

Query 6 is an example of a covered query—one that is completely satisfied by a covering index. Queries involving simple aggregates also can frequently use covering indexes. In Query 7 below, I want to know how many rows have a CustomerID value of 'ERNSH'. Because every CustomerID value is in the index's leaf level, you can count the relevant rows right in the index and never access the data itself, the logical page accesses value is very small, and the query runs very quickly:

-- Query 7
SELECT COUNT(*) FROM orders
WHERE customerID = 'ERNSH'

If you've already built a clustered index on the table, covering indexes are a wonderful feature that can help you speed up a query that returns many rows. Next month, I'll look at how SQL Server uses statistics to determine which index is the best to use when executing a query.

End of Article

Prev. page     1 2 3 [4]     next page -->



You must log on before posting a comment.

If you don't have a username & password, please register now.

 
 

ADS BY GOOGLE