Similar but Different
All three of the previous queries returned the same row of results. The following query, involving an equality comparison, looks similar to Query 1. However, if you run it with the two SET options on, you'll notice significantly different behavior:

-- Query 4
USE Northwind
SET STATISTICS IO ON
SET STATISTICS PROFILE ON
SELECT * FROM orders
WHERE customerID = 'ERNSH'

For this query, 30 rows in the orders table satisfy the condition in the WHERE clause, and the number of logical reads is 21, the same as in Query 2. The optimizer decided to scan the clustered index in this case. If you think that the optimizer should have used the nonclustered index, you can try overriding the optimizer by using an optimizer hint. The word "hint" is misleading when applied to SQL Server queries. Usually, a hint is a gentle suggestion of something to consider; however, a SQL Server query hint is actually a directive. When you use a hint in a query, the optimizer has no choice but to comply. I recommend using hints for testing purposes only. The main reason for using them is just what I'm doing here—verifying that SQL Server's optimizer did indeed make the right choice.

Here's Query 4 rewritten, this time using a hint to force SQL Server to use the index that it used in Query 1:

-- Query 5
SELECT * FROM orders (index = CustomerID)
WHERE customerID = 'ERNSH'

The execution plan shows that SQL Server obeyed the hint and used the nonclustered index on CustomerID, but the logical reads value of 72 shows that this solution wasn't the best one. A nonclustered index has only a pointer to a data row in its leaf level; for each data value that meets the condition in the WHERE clause, SQL Server must follow that pointer to the table itself. The execution plan output refers to this operation as the bookmark lookup operation (the pointers in the leaf level of the nonclustered index are the bookmarks).

The optimizer's job is to decide whether following all the bookmarks for multiple rows or just scanning every page in the table will be more efficient. In this example, following all the bookmarks from the nonclustered index to the data cost 72 page scans, whereas scanning the whole table cost only 21. Apparently, the optimizer made the right decision. In general, using a nonclustered index is useful only if the query will return a very small percentage of rows. I know of no absolute rule about what percentage of rows causes the optimizer to choose a table or clustered index scan instead of seeking through a nonclustered index. Query 4 returns 30 rows, which is about 3 percent of the 830 rows in the table. If I run a similar query, looking for the name 'WELLI' instead of 'ERNSH', the nine rows returned are about 1 percent of the table and the optimizer chooses the nonclustered index seek. I estimate that the cutoff is at about 1 percent of the rows, but it depends on how many rows are on each page and the size of the index key compared with the size of the full rows. You can get a better feel for this cutoff point by running some tests to determine how often, and for what kinds of values, the optimizer chooses a nonclustered index.

Prev. page     1 2 [3] 4     next page



You must log on before posting a comment.

If you don't have a username & password, please register now.

 
 

ADS BY GOOGLE