DOWNLOAD THE CODE:
Download the Code 42801.zip

Last month, in "Inside Search Arguments" (InstantDoc ID 42349), I discussed specific types of search arguments—which I call SARGs—that are most beneficial to the query optimizer when it's determining the best execution plan for a query. In most of the examples I used, if the query didn't contain a SARG that included a specific column, SQL Server wouldn't consider any indexes on that column for the query plan.

Even if the query contains a valid SARG, the optimizer sometimes can't determine the best plan because it can't use the statistics-histogram information during optimization. This can happen when the value the query is comparing to the column is an expression that the optimizer can't evaluate. In these cases, SQL Server has to wait for the execution phase before it can resolve the expression's value.

A common situation in which the optimizer can't use statistics occurs when you're comparing a column to a variable. This month, I look at how the optimizer works with variables. In addition, I examine another SARG construct—stored-procedure parameters—that looks very much like using variables. We'll see how differently the optimizer treats variables and parameters.

Getting Started
To run this month's sample code, you need the same modified Orders table that we used last month. The code in Listing 1 recreates the NewOrders table and builds two unique indexes: one on the Number column and one on the OrderID column. After running Listing 1's code, use the following code to build one more index, a nonunique index on the CustomerID column:

CREATE INDEX customerID_index ON NewOrders(CustomerID)

Before we look at the effect of variables on optimization, consider the following two queries on the NewOrders table:

SELECT * FROM NewOrders WHERE CustomerID = 'DUMON'
SELECT * FROM NewOrders WHERE CustomerID = 'OCEAN'

The first query returns 20 rows, and the second returns 25 rows. Looking at the query plans for each, you'll see that the first plan uses a seek on the CustomerID index and the second uses a table scan. So the optimizer decided that a result somewhere between 20 and 25 rows was too large for effective index use and that scanning the table would be more efficient.

Besides checking the query plan, you can compare the difference between these two queries by looking at the number of logical reads each required. Turn on the STATISTICS IO option, then execute the preceding two queries again. You should see that the query using the index seek took 22 logical reads and the one using the table scan took 107 reads—the number of pages in the table. If you force the optimizer to use the index on CustomerID, you might find that the resulting plan requires fewer reads—in this case, 27:

SELECT * FROM NewOrders WITH
(INDEX =  customerID_index)
   WHERE CustomerID = 'OCEAN'

Keep in mind that the number of page accesses isn't the only cost affecting the optimizer's choices. With an index seek, the data access is random, but a table scan can employ an extent-by-extent sequential read of the table. This type of scan is frequently quite efficient. If you force the optimizer to use the index on the following example, which uses a different search value, you can see how bad an idea overriding the optimizer can be:

SELECT * FROM NewOrders WHERE
   CustomerID = 'SAVEA'
SELECT * FROM NewOrders(INDEX =
   customerID_index) WHERE
   CustomerID = 'SAVEA'

When you don't force the optimizer to use the index, a search for the CustomerID value SAVEA takes 107 reads, but when you force the index, SQL Server needs 157 reads.

Variable Processing
Now let's look at how the query optimizer processes variables. Consider the following batch:

SET STATISTICS IO ON
DECLARE @name nchar(10)
SET @name = 'DUMON'
SELECT * FROM NewOrders
  WHERE CustomerID = @name

The query plan shows that the optimizer chose a table scan. If you run the query and look at the statistics, you'll see 107 reads. Although you have a valid SARG in the SELECT statement, when the optimizer evaluates the query, it hasn't yet executed the SET statement that assigns the specific value to the variable. The optimizer optimizes the entire batch at once. You can't put the variable assignment in a separate batch because the scope of a local variable is one batch. Because the optimizer doesn't know what value to look for in the index-statistics histogram, it has to guess. In this case, the estimate is too large for the index to be useful.

In general, the optimizer uses certain rules to produce an estimate when it can't use the index statistics. If the SARG contains an open range (it's looking for the column to be greater than or less than an unknown value), the optimizer estimates that 30 percent of the rows in the table will satisfy the condition. If the SARG contains a closed range (a BETWEEN clause or two overlapping AND conditions), the optimizer guesses that 25 percent of the rows in the table will satisfy the condition. If the SARG contains an equality comparison, the optimizer tries to use the density information in the index statistics; if that's impossible, it guesses that 10 percent of the rows in the table will satisfy the condition.

In most of these cases, the rule-based estimate is too large for the optimizer to consider using an index. In my experience, the optimizer usually doesn't consider using a nonclustered index unless it estimates that only a tiny percentage of the rows in the table will need accessing. That number is usually around 1 or 2 percent, but it can be even smaller.

   Prev. page   [1] 2     next page



You must log on before posting a comment.

If you don't have a username & password, please register now.

Reader Comments

I've noticed examples where the optimizer will choose a NC bookmark lookup on a table with a clustered index (i.e. 7 logical IOs) when there are only 19 records and a single page in the table itself. Is this an example of the same behaviour you describe in 'Getting Started'? It seems a bit brutal to me when the table only has 1 page. When I used to work with Sybase, I would put good money on the fact that it would never use an index on a table with 1 page in it.

Philip Yale

Great Article. Why I subscribe. Answered questions books don't get into.

big fish

Article Rating 5 out of 5