DOWNLOAD THE CODE:
Download the Code 22075.zip

A few months ago, I realized another benefit of having statistics on a column. This capability didn't exist until SQL Server 7.0 Service Pack 2 (SP2); it also exists in SQL Server 2000. To understand this use of column statistics, keep in mind the principle behind covered queries, which I discussed in "The Big Cover-Up." The leaf level of a nonclustered index contains every key value. In general, SQL Server uses a nonclustered leaf-level scan only when the index completely covers the query so that SQL Server doesn't need to access the data pages. Also remember that if an index is composite, the index keys in the nonclustered index's leaf level contain all the columns that make up the index.

SQL Server sorts the data in any index by the leftmost indexed column. So in general, if you don't have a search argument on the leftmost column, an index isn't useful. Index statistics just store the histogram for the first column. Therefore, an index on ProductID and Quantity in the Order Details table in the Northwind database would have a histogram of the distribution for only the ProductID column. If you want statistics on the Quantity column, you must either create a separate index on that column or create column statistics.

What if you run a query looking for a unique value in the Quantity column? Because Quantity is the second column of the composite index, SQL Server can't use this index to help find the one satisfying row. However, if you had statistics on the Quantity column, the optimizer would know how many times the specific value of Quantity occurred. The optimizer could then choose to scan the nonclustered index's leaf level, which contains all the value combinations for ProductID and Quantity, to find the pointer to the row containing the Quantity value you're looking for.

To see when the optimizer uses this technique, let's look at a table larger than anything in the sample Northwind database. Even with 2155 rows, the Order Details table is only slightly larger than one extent, and the optimizer usually doesn't decide to do anything complicated for tables that small. So, let's make a copy of Order Details and increase its size tenfold. Then, we'll update some rows so that the number of rows with the desired value is very small.

Listing 1 shows the script that you can use to generate the data for the test. After you run the code in Listing 1, only one row with a Quantity value of 54 exists in the test table, OrderDetails. But no index on the table has the Quantity column as the leftmost column. I want SQL Server to realize that only one row with this value exists. Then, I want the optimizer to decide that scanning the nonclustered index, which contains all the combinations of ProductID and Quantity values, will find that row more efficiently than a complete table scan can find it. Although SQL Server will then need to follow the bookmark from the nonclustered index to the data row containing Quantity 54, that process will still be more efficient than a table scan.

You can run the following query to see that only one row has the value 54:

-- Query 2:
SET STATISTICS IO ON
SELECT * FROM OrderDetails
WHERE Quantity = 54

You can also run sp_helpstats on the OrderDetails table (or sp_helpindex if you're using SQL Server 7.0) to see that statistics exist on the Quantity column.

However, the query plan for Query 2 shows that SQL Server didn't use the new trick I described; it just scanned the table. The estimated number of rows shows that the statistics haven't been updated since I ran the UPDATE statement from Listing 1. SQL Server still thinks that the table contains 10 rows that have a Quantity value of 54. But if I force SQL Server to update the statistics, the optimizer will choose the index scan:

UPDATE STATISTICS OrderDetails
GO
SET SHOWPLAN_TEXT ON
GO
SELECT * FROM OrderDetails
WHERE Quantity = 54

Figure 1 contains the output of the query plan after the statistics were updated.

Note that the revised plan involves both a nonclustered-index scan and a bookmark lookup. Before SQL Server 7.0 SP2, this combination of steps rarely occurred. SQL Server usually scanned a nonclustered index because it was a covering index, so it didn't need to perform bookmark lookups to access the data pages. But the SQL Server query optimizer is a work in progress. The development team at Microsoft is constantly evaluating query performance and looking for ways to make the optimizer even smarter. Microsoft doesn't always document the changes because making performance enhancements isn't like fixing bugs. Getting a new-and-improved optimizer is just a fringe benefit of upgrading, either to the next service pack or to the next release of SQL Server.

End of Article

Prev. page     1 [2]     next page -->



You must log on before posting a comment.

If you don't have a username & password, please register now.

Reader Comments

It explained an issue, which has puzzled me for some time (the concrete use of auto-created statistics).

Jesper Jensen

 
 

ADS BY GOOGLE