• subscribe
October 21, 2003 12:00 AM

Inside Optimizer Enhancements

More work for the optimizer means less work for you
SQL Server Pro
InstantDoc ID #39906
Downloads
39906.zip

For SQL Server to be able to join tables, they must have some field in common. As you know, besides holding an index key value, each row in a nonclustered index's leaf level has a pointer (aka a bookmark) telling where to find the data row. So, SQL Server can "join" the two worktables on this bookmark. In fact, if you look at the graphical showplan or enable SET SHOWPLAN_TEXT ON before running the query, you'll see the following details of the join:

|—Hash Match(Inner Join, HASH:([Bmk1000])=([Bmk1000])

Microsoft developers were able to add the technique of using multiple indexes on one table to SQL Server 7.0 partly because of another new ability, which lets SQL Server process JOIN operations in several different ways. This new ability, the Hash Match operator for joins, was also new in SQL Server 7.0; I'll discuss hashing next month.

If a nonclustered index supports more than two search arguments, the query optimizer can decide to use more than two indexes on the same table. In the small, simple Northwind database, it's hard to find an example of a query that uses this kind of a plan, but you might see such a plan in more-complex production databases.

A Little Knowledge
The second new query-processing technique that the optimizer uses is one that I mentioned last time—it took me by surprise when I noticed it. In "Time for a Tune-Up," August 2001 (InstantDoc ID 21480), and "The Big Cover-Up," September 2001 (InstantDoc ID 21729), I discussed a situation in which all the columns a query needs are part of a nonclustered index's keys. Such a query is called a covered query, and the index is a covering index. In those articles, I showed this example:

USE Northwind
SELECT DISTINCT customerID 
FROM orders
WHERE customerID LIKE 'C%'

The Northwind database has a nonclustered index on customerID, so every customerID value is in that index's leaf level. The optimizer can determine that scanning the nonclustered index's leaf level is much more efficient than scanning the entire table because the leaf level of a nonclustered index has substantially fewer pages than the table.

The ability to use a covering index has been available since the first version of SQL Server. However, through SQL Server 7.0 Service Pack 1 (SP1), the only time SQL Server would scan a nonclustered index's leaf level was when the index was also a covering index. Before SQL Server 7.0 SP2, a query plan would never show that SQL Server was performing both a nonclustered index scan and a bookmark-lookup operation. As I mentioned last month, using the bookmarks at the leaf level to access many data pages after examining many leaf-level nonclustered index rows soon becomes more expensive than scanning the entire table.

But what if SQL Server needs to access only a few data pages? SQL Server 7.0 SP2 introduced a new query-processing technique that takes advantage of statistics on nonindexed columns. To use this technique, you need a table that's larger than the original 2155 rows. Listing 3's code copies the Northwind Order Details table into itself three times, resulting in an OrderDetails table of 8620 rows and 35 data pages. The code also changes the UnitPrice of one row, then builds a composite nonclustered index on OrderID and UnitPrice.

Now copy this SELECT statement into Query Analyzer and look at the query plan:

SELECT *
FROM OrderDetails
WHERE UnitPrice = 1.99

The optimizer scans the leaf level of the composite index, then performs a bookmark-lookup operation. The query is looking for rows with a UnitPrice value of 1.99, and only one such row exists. However, because UnitPrice isn't the leftmost column of the index, SQL Server can't perform an index seek to access the row for that unit price directly in the nonclustered index. Before SQL Server 7.0 SP2, the only alternative strategy is to scan the entire table. But in later releases, the query optimizer can use the statistics on the UnitPrice column to determine that only a few rows meet the search condition—in this case, exactly one row. The chosen plan—which scans the nonclustered index leaf level, then accesses the data pages by using the bookmark-lookup operation for one row—is cheaper than scanning the entire table.If you turn on STATISTICS IO, you'll see that this plan needs only 28 page accesses, whereas a table scan would have to read all 35 pages in the table. This difference might not seem substantial, but for tables of millions or hundreds of millions of rows, the difference could be huge and noticeably affect performance.

Keep in mind that you don't need to do anything for SQL Server to take advantage of these enhancements. They're just a few examples of how Microsoft continues to improve SQL Server.



ARTICLE TOOLS

Comments
    There are no comments to display. Be the first one!
You must log on before posting a comment.

Are you a new visitor? Register Here