Last month, I told you about changes to the query optimizer's behavior in SQL Server 2000 and 7.0 and the huge increase in the optimizer's complexity. The optimizer can evaluate so many more possible plans than in previous releases that there's no guarantee that the optimizer will always find the best plan; the optimizer's goal is to find a plan that's "good enough." This month, let's look at a couple of the new query-processing techniques that the optimizer can include in a query plan.
Will You Join Me?
Microsoft added the first new technique to the SQL Server 7.0 initial release. Before SQL Server 7.0, a general rule was that SQL Server could use only one index per table for any given query (with one exception, which I'll tell you about in a later article). The optimizer's job was to find which index SQL Server could traverse during query execution to find the desired result rows with the fewest page accesses. After comparing the cost (number of page accesses) to the number of pages in the table, the optimizer might decide that scanning the whole table would be cheaper than using any of the available indexes.
Because of changes in the way that SQL Server stores indexes internally, the optimizer in SQL Server 2000 and 7.0 checks the possibility of using more than one index on the same table to process a query. Let's look at an example, then I'll tell you some details about the new processing techniques. The script in Listing 1 creates a copy of the Northwind database's Order Details table, which I call OrderDetails (no space in the name). The script then builds two nonclustered indexes on Order-Details.
Each of the queries in Listing 2, page 16, contains a search expression for a key column in one of the indexes. The first query has a search argument on the Quantity column, and the second query has a search argument on ProductID. If you execute Listing 1's script, then look at the query plans for the two queries in Listing 2, you'll see that neither of the plans uses the nonclustered index on the column in the WHERE clause; both query plans use a table scan.
If you run the command SET STATISTICS IO ON, then execute Listing 2's queries, you'll see that the queries each take 10 logical readsone for each page in the table. The first query returns 58 rows. If the optimizer had decided to use the nonclustered index on Quantity, SQL Server would have had to perform 58 bookmark lookup operations, a much higher cost than the 10 logical reads of the table scan. The second query returns 33 rows, so it, too, would have cost more than 10 logical reads if the optimizer had decided to access the nonclustered index and perform bookmark lookups.
Now look at the following query, which combines the two search arguments from Listing 2's queries:
SELECT *
FROM OrderDetails
WHERE ProductID = 10
AND Quantity = 24
This query returns three rows at a cost of seven logical reads, which is less than the cost of a table scan. The query plan looks like SQL Server is performing a JOIN operation, even though it's accessing only one table. SQL Server performs this SELECT operation by first using the index on ProductID to find all the rows that meet the first condition and storing information from the ProductID index in an internal worktable. SQL Server uses the index on Quantity to find all the rows that meet the second condition, then stores information from that index in another worktable. SQL Server needs to join the two worktables to find the rows that exist in both worktables, which is why a JOIN operator appears in the query plan.