You might not realize that in a SQL Server 2005 query plan, the use of a nonclustered index on a table shows a join operation. Almost all the SQL Server Books Online (BOL) descriptions that discuss the JOIN keyword in a query refer to joining
two tables. However, in the BOL sections on types of joins under “query tuning,” you’ll
see references to the join operator in a query plan and a description that refers to two
inputs, not two tables. For our purposes, we can consider an input to a join to be a set
of rows; it could be an entire table or a set of rows after specified filters are applied.
We’ll examine inputs to joins that are sets of index rows and learn about two types
of query plans that indicate that SQL Server is performing
a join, although only one table is in the query and no JOIN
operator is specified in the T-SQL code.
Query Plan Using a Nonclustered Index
In the first type of query, the plan uses a nonclustered index to find the desired rows—that
is, rows meeting a condition specified in the WHERE clause. You probably know that the
index rows in the leaf level of a nonclustered index contain the index key columns and a
bookmark, or pointer, to where the actual corresponding data row is located. As I discuss
in “Inside Optimization,” October 2003, InstantDoc ID 39822, a bookmark can take one
of two forms. If the underlying table has a clustered index, the bookmark is the clustered
index key. If the table is a heap, the bookmark is a row ID, indicating the file number, page
number, and slot number on the page where the row is located.
In SQL Server 2000, the query plan for a single-table query that uses a nonclustered
index to find the data rows shows the bookmark lookup operation. The execution plan
looks the same whether or not the table has a clustered index. To see the plan, first run
the code in Listing 1 on a SQL Server 2000 instance to create a copy of the Orders
table in the Northwind database, called Orders2.
The only difference between the two tables is that the original Orders table has
a clustered index on OrderID and the new Orders2 table is a heap (i.e., a table with no clustered index). I created the two queries in Listing 2, then used SQL
Server Management Studio (SSMS) to run these queries against a SQL Server2000 instance. The execution plans for the two queries look the same. The plan shows that SQL Server uses the non-clustered index to find the index rows
meeting the desired condition (in this case,
CustomerID = 'VINET'), then uses the
bookmarks in the leaf-level index row to
find the corresponding rows in the table.
In SQL Server 2005, the plan looks
different. If you think about it, the bookmark lookup operation actually takes a set
of rows from the index (all rows meeting
the specified condition) and finds all rows
in the underlying table that match those
rows (i.e., that have rows with the same
bookmark). These actions are the same
as a join operation, so SQL Server 2005’s
query plans show this operation as a join.
SQL Server 2005 isn’t really doing anything differently; it’s just that the query
plan describes SQL Server’s execution
steps slightly more precisely.
To see a query plan containing an
index seek of a nonclustered index in
SQL Server 2005, use the Sales.Sales-OrderHeader table in the Adventure-Works database, run the query in Listing 3,
and examine its query plan, which Figure
1, shows. The index seek at the
top right is the seek from the index on
the SalesPersonID column, which returns
a set of index rows, each containing a
clustered index key as a bookmark. The
plan shows a nested-loop join that takes each bookmark (clustered index key) from
the rowset provided from the index on
SalesPersonID and finds the matching rows
in the table, by seeking on the clustered
index.
If Sales.SalesOrderHeader didn’t have a
clustered index, the plan would look similar
but not exactly the same. The plan still shows a nested-loop
join operation, but instead of the seek in the clustered index, the plan shows a row ID (RID)lookup operation into the base table because the bookmark returned from the nonclustered index is the row ID.
Query Plan Using Index Intersection
The second type of query plan
that can show a join operator
even when the query contains
one table occurs when the query optimizer decides to use two or more indexes
on a table. (I describe this behavior—
called index intersection—in “Inside Optimizer Enhancements,” November 2003,
InstantDoc ID 39906.) If a query has multiple search conditions, and more than one
has a potentially useful nonclustered index,
the optimizer considers index intersection.
Consider the query in Listing 4. SalesPersonID and CustomerID each have a non-clustered index. If either search condition in
the query were selective and returned only a
few rows, the optimizer could simply decide
to use one index. The optimizer would use
a query plan like that in Figure 1 to find
the rows in the underlying table based on
one of the conditions. Then, once the optimizer had the rows from the table, it would filter them
according to
the second
condition.
However, if
neither condition were selective, the optimizer might decide to use
both indexes. The qualifying index rows
from each index contain the nonclustered
keys and bookmarks, and SQL Server could
join the two rowsets together, matching
index rows with the same bookmark.
In the query in Listing 4, neither condition is selective enough to use a nonclustered index. If you look at the plans for each
query in Listing 5, you’ll see that each uses a
clustered index scan, which is really the same
as a table scan.
However, if you combine the two queries, you’ll get a plan like the one in Figure 2. SQL Server performs index seeks by
using each nonclustered index, then sorts
the rowset of index rows from the index
on CustomerID by SalesOrderID, which is
the clustered key. The index rows from the
SalesPersonID index are already sorted by
SalesOrderID, because the index’s leading
column has only one value (278). After
sorting the rows from the CustomerID
index by SalesOrderID, SQL Server can
perform a merge join. The result of the
merge join is still just a set of index keys
and bookmarks, so another join operation
combines that rowset with the rows in the
clustered index by using a nested-loop join.
If the underlying table were a heap, the plan
would look similar, but the base table would use a RID lookup instead of a clustered
index seek.
Theoretically, the optimizer could choose
to use index intersection on more than two
nonclustered indexes on the same table.
However, in addition to the cost of seeking
through the index to find the relevant index
rows, the optimizer must consider the added
cost of managing the rowsets from each index
and possibly performing sort operations and
joining the rowsets together. Performing these
actions is usually much more expensive than
using a single index, joining to the base table
to get the data rows, then filtering the rows to
apply the additional search conditions.
An Obscure Join
Before SQL Server 7.0, SQL Server used at
most one index per table in a query. Index
intersection, introduced in SQL Server 7.0,
lets SQL Server join sets of rows retrieved
from multiple nonclustered indexes. SQL
Server 2005’s query plans show us that using
the index information from a nonclustered
index to find rows in the underlying table is
really a type of joining—between an index
and a table. There are other ways of using
multiple indexes on the same table in a
single query, and I’ll explain these options
in an upcoming column.
End of Article