Subquery Using TOP with OR Logic
Listing 3 shows my first set-based solution to the customer-order task.The query
joins two instances of SalesOrderHeader: one representing the current row (aliased
as C) and one representing the preceding row (aliased as P). The join condition
matches the Sales-OrderID from C with the SalesOrderID from P. To obtain the
preceding order’s Sales-OrderID, the join condition uses the subquery
at callout A in Listing 3.
The code issues this subquery against another instance of SalesOrderHeader,
aliased as PS.The subquery filters the orders that have the same CustomerID
as the current order and either a smaller OrderDate or the same OrderDate and
a smaller Sales-OrderID than in the current order. It’s as if you were
logically concatenating OrderDate and OrderID and checking that the concatenated
values in PS are smaller than those in C.
All orders that the customer made before the current order in C meet the subquery
conditions. The subquery returns TOP (1) SalesOrderID based on the ORDER BY
list: PS.OrderDate DESC, PS.SalesOrderID DESC. This order ensures that the subquery
returns the SalesOrderID value of the most recent order out of all orders that
qualify.
As I mentioned earlier, the first set-based solution I come up with isn’t
always the fastest solution. This solution ran in about 114 seconds on my system—nearly
twice as long as the cursor solution. And the I/O cost for this query was 6,362,250
logical reads.
Now, in SSMS, examine the execution plan for this query, which should look
like Figure 1. The Index Scan operator
scans all rows in the covering index idx_cid_od_oid _i_td to get all current
orders. For each row, a Nested Loops operator initiates an Index Seek operator
against idx_cid_od_oid_i_td to fetch the SalesOrderID of the preceding order.
Because the subquery uses TOP (1), this seek operator returns only one row for
each invocation. Then,another Nested Loops operator initiates a Clustered Index
Seek operator to fetch the preceding order’s attributes from the clustered
index on SalesOrderID.
In query tuning, you typically focus on the parts of the plan estimated to
be the most expensive. According to the different operators’ cost ratios
in this query plan, most of the execution cost is associated with the Clustered
Index Seek operator (91 percent). In this case, however, the Clustered Index
Seek operation is merely a lookup of the data row from the clustered index based
on the SalesOrderID of the preceding order, and you can’t avoid this operation
unless you forgo this solution and develop one that uses a completely different
approach. However, the costing formulas use estimates, and sometimes parts of
the query processing that are expensive in practice aren’t apparent in
the plan, as in this case.
Subquery Using TOP with AND Logic
Something else to notice in this execution plan is the Index Seek operation
against idx_cid _od_oid_i_td, even though the optimizer estimates its cost at
only 8 percent. This operation’s purpose is essentially to fulfill the
subquery’s task, fetching the SalesOrderID of the preceding order for
each current order. To explore this operation a little more, in the execution
plan in SSMS, place your mouse pointer over the Seek operator against idx_cid
_od_oid_i_td and examine the yellow tooltip box. (You can get the same information
through the operator’s Properties window.) The tooltip box shows the following
Seek Predicates:
Prefix: [tempdb].[Sales].
[SalesOrderHeader].
CustomerID =
[tempdb].[Sales].
[SalesOrderHeader].
[CustomerID] as
[C].[CustomerID]
The tooltip also shows the Predicate (aka WHERE Predicate) you see in Figure
2. SQL Server 2005 Books Online (BOL) explains the difference between the Seek
Predicates and the WHERE Predicate: “The storage engine uses the index
to process only those rows that satisfy the SEEK:() predicate. It optionally
may include a WHERE:() predicate, which the storage engine will evaluate against
all rows that satisfy the SEEK:() predicate (it does not use the indexes to
do this).” In other words, SQL Server uses the Seek Predicates to traverse
the index B-Tree from root to leaf level and to perform the partial ordered
scan at the leaf level to find matches, then evaluates the WHERE Predicate against
all matching rows that the Seek Predicates processed.
In our example, only the CustomerID match between C and PS appears in the Seek
Predicates. But the solution invokes this Index Seek operation for each current
order—that is, 1 million times. In addition, the Seek Predicates will
find about 200 matching rows in each invocation. (I populated the large SalesOrderHeader
table with 5000 customers averaging 200 orders apiece.) Therefore, if you cause
the optimizer to evaluate parts of the logical expression in the Seek Predicates
instead of in the WHERE Predicate, you might be able to use the index to evaluate
the WHERE Predicate against fewer rows.
Prev. page
1
[2]
3
next page