SideBar    Catch That Bug! September 2006, Logical Puzzle, September 2006
DOWNLOAD THE CODE:
Download the Code 92743.zip

Subquery Using TOP with OR Logic
Listing 3 shows my first set-based solution to the customer-order task.The query joins two instances of SalesOrderHeader: one representing the current row (aliased as C) and one representing the preceding row (aliased as P). The join condition matches the Sales-OrderID from C with the SalesOrderID from P. To obtain the preceding order’s Sales-OrderID, the join condition uses the subquery at callout A in Listing 3.

The code issues this subquery against another instance of SalesOrderHeader, aliased as PS.The subquery filters the orders that have the same CustomerID as the current order and either a smaller OrderDate or the same OrderDate and a smaller Sales-OrderID than in the current order. It’s as if you were logically concatenating OrderDate and OrderID and checking that the concatenated values in PS are smaller than those in C.

All orders that the customer made before the current order in C meet the subquery conditions. The subquery returns TOP (1) SalesOrderID based on the ORDER BY list: PS.OrderDate DESC, PS.SalesOrderID DESC. This order ensures that the subquery returns the SalesOrderID value of the most recent order out of all orders that qualify.

As I mentioned earlier, the first set-based solution I come up with isn’t always the fastest solution. This solution ran in about 114 seconds on my system—nearly twice as long as the cursor solution. And the I/O cost for this query was 6,362,250 logical reads.

Now, in SSMS, examine the execution plan for this query, which should look like Figure 1. The Index Scan operator scans all rows in the covering index idx_cid_od_oid _i_td to get all current orders. For each row, a Nested Loops operator initiates an Index Seek operator against idx_cid_od_oid_i_td to fetch the SalesOrderID of the preceding order. Because the subquery uses TOP (1), this seek operator returns only one row for each invocation. Then,another Nested Loops operator initiates a Clustered Index Seek operator to fetch the preceding order’s attributes from the clustered index on SalesOrderID.

In query tuning, you typically focus on the parts of the plan estimated to be the most expensive. According to the different operators’ cost ratios in this query plan, most of the execution cost is associated with the Clustered Index Seek operator (91 percent). In this case, however, the Clustered Index Seek operation is merely a lookup of the data row from the clustered index based on the SalesOrderID of the preceding order, and you can’t avoid this operation unless you forgo this solution and develop one that uses a completely different approach. However, the costing formulas use estimates, and sometimes parts of the query processing that are expensive in practice aren’t apparent in the plan, as in this case.

Subquery Using TOP with AND Logic
Something else to notice in this execution plan is the Index Seek operation against idx_cid _od_oid_i_td, even though the optimizer estimates its cost at only 8 percent. This operation’s purpose is essentially to fulfill the subquery’s task, fetching the SalesOrderID of the preceding order for each current order. To explore this operation a little more, in the execution plan in SSMS, place your mouse pointer over the Seek operator against idx_cid _od_oid_i_td and examine the yellow tooltip box. (You can get the same information through the operator’s Properties window.) The tooltip box shows the following Seek Predicates:

Prefix: [tempdb].[Sales].
  [SalesOrderHeader].
  CustomerID =
  [tempdb].[Sales].
  [SalesOrderHeader].
  [CustomerID] as
  [C].[CustomerID]

The tooltip also shows the Predicate (aka WHERE Predicate) you see in Figure 2. SQL Server 2005 Books Online (BOL) explains the difference between the Seek Predicates and the WHERE Predicate: “The storage engine uses the index to process only those rows that satisfy the SEEK:() predicate. It optionally may include a WHERE:() predicate, which the storage engine will evaluate against all rows that satisfy the SEEK:() predicate (it does not use the indexes to do this).” In other words, SQL Server uses the Seek Predicates to traverse the index B-Tree from root to leaf level and to perform the partial ordered scan at the leaf level to find matches, then evaluates the WHERE Predicate against all matching rows that the Seek Predicates processed.

In our example, only the CustomerID match between C and PS appears in the Seek Predicates. But the solution invokes this Index Seek operation for each current order—that is, 1 million times. In addition, the Seek Predicates will find about 200 matching rows in each invocation. (I populated the large SalesOrderHeader table with 5000 customers averaging 200 orders apiece.) Therefore, if you cause the optimizer to evaluate parts of the logical expression in the Seek Predicates instead of in the WHERE Predicate, you might be able to use the index to evaluate the WHERE Predicate against fewer rows.

Prev. page     1 [2] 3     next page



You must log on before posting a comment.

If you don't have a username & password, please register now.

 
 

ADS BY GOOGLE