DOWNLOAD THE CODE:
Download the Code 93099.zip

Last month, I explored three solutions to a common T-SQL problem: matching rows based on their location in a sequence. The solutions addressed a classic example of such a problem, matching each customer's current and preceding orders and returning various calculations that you can use to compare measures and analyze trends. But if you use some of the new T-SQL features in SQL Server 2005, you can write simpler and faster solutions.

Last month, I compared the performance and I/O costs of a cursor solution and two set-based solutions.The cursor-based solution ran for 58 seconds, the scalar subquery using TOP with OR logic ran for 114 seconds, and the scalar subquery using TOP with AND logic ran for 17 seconds. (See "Matching Current and Preceding Rows,"http://www.sqlmag.com, Instant-Doc ID 92743, for details about those solutions.) This month, I present three additional solutions: one that uses the new APPLY operator and two that use the ROW_NUMBER function.

Refresher Course
To demonstrate this and last month's solutions, I use the SalesOrderHeader table in the AdventureWorks database.

The following query returns the columns of interest for our problem, as Table 1 shows in abbreviated form:

SELECT 
 CustomerID AS CustID,
 CONVERT(CHAR(8), OrderDate,
 112) AS OrderDate,
 SalesOrderID AS OrderID,
 TotalDue 
FROM Sales.SalesOrderHeader 
ORDER BY CustID, OrderDate, 
  OrderID; 

Your task is to write a query against the SalesOrderHeader table that performs the calculations needed to return the output that Table 2 shows (in abbreviated form).The table shows current customer ID, current order date, current sales order ID, and current total due. It also includes the elapsed time in days between the order dates for the customer's preceding and current orders ("Elapsed" column), the difference in total due for the current and preceding orders ("Diff" column), and the percentage difference in total due for the current and preceding orders ("Pct" column). Sorting by OrderDate, then OrderID determines precedence among each customer's orders.

Because the SalesOrderHeader table in AdventureWorks is too small for performance testing, run the code in Web Listing 1, to create in tempdb a version of SalesOrderHeader with 1 million rows. All the performance measures I present are based on the larger table in tempdb and don't include the time it takes to generate the output in SQL Server Management Studio (SSMS).To exclude SSMS output time, navigate to Tools, Options, Query Results, SQL Server, Results to Grid (or Results to Text), and enable the Discard results after execution option. Because of differences between your system and my test system, you might get slightly different runtimes than I did.

APPLY Solution
The fastest solution last month was based on a scalar subquery using TOP with AND logic. It ran for 17 seconds on my test system and incurred 5,992,708 logical reads. Listing 1 shows that solution's FROM clause, which matches current and preceding rows.

In this FROM clause, C represents the current row and P represents the row preceding the current row. The subquery in the FROM clause returns the OrderID for the order that precedes the one in instance C. The clause then matches each row in C with a row from P that has the OrderID returned by the scalar subquery.

The first solution this month follows similar logic to return, for each current order, the customer's preceding order.

However, instead of using a scalar subquery to return only the OrderID from the preceding order, I use the APPLY operator and a table subquery (aka derived table) to return all required attributes from the preceding order.

Listing 2 shows the solution's query.You specify the new APPLY relational operator in the FROM clause to invoke a table-valued function for each row of an outer table. APPLY comes in two versions: CROSS APPLY, which doesn't return the outer table's row if the table-valued function returns an empty set, and OUTER APPLY, which returns a row with NULLs instead of the function's columns. Listing 2 uses OUTER APPLY.

The execution plan produced for Listing 2's query is nearly identical to the one I showed last month for the scalar subquery solution. There is, however, one important difference: No clustered index seek operations are needed to look up the data row. In last month's solution, the scalar subquery forced a seek operation on the idx_cid_od_oid_i_td index for each order to return the OrderID for the preceding order. But the scalar subquery also still required a clustered index seek operation per order to look up the rest of the necessary attributes from the data row.

By using the APPLY operator, the derived table query causes a seek operation on the index for each order. However, this seek operation returns all the required attributes from the preceding order. No additional clustered index seek is required. Thus, this query ran for about 13 seconds on my test system and incurred 3,195,230 logical reads—about half the I/O required by the scalar subquery solution. (Note that to get the I/O information with the STATISTICS IO option, I ran the query without turning on the discard results options. Then to get the runtimes, I reran the query with results discarded.)

   Prev. page   [1] 2     next page
 
 

ADS BY GOOGLE