• subscribe
September 21, 2006 12:00 AM

Matching Current and Preceding Rows, Revisited

Fastest solutions use new APPLY operator and row numbers
SQL Server Pro
InstantDoc ID #93099
Downloads
93099.zip

ROW_NUMBER Solutions
SQL Server 2005's new ROW_NUMBER analytical ranking function is handy for solutions that rely on a certain order of the data. ROW_NUMBER lets you assign sequential row numbers to a row set, then restart the numbering for a coordinating row set by using the BY PARTITION clause. In our case, the current and preceding row can be expressed as an offset of one between row numbers. Listing 3's code simply calculates row numbers for each customer's orders separately (PARTITION BY CustomerID), based on OrderDate, then SalesOrderID sorting.

Table 3 shows Listing 3's results in abbreviated form. If you examine the row numbers, you can see how simple it is to match each customer's preceding row to the current row. You just need to join two instances of OrdersCTE (aliased as C and P), based on CustomerID match and an offset of one between the row numbers.

Listing 4 shows the solution that uses the row numbers generated in Listing 3. This solution's query ran for 12 seconds on my system.The upper plan in Figure 1 (shown in abbreviated form) shows that SQL Server scanned the index idx_cid_od_oid_i_td twice and used a hash join to join the two instances of OrdersCTE.The hash join tells you the optimizer chose to create a hash table to satisfy the join instead of using what would seem like a more efficient choice—a merge join based on two ordered scans of the covering index.

I suspect that the optimizer used the hash join because the code in Listing 4 didn't match current and preceding rows based on row-number offset alone—which would have relied on the sorting of the covering index, lending itself to a merge join—but also matched rows based on CustomerID.To get a merge join, I revised the row number to be global across customers.That is, I calculated a row number with no PARTITION BY element, instead using only an ORDER BY element based on CustomerID, Order-Date, and SalesOrderID sorting.This lets the join condition match rows based solely on an offset of one between the row numbers. However, when the current customer ID differs from the preceding one, you need to return a NULL in the calculation columns comparing current and preceding order.You do this by using CASE expressions in the SELECT list.

Listing 5 shows this revised solution's query. The execution plan for this query, presented in abbreviated form at the bottom of Figure 1, shows the desired merge join.This query ran for only 6 seconds on my system and incurred a mere 7466 logical reads.

A Sweet Tune
In the past two articles, I presented six solutions to the problem of matching current and preceding rows.The slowest solution ran in about 2 minutes, and the fastest solution ran in about 6 seconds. I got from the slowest solution to the fastest by using a classic performance-tuning process. Besides using good index design, you can gain dramatic performance improvements by revising your queries to use the best solution for the task at hand.And as you saw in this case, SQL Server 2005's new APPLY and ROW_NUMBER features enable simpler and faster solutions than you can write in SQL Server 2000.



ARTICLE TOOLS

Comments
    There are no comments to display. Be the first one!
You must log on before posting a comment.

Are you a new visitor? Register Here