• subscribe
October 24, 2001 12:00 AM

Sequential to Set-Based

SQL Server Pro
InstantDoc ID #22431
Downloads
22431.zip

Table 2 shows the output of Listing 4's query. STATISTICS IO shows 10 logical reads for the query in Listing 4. You can try implementing other approaches to solve the same problem and compare the performance of each. For example, instead of writing subqueries that calculate aggregations, you can write simpler subqueries that use the T-SQL—specific TOP clause, as the query in Listing 5 shows. The subquery in the SELECT list retrieves the first row that has the same customer ID as in the outer query and a contract date earlier than the one in the outer query, then sorts the rows by contract date in descending order. The sorting guarantees that the query will return the amount of the contract with the maximum (latest) date. The filter uses a similar technique to ensure that the query returns only the latest contract for each customer.

STATISTICS IO shows 18 logical reads for the query in Listing 5. So in terms of I/O, the query in Listing 4 performs better than the one in Listing 5 and also has the advantage of being ANSI-compliant. In some cases, a TOP query performs better than the alternatives, so testing different versions of your code can really pay off. A graphical execution plan for both queries in Query Analyzer shows that the total cost for Listing 4's query is slightly less than for the query in Listing 5. However, the difference isn't significant; in a close case like that, I usually stick with the ANSI-compliant code even when it's not the most efficient.

Alternatively, you could implement a totally different approach, such as the query that Listing 6 shows. This query uses joins and derived tables instead of subqueries. In some cases, a join query performs better than its subquery alternative. Listing 6 contains a GROUP BY query that returns the maximum contract date (mx_date) and the contract date previous to it (prev_mx_date) for each customer. This GROUP BY query returns a derived table that's aliased as G. Note that the LEFT OUTER JOIN statement between the derived table G and the Contracts table ensures that if a customer has only one contract, the query won't filter out that contract. The outer query joins the derived table to the Contracts table based on the same customer ID and a contract date that's equal to mx_date. The query then joins the result with another instance of the Contracts table, based on the same customer ID and a contract date equal to prev_mx_date. Then, the code simply displays the appropriate columns.

STATISTICS IO for the query in Listing 6 shows 42 logical reads, which is considerably more than the previous two queries. In this case, the approach that uses joins and derived tables has the worst performance, and the query in Listing 4, which uses subqueries, performs best.

What's Next?
I can't stress enough the importance of switching from a "sequential files" mindset to "set-based" thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code. In upcoming articles, I'll discuss some more-complex examples that seem to lend themselves to cursor-based solutions but are better off with a set-based solution.



ARTICLE TOOLS

Comments
  • FERNANDO
    11 years ago
    Oct 26, 2001

    I recently changed a T-SQL routine that used cursors to one that used five SQL statements, using the set-based concept. I reduced my runtime from 27 hours to 30 minutes. One of my cursors had 5 million rows, which were being compared to
    4 million table records. Though the table was indexed, I looped through my cursor one record at a time and performed my comparisons (I'm from the old school). But now, cursors and I have split the sheets.
    - BW

You must log on before posting a comment.

Are you a new visitor? Register Here