Table 2 shows the output of Listing 4's query. STATISTICS IO shows 10 logical reads for the query in Listing 4. You can try implementing other approaches to solve the same problem and compare the performance of each. For example, instead of writing subqueries that calculate aggregations, you can write simpler subqueries that use the T-SQLspecific TOP clause, as the query in Listing 5 shows. The subquery in the SELECT list retrieves the first row that has the same customer ID as in the outer query and a contract date earlier than the one in the outer query, then sorts the rows by contract date in descending order. The sorting guarantees that the query will return the amount of the contract with the maximum (latest) date. The filter uses a similar technique to ensure that the query returns only the latest contract for each customer.
STATISTICS IO shows 18 logical reads for the query in Listing 5. So in terms of I/O, the query in Listing 4 performs better than the one in Listing 5 and also has the advantage of being ANSI-compliant. In some cases, a TOP query performs better than the alternatives, so testing different versions of your code can really pay off. A graphical execution plan for both queries in Query Analyzer shows that the total cost for Listing 4's query is slightly less than for the query in Listing 5. However, the difference isn't significant; in a close case like that, I usually stick with the ANSI-compliant code even when it's not the most efficient.
Alternatively, you could implement a totally different approach, such as the query that Listing 6 shows. This query uses joins and derived tables instead of subqueries. In some cases, a join query performs better than its subquery alternative. Listing 6 contains a GROUP BY query that returns the maximum contract date (mx_date) and the contract date previous to it (prev_mx_date) for each customer. This GROUP BY query returns a derived table that's aliased as G. Note that the LEFT OUTER JOIN statement between the derived table G and the Contracts table ensures that if a customer has only one contract, the query won't filter out that contract. The outer query joins the derived table to the Contracts table based on the same customer ID and a contract date that's equal to mx_date. The query then joins the result with another instance of the Contracts table, based on the same customer ID and a contract date equal to prev_mx_date. Then, the code simply displays the appropriate columns.
STATISTICS IO for the query in Listing 6 shows 42 logical reads, which is considerably more than the previous two queries. In this case, the approach that uses joins and derived tables has the worst performance, and the query in Listing 4, which uses subqueries, performs best.
What's Next?
I can't stress enough the importance of switching from a "sequential files" mindset to "set-based" thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code. In upcoming articles, I'll discuss some more-complex examples that seem to lend themselves to cursor-based solutions but are better off with a set-based solution.