• subscribe
March 22, 2005 12:00 AM

T-SQL Tiebreakers

Get the right query results
SQL Server Pro
InstantDoc ID #45235
Downloads
45235.zip

Solutions That Use TOP 1
The TOP 1 solution uses only one correlated subquery regardless of the number of tiebreakers you have. The subquery returns the order ID of the row representing the desired order for the employee ID in the outer row. If the outer row's key is equal to the key the subquery returned, the outer row is returned. Listing 5 shows a solution that uses MAX(OrderID) as the tiebreaker.

The trick here is that the subquery filters all rows belonging to the employee ID specified in the outer row, sorts them by the sort column (OrderDate DESC) and the list of tiebreakers (OrderID DESC in this case), and returns the key of the first row. Of course, in practice, with the recommended index, the subquery won't really access all rows for the employee, nor will it sort anything. It will perform a seek within the index and might even hash or spool the result in order to save additional I/O operations for processing that employee's other orders.

This query performs very well. Even better, it scales well in terms of performance and complexity when you add tiebreakers. For example, to use MAX(CustomerID), MAX(ShipVia), and MAX(OrderID) as tiebreakers, all you need to do is revise the ORDER BY list in Listing 5's subquery to

ORDER BY OrderDate DESC, CustomerID DESC, ShipVia DESC, OrderID DESC

As I mentioned, query performance will vary according to the density of employee IDs, among other things. If the number of employees is small, you can further optimize the TOP 1 technique by querying the Employees table and using a join. Instead of invoking the subquery in the filter of the query against Orders, invoke it in the SELECT list of a query against Employees:

SELECT (SELECT TOP 1 OrderID
  FROM Orders AS K
  WHERE K.EmployeeID =
    E.EmployeeID
    ORDER BY OrderDate DESC, OrderID DESC) AS topkey
FROM Employees AS E

Join this table (call it D) to the Orders table (call it O) on O.OrderID = D.topkey.

The query in Listing 6 shows the revision of Listing 5's query. Here again, enhancing the tiebreaker list is merely a matter of revising the ORDER BY list. Investigate the execution plan and I/O stats for this query, and compare them to the previous ones. You'll find that the technique in Listing 6 performs one seek operation within the covering index for each employee to get the key from the subquery, then a seek operation within the index on order ID to grab the whole row. This plan is more efficient than other plans when the number of employees is small. As an exercise, you can experiment with the number of unique employees in Orders and see when these queries become slower than previous ones.

Solutions That Use Concatenation
You might be satisfied with the solutions you have so far. However, the solutions I've described will perform well only if a good index (preferably a covering index) exists for them. However, if no good index exists and you can't create one, you might want to consider another solution.

The idea is to first concatenate the sort column, tiebreakers, and covered columns. Then, group the data by employee ID, calculate the maximum value of the concatenated string, and in an outer query break the concatenated string into individual attributes. The query in Listing 7 shows a solution that uses MAX(OrderID) as the tiebreaker. To use MAX(CustomerID), MAX(ShipVia), and MAX(OrderID) as tiebreakers, simply revise the order of attributes you specify accordingly.

You can accomplish the concatenation through several methods. The example in Listing 7 concatenates the binary representation of the attributes (I find this concatenation method the simplest), assuming that the sort order of the binary data reflects the original sort order of the attributes. In both examples, I use positive integers, fixed-length character strings, and dates, which keep their original sort behavior. The binary representation for character-based data is case-sensitive, but this doesn't generate any logical problems. Another thing to keep in mind is that if you need to concatenate variable-length data (e.g., varchar, nvarchar), make sure you convert it to a fixed-length binary value, which will pad it with zero bits. Otherwise, SQL Server won't calculate the MAX(concatenated string) value correctly, and you won't be able to figure out its location in the concatenated string.

The main benefit of this solution is its performance—it relies on a single scan of the data, even when no adequate indexes exist. However, it has drawbacks. First, it's complex, not intuitive or trivial. Second, you can use this solution only when you're dealing with data types for sort columns and tiebreakers that keep their original sort behavior when converted to a different type (binary/character) for concatenation purposes.

Finally, all other solutions let you control the sort direction separately for each attribute participating as sort column or tiebreaker. In the ANSI subqueries solutions, you can simply specify MIN or MAX for each attribute. With the TOP 1 solutions, you control the sort direction by specifying ASC or DESC in the ORDER BY clause. But the concatenation solution allows only one sort direction for all attributes—either MAX for all or MIN for all. The only exception is numeric data. For example, if you want to return for each employee the order with the latest order date and use the lowest order ID as the tiebreaker, you make two revisions to the query in Listing 7. First, instead of concatenating OrderID, concatenate MAXINT - OrderID (2147483647 - OrderID). Second, in the outer query, instead of just extracting the OrderID, subtract the value you extract back from MAXINT to return it to its original value (2147483647 - OrderID). MAXINT - MAX(MAXINT - value) effectively gives you MIN(value).

Performance vs. Simplicity
For me, an ideal solution performs well, scales well, is ANSI-compliant, and is simple enough for others to understand and maintain. If you find such a solution to a problem, you're lucky. With most problems, I have to compromise on something. But if you're familiar with many techniques, you have more options. In different situations, I might compromise on different aspects of the solution, depending on the customer's priorities. But until I find the ideal solution, I don't consider myself finished with the problem. I stash it in a drawer and try it again when I learn a new technique.



ARTICLE TOOLS

Comments
    There are no comments to display. Be the first one!
You must log on before posting a comment.

Are you a new visitor? Register Here