Understanding the TOP clause's capabilities and limitations
Editor's Note:Send your experts-only T-SQL tips to Itzik Ben-Gan at blackbelt@sqlmag.com. If we use your tip in the magazine, you'll recieve $100 and an exclusive T-SQL Black Belt shirt.
T-SQL lets you limit the number of rows in a query's result by specifying a search condition in the WHERE clause. The search condition is a logical expression, and it returns only rows for which the logical expression evaluates to TRUE. In some cases, however, you might want to limit the rows in the result to a specific number of rows without specifying a logical condition. To limit the number of rows, you can use the TOP clause in your SQL Server 2000 and 7.0 queries. But TOP has limitations. This article shows how to use the TOP clause and discusses ways to overcome the TOP clause's restrictions, including a method that Dr. Tom Moreau, president of Toronto-based Brockman Moreau Consulting, submitted.
TOP Queries, ANSI, and the Relational Model
Using the TOP n option, you can limit the result of a SELECT query to a fixed number of rows. The TOP n PERCENT option limits the result to a more scalable number of rows or bases the result on the percentage of rows supplied, rounded up to the nearest integer. The TOP WITH TIES option specifies that, in addition to returning the rows that the query would get without the option, you also want the rows with the same values as those in the last row of columns that the ORDER BY clause specifies. For example, to get the five orders that have the lowest OrderID from the Orders table in the Northwind database, you can issue the following query:
SELECT TOP 5 OrderID, CustomerID, EmployeeID, OrderDate
FROM Orders AS O1
ORDER BY OrderID
TOP has little meaning unless you specify the ORDER BY clause. Without ORDER BY, a request such as "Give me the first five orders" has little meaning. You can't guarantee which rows the query will return; the results depend on the plan that the optimizer chooses to process the query and the data's physical layout on disk. You'd only want to use TOP without an ORDER BY clause when you want a partial number of sample rows and you don't care which rows you get.
TOP is neither in accord with the relational model nor ANSI-compliant. The relational model deals with sets, and ANSI SQL lets you manipulate those sets and limit the number of rows in a query's result by using logical conditions that define which rows to return. ANSI SQL contains no options that assume a particular physical data layout that can determine which rows to return.
According to the relational model, a table is a set of a particular entity's occurrences. Rows in a table are in no specific order. Thus, a query with an ORDER BY clause doesn't return a table. If you use the ORDER BY clause in a view or derived table, the query will fail. But SQL Server lets you use the ORDER BY clause in a TOP query for derived tables and views. In an upcoming article on sorting techniques, I'll show you how to overcome this apparent contradiction.
Although TOP queries aren't ANSI-compliant, they provide a solution to many problems that would be too performance-intensive if you wrote them in an ANSI-compliant manner. However, TOP queries can take you only so far. They can't solve all problems associated with limiting the number of rows in a result.
Getting Rows m to n
A TOP query can't get rows m to n in the order that the ORDER BY clause specifies. For example, suppose you want to return orders 6 to 10 from the Orders table; as you page through the orders in groups of five rows, you want to retrieve the second group of rows. You can't respond with a TOP query. However, this request has an ANSI-compliant solution (I use SELECT * for brevity; you should select only the required columns):
SELECT *
FROM Orders AS O1
WHERE (SELECT COUNT(*) FROM Orders AS O2
WHERE O2.OrderID <= O1.OrderID) BETWEEN 6 AND 10
ORDER BY OrderID
This query's poor performance is noticeable, especially with large tables. For each key (OrderID), the query counts the number of keys that are less than or equal to that key. If it's unique, the first key will have 1, the second 2, and so on. This query provides the result's ordinal position, or row number, and lets you limit the result to rows 6 to 10. You can improve performance by orders of magnitude with a TOP query; in my tests, the following query incurred about 1/15 of the previous query's performance cost:
SELECT *
FROM (SELECT TOP 5 *
FROM (SELECT TOP 10 * FROM Orders ORDER BY OrderID) AS O1
ORDER BY OrderID DESC) AS O2
ORDER BY OrderID
The problem with this query is that it sorts the input three times. This approach can be very time-consuming if you page through the rows in groups, requesting a different group of rows each time.
Prev. page  
[1]
2
3
4
next page