Last month I started a two-part series about query filters. I discussed filters based on predicates, including filters that T-SQL supports, such as ON, WHERE, and HAVING, and filters that T-SQL doesn’t support but that are defined by standard SQL or other database platforms, such as the standard FILTER clause for aggregates and the QUALIFY clause that Teradata supports. This month I focus on a different kind of query filter—one that isn’t based on a predicate, but rather on a requested number of rows and logical ordering.
As a reminder, in my examples I use a sample database called InsideTSQL2008. You can download the source code for this sample database from InsideTSQL.com.
TOP
The TOP filter is a nonstandard filter that was added in SQL Server 7.0 to address a common filtering need—the need to filter rows based on some logical ordering and a requested number (or percent) of rows. You specify the requested number of rows right after the TOP clause, and the ordering is defined by the query’s traditional ORDER BY clause. For example, to return the three orders with the highest order values, you’d use the following query:
USE InsideTSQL2008;
SELECT TOP (3) orderid, orderdate, custid, empid, val
FROM Sales.OrderValues
ORDER BY val DESC;
On the surface, the TOP feature’s design seems straightforward. It’s clear that this query will return the three orders with the highest order values, and it seems obvious that the query will return those rows from highest to lowest order values in terms of presentation ordering. But in practice this design is a source of much confusion and trouble. To see why, we need to go back to T-SQL’s foundations.
T-SQL is based on standard SQL (both ISO and ANSI standards), which in turn is based on the relational model, which in turn is based on two mathematical branches—set theory and predicate logic. Last month’s discussion revolved around predicate-based filters, which gave us a glimpse of the relevance of predicate logic to T-SQL. As for set theory, understanding some of the key concepts from this mathematical branch can give you better insight to the meaning of the code you write using T-SQL.
What’s most relevant to our discussion is the fact that there’s no order to the elements of a set. This concept might sound simple, but it’s a great source of confusion in writing T-SQL code. A table in a SQL Server database represents a relation from the relational model, which in turn represents a set from set theory. You need to keep in mind that in logical terms there’s no order to a table’s rows—never mind what you know about the physical layer that implements the table (e.g., indexes). Similarly, the result of a query without a presentation ORDER BY clause is a table result; therefore, there’s no guarantee for the order of the rows in the output. If you add a presentation ORDER BY clause to a query, the nature of the result is inherently different than without one—you get what standard SQL conceptually considers to be a cursor. The idea behind a cursor is that it represents a result with guaranteed order, meaning that it’s nonrelational.
If you’re wondering why you should care whether the result is conceptually a relation or a cursor, you should note that SQL Server doesn’t typically let you define a table expression, such as a view, inline table-valued function, derived table, or common table expression (CTE), based on a query with an ORDER BY clause. For example, if you try defining the following view:
CREATE VIEW Sales.MySortedOrders
AS
SELECT orderid, orderdate, custid, empid, val
FROM Sales.OrderValues
ORDER BY val DESC;
GO
you get the error in Figure 1.
The error message doesn’t go into a great detail of explanation but simply says the ORDER BY clause isn’t allowed. The reason for disallowing an ORDER BY clause in the view’s query is that a view is supposed to represent a table; a table is supposed to represent a relation; a relation is supposed to represent a set; and a set is unordered. As an aside, Oracle supports the concept of ordered views, which is one of the reasons people try to define them in SQL Server. But you should realize that support for such a construct is in complete contradiction with the relational model.