DOWNLOAD THE CODE:
Download the Code 44138.zip

When you write applications that return query results to the client and display the results on the screen, you need to take into account the end user's screen-size limitations. If a query's result set contains too many rows to fit on the screen, you have to introduce logic into the application to split the original result set into chunks or pages. You then provide buttons or other graphical elements that let the user navigate between the result pages. The process of splitting the data into chunks, called paging, is very common, especially with Web applications. Let's look at some efficient ways to achieve paging, using two navigation options and paging techniques, then consider a commonly used technique that isn't guaranteed to work as you might expect.

Orders Scenario
When you're planning to implement paging in your application, you need to determine three things. First, what sort order do you need for the rows you'll return? Second, how many rows do you want to return per page? And third, what navigation technique will you provide for users to page through the results?

The sort order and the page size (i.e., the number of rows in a page) you choose determine which rows appear on which page. My examples use the Northwind database's Orders table. For this article's examples, I use a sort order based on OrderDate and OrderID and a page size of 10 rows. Note that sorting on unique columns is important. Without a unique sort, the relationship of the source row to the target page isn't deterministic, meaning that you can't determine for sure which page a row belongs to. Let's say you want to return the orders sorted chronologically. Because the values in the OrderDate column don't have to be unique, you need to choose a "tiebreaker" column to sort on secondarily that will determine the order of rows that have the same order date. I chose the OrderID column as the tiebreaker because it's unique.

How you choose to set up navigation between pages depends on the application users' needs. If the users need to navigate only from the current page to either the next or the previous page, you can implement a simple, efficient solution. But if users need more dynamic capabilities, such as navigating to a specific page number, you need to implement a solution that lets you filter the rows of a desired page based only on two inputs: page size and page number. Let's start by looking at navigation between adjacent pages.

Navigating to Adjacent Pages
To allow navigation to adjacent pages, you need to implement code that performs three tasks: get the first page, get the next page, and get the previous page. I recommend that you use stored procedures, one for each task. Using stored procedures gives you the flexibility to change your implementation if you find a more efficient paging method later without needing to change the client application.

Run Listing 1's code in the Northwind database to create the stored procedure that returns the first page of the Orders table. The stored procedure's code is simple. The query retrieves the top 10 rows, based on an ascending sort by order date and order ID. Run the following code to test the stored procedure:

EXEC dbo.usp_GetFirstPage;

As Figure 1 shows, this code returns the first page of Orders.

The stored procedure that returns the next page after the current one accepts two input arguments: the order date and order ID of the last row on the previously returned page. In the client application, you need to store those values locally while processing the call to the current page.

Run the code that Listing 2, page 38, shows to create the usp_GetNextPage stored procedure, which returns the page after the current one. The query filters to find the rows that have either a later order date than the order date of the last row on the previous page or the same order date and a higher order ID. The query returns the top 10 rows based on an ascending sort by order date and order ID.

Assuming the user has already requested the first page, the client application locally stored the values of order date and order ID from the last row on the first page returned. When the user requests the second page, the client application submits the following code to invoke usp_GetNextPage:

EXEC dbo.usp_GetNextPage @last_orderdate = '19960716', @last_orderid   = 10257;

The procedure returns the second page of Orders, as the result in Figure 2 shows. The client application now locally stores the order date and order ID of the last row on this page in anticipation of sending those values as input arguments if the user requests the next page. Similarly, the client application also locally stores the order date and order ID of the first row returned on this page, which it will use if the user requests the previous page.

The implementation of the stored procedure that returns the previous page is similar to the implementation of usp_GetNextPage, with a few changes. First, you need to reverse the direction of the operators—instead of greater than (>), use less than (<). Also, you need to use an opposite (descending) sort direction for the order date and order ID. Applying these two changes will give you the correct page but with the rows sorted in descending order. To get the page rows back into ascending order, you can use a derived table, query all its rows, and sort them in ascending order.

Run the code that Listing 3 shows to create the usp_GetPrevPage stored procedure. To test the stored procedure, execute it as follows, providing the order date and the order ID of the first row returned on the second page:

EXEC dbo.usp_GetPrevPage @first_orderdate = '19960717', @first_orderid = 10258;

You get the first page (Figure 1) back again.

   Prev. page   [1] 2     next page



You must log on before posting a comment.

If you don't have a username & password, please register now.

Reader Comments

The author correctly pointed out that a query described in this article doesn't garantees that rownum values will matches exactly the order clause. However a light variation on this query make it works just fine. By using a derived table to get the sorted data, the optimizer is forced to complete the sort before the insert. This insures rownum values matching the order clause.

SELECT IDENTITY(int, 1, 1) AS RowNum, OrderID+0 AS OrderID INTO dbo.#OrdersRN FROM (Select top 100 percent * from dbo.Orders ORDER BY OrderDate, OrderID) as x;

pelsql

Article Rating 4 out of 5

For non-adjacent pages... Why not create the temporary table first (either CREATE TABLE or SELECT INTO WHERE 0=1) and use INSERT SELECT to get the correct sequencing?

Anonymous User

Bad

Anonymous User

Article Rating 1 out of 5

Implement paging without search clauses is not very useful, don't you think?. This article would be better if you can mix this topics

Anonymous User

Article Rating 2 out of 5

There was a good article on server side paging in December 2003 by Andrew Rosca. I hear that Oracle has a pseudo col called rownum(or row id) which can be used to page. Is there an equivalent that needs to be exposed to users on sqlserver?

Anonymous User

Article Rating 1 out of 5

For non-adjacent pages, instead of using cursor to create temp table, the following query will do it. SELECT Orders.OrderID, COUNT(Orders_1.OrderID) AS RowNum into dbo.#OrdersRN FROM Orders INNER JOIN Orders Orders_1 ON Orders.OrderID > Orders_1.OrderID AND Orders.OrderDate = Orders_1.OrderDate OR Orders.OrderDate > Orders_1.OrderDate GROUP BY Orders.OrderDate, Orders.OrderID ORDER BY Orders.OrderDate, Orders.OrderID

Anonymous User

Article Rating 3 out of 5

what year is this?

Anonymous User

I appreciate all your feedback and taking the time to write those. I'll certainly take those into consideration in my future writings. I'd like to address the issues you raised:

* To pelsql, a derived table is a table, and a table has no predetermined order to its rows. This also applies to the TOP 100 PERCENT trick. All you can really trust, is that the TOP option would work based on the sort you specified, and if ORDER BY is specified in the outermost query, it guarantees that rows would be returned sorted to the client. But when used to create a derived table, it's a different story. Unless you find documentation by Microsoft that guarantees otherwise (something that conflicts with standard SQL and the relational model), trusting such behavior is an invitation for trouble. Of course you might decide to take the risk, based on your observations that something appears to behave in a certain way in the cases you tested. Just as an example, let's assume for a moment that in the current latest service pack of SQL Server 2000 using the TOP 100 PERCENT trick always performs a sort before assigning identity values. What if in a future service pack the optimizer will introduce a new optimization technique and simply ignore a TOP 100 PERCENT when specified in a table expression (Derived Table, View, Inline Function)? Remember that programmers that trusted a certain physical order of rows in the past were proved wrong. E.g., in SQL Server 6.5 the only algorithm used to group data was sorting. This implied that groups were guaranteed to be sorted in the resultset. As of 7.0, a hashing algorithm was also added to group data, so sorting of the groups in the result is not guaranteed anymore. Programmers that wanted the groups sorted and relied on physical processing rather than the standard, and had not specified ORDER BY, had to revise their code when upgrading to 7.0/2000. Also, there are so many different optimization techniques and access methods that the optimizer can utilize including intra-query-parallelism, enhanced read-ahead and scans, and so on. I'm always reluctant to trust a certain physical processing order that is not documented, especially if it conflicts with the relational model.

* Regarding filters; it's a good point. Paging usually also involves dynamic filters (and also dynamic sorting). For what it's worth, I did want to focus on the paging techniques themselves, and from my teaching experience, an idea is delivered better if you focus on a point. The implementation of dynamic filters and sorting is usually achieved with dynamic SQL. In some cases filtering can be achieved using static queries with COALESCE. Anyway, point taken.

* Regarding the various suggested techniques to calculate row numbers, let me try and summarize my thoughts on the issue: - In SQL Server 2000/SP3, using IDENTITY (the function in a SELECT INTO statement, or the property in an INSERT INTO statement), you are not guaranteed that identity values would represent the sort specified in the ORDER BY clause. No matter how you try to force it, it's simply not guaranteed. - Calculating row numbers using a set based technique (using a subquery, or the suggested join), is very slow. Please refer to the article I wrote about row numbers (doc ID 42302). - In SQL Server 2005, it will actually be guaranteed that IDENTITY values would reflect the sort specified in the ORDER BY clause. The irony is that paging code using IDENTITY currently written for SQL Server 2000 is not guaranteed to work correctly, but it will work correctly in SQL Server 2005. So, you can always claim that your current code is SQL Server 2005 compatible. ;-) Furthermore, for paging purposes you won't need to rely on IDENTITY anymore since SQL Server 2005 has a ROW_NUMBER() function which generates values based on a requested sort, using a single scan of the data.

Regards, Itzik Ben-Gan

BG_SQL

Do we know for sure that there's no difference between SELECT/INTO (with the IDENTITY function) and INSERT/SELECT (with the IDENTITY column property)? I could see SQL Server possibly assigning values with the IDENTITY function prior to sorting, but how could that be true with the IDENTITY column property? It would seem that delaying the assignment of values until the time of row insertions would be safe, unless the ORDER BY is completely ignored.

Anonymous User

We don’t know that there is no difference between SELECT INTO and INSERT SELECT in terms of optimization and IDENTITY assignment. Following your logic, I’d actually expect IDENTITY values to be assigned after the sort, because according to ANSI SQL, the SELECT clause is evaluated before the ORDER BY clause. :-) But that’s not the point; neither SELECT INTO nor INSERT SELECT absolutely guarantees a sort before assigning the identity values. The way I see it, the fact that most (emphasis on most) execution plans do have a sort before identity assignment (for cost reasons) is even worse than the same probability for both cases. When most tests seem to work like you expect, it leads people to trust this functionality, and see how many people have this misconception! In production code, large tables, parallel machines, or any random scenario, it might not work as you expect. We’re talking about a non-ANSI element. And I wish Microsoft had chosen one of two options: disallowing an ORDER BY in an INSERT INTO / SELECT INTO, or ALWAYS guaranteeing a sort before identity assignment, and in SQL Server 2005 they chose the latter. But I’m afraid that in SQL Server 2000, it’s simply not guaranteed.

Regards, Itzik Ben-Gan

BG_SQL

I guess I'm still not sure how one could confidently state that the proper ordering for INSERT/SELECT (with the IDENTITY property on a column) is simply not guaranteed. Do you have a example that demonstrates failure? Have you spoken with Microsoft's SQL Server engineers? Could you describe a sequence of events that would make it fail? As I understand it, the assignment of IDENTITY values in this case happens as the rows are inserted (unless a table lock is used another process could allocate an IDENTITY value in the middle of the expected range). Once rows are inserted it's too late for sorting, so it seems that the ORDER BY must be either applied prior to any insertion or effectively ignored. What am I failing to understand?

Anonymous User

I see now the Compute Scalar operation on the execution plan. I assume you are saying that nothing prevents the Sort operation from moving left of the Compute Scalar operation. I realize that lack of an example of the negative is not proof of the positive, but short of an example causing the Sort operation to move I think only Microsoft could tell us whether it could move. How did you conclude that it could move? What would you consider to be a guarantee that it would never move?

Anonymous User

I have been looking at various paging techniques, and so far have found three that seemed reasonable. The first two are very similar to the 'Navigating to Non-Adjacent Pages' technique described above.

The third technique, which surprisingly has proven to be the quickest under a number of scenarios I have listed below. Simple, but I would never have thought it would be the fastest.

CREATE PROCEDURE PageOrdersTable @PageSize int, @PageNum int AS

DECLARE @pageSizeString varchar(10) DECLARE @rowCountString varchar(10)

SET @pageSizeString = CAST(@PageSize AS varchar(10)) SET @rowCountString = CAST(@PageSize * @PageNum AS varchar(10))

EXEC( 'SELECT OrderID, OrderDate, CustomerID, EmployeeID, ShipVia FROM ( SELECT TOP ' + @pageSizeString + ' * FROM ( SELECT TOP ' + @rowCountString + '* FROM orders ORDER BY orderdate ASC) AS filteredResults ORDER BY orderdate DESC) as pageResults order by orderdate ASC')

asdavey

Article Rating 4 out of 5

Itzik,

When you say that a sort before identity assignment is simply not guaranteed do you mean that it's not explicitly stated in BOL or do you mean that you have inside knowledge about optimizer behavior?

Thanks!

Anonymous User

I’m saying this because I discussed the issue with the developers from the engine/optimizer team. I was explicitly told that they do not guarantee that the ORDER BY would be applied before assigning the IDENTITY values in SQL Server 2000. The issue came up as a result of a case where IDENTITY values did not reflect the ORDER BY clause, and you could see in the plan the compute scalar operator generating the values before the sort. I’m afraid I cannot provide details as I’m under NDA, and the whole discussion was in a private forum. This is not considered a bug, since Microsoft never guaranteed anything about the relationship between ORDER BY and IDENTITY. And this also relates to the risk in relying on a certain “expected” behavior that is not documented in BOL or elsewhere. However, since they are very well aware that so many developers assume IDENTITY values would reflect the sort, and have lots of code relying on that, I was told that they’re going to guarantee this in SQL Server 2005. But again, in SQL Server 2005 you won’t need such a guarantee, since you’ll be able to use the ROW_NUMBER() function.

BG_SQL

See More Comments  1   2