DOWNLOAD THE CODE:
Download the Code 44138.zip

Navigating to Non-Adjacent Pages
If you need to provide users with more flexible navigation capabilities that let them request specific pages regardless of the last page they requested (e.g., page 3, page 10, then page 15), you need to implement a different solution—one that can filter the rows of a requested page when given only the page number and page size as inputs. You can implement a client-side solution or a server-side solution. In a client-side solution, you retrieve all table rows sorted and calculate row numbers by incrementing a counter. You cache all table rows, then respond to a page request by calculating the range of row numbers that you need to return. The first row of a page is ((page number-1) x page size) + 1, and the last row is page number x page size. For example, page 3 with a page size of 10 rows contains the rows 21 through 30. The drawbacks of a client-side solution are that it requires a lot of memory resources at the client and it causes a lot of network traffic and wasted resources when the user doesn't need to access more than a few pages.

In a server-side solution, as in a client-side solution, you calculate row numbers (with sequential integers representing the desired sort) and store the row numbers in a temporary table along with the primary keys of the rows. The client application calculates the range of row numbers of the page the user requests, then submits a query that joins the temporary table and the base table to get all the requested attributes (e.g., order date, customer ID). In the temporary table, you can alternatively store all the attributes of interest to the user, not just the row numbers and the primary keys, thereby removing the need for a join. However, storing all attributes in the temporary table requires a lot of tempdb resources, and you need to determine whether you have those to spare.

Let's start finding a solution to a server-side paging need by calculating row numbers for the different orders and storing them along with the order IDs in a temporary table. Many programmers use a technique to calculate row numbers that isn't guaranteed to always work as you expect. The following code (don't run it!) represents a typical attempt to calculate row numbers for each order ID, based on order date and order ID order:

SELECT IDENTITY(int, 1, 1) AS 
	RowNum, OrderID+0 AS OrderID
INTO dbo.#OrdersRN
FROM dbo.Orders
ORDER BY OrderDate, OrderID;

This technique won't necessarily work as you'd expect because Microsoft doesn't guarantee that SQL Server will calculate IDENTITY values after the sort takes place. Without such a guarantee, the IDENTITY values that SQL Server assigns don't necessarily represent the requested sort. So, you have to use a different technique to calculate row numbers. In my April 2004 T-SQL 2005 Web column, "Calculating Row Numbers in SQL Server 2005" (InstantDoc ID 42302), I mention that a set-based technique to calculate row numbers in SQL Server 2000 is very slow. Until you use SQL Server 2005, you have to use a cursor if you want to calculate row numbers with reasonable performance. Listing 4 shows the cursor-based code you can use to iterate through the sorted orders, calculating row numbers as you go, and storing the row numbers along with the order IDs in a temporary table. After the code creates the temporary table, your application responds to a specific page request, as Listing 5 shows. You need to run Listing 5's code to get a specific page of orders (e.g., page 3) by joining the Orders table and the temporary table that the code in Listing 4 created, filtering the range of rows that appear in the requested page.

Paging Correctly
Paging is necessary when you want to return a query's result set back to the client application in chunks. You can now provide solutions for two navigation techniques: adjacent pages and non-adjacent pages. And using stored procedures provides encapsulation that lets you alter your solutions' implementation without affecting the client application. Also, remember that IDENTITY values won't necessarily correspond to the sort order you specify in the query's ORDER BY clause, so avoid that technique for calculating row numbers based on a certain order.

End of Article

Prev. page     1 [2]     next page -->



You must log on before posting a comment.

If you don't have a username & password, please register now.

Reader Comments

The author correctly pointed out that a query described in this article doesn't garantees that rownum values will matches exactly the order clause. However a light variation on this query make it works just fine. By using a derived table to get the sorted data, the optimizer is forced to complete the sort before the insert. This insures rownum values matching the order clause.

SELECT IDENTITY(int, 1, 1) AS RowNum, OrderID+0 AS OrderID INTO dbo.#OrdersRN FROM (Select top 100 percent * from dbo.Orders ORDER BY OrderDate, OrderID) as x;

pelsql

Article Rating 4 out of 5

For non-adjacent pages... Why not create the temporary table first (either CREATE TABLE or SELECT INTO WHERE 0=1) and use INSERT SELECT to get the correct sequencing?

Anonymous User

Bad

Anonymous User

Article Rating 1 out of 5

Implement paging without search clauses is not very useful, don't you think?. This article would be better if you can mix this topics

Anonymous User

Article Rating 2 out of 5

There was a good article on server side paging in December 2003 by Andrew Rosca. I hear that Oracle has a pseudo col called rownum(or row id) which can be used to page. Is there an equivalent that needs to be exposed to users on sqlserver?

Anonymous User

Article Rating 1 out of 5

For non-adjacent pages, instead of using cursor to create temp table, the following query will do it. SELECT Orders.OrderID, COUNT(Orders_1.OrderID) AS RowNum into dbo.#OrdersRN FROM Orders INNER JOIN Orders Orders_1 ON Orders.OrderID > Orders_1.OrderID AND Orders.OrderDate = Orders_1.OrderDate OR Orders.OrderDate > Orders_1.OrderDate GROUP BY Orders.OrderDate, Orders.OrderID ORDER BY Orders.OrderDate, Orders.OrderID

Anonymous User

Article Rating 3 out of 5

what year is this?

Anonymous User

I appreciate all your feedback and taking the time to write those. I'll certainly take those into consideration in my future writings. I'd like to address the issues you raised:

* To pelsql, a derived table is a table, and a table has no predetermined order to its rows. This also applies to the TOP 100 PERCENT trick. All you can really trust, is that the TOP option would work based on the sort you specified, and if ORDER BY is specified in the outermost query, it guarantees that rows would be returned sorted to the client. But when used to create a derived table, it's a different story. Unless you find documentation by Microsoft that guarantees otherwise (something that conflicts with standard SQL and the relational model), trusting such behavior is an invitation for trouble. Of course you might decide to take the risk, based on your observations that something appears to behave in a certain way in the cases you tested. Just as an example, let's assume for a moment that in the current latest service pack of SQL Server 2000 using the TOP 100 PERCENT trick always performs a sort before assigning identity values. What if in a future service pack the optimizer will introduce a new optimization technique and simply ignore a TOP 100 PERCENT when specified in a table expression (Derived Table, View, Inline Function)? Remember that programmers that trusted a certain physical order of rows in the past were proved wrong. E.g., in SQL Server 6.5 the only algorithm used to group data was sorting. This implied that groups were guaranteed to be sorted in the resultset. As of 7.0, a hashing algorithm was also added to group data, so sorting of the groups in the result is not guaranteed anymore. Programmers that wanted the groups sorted and relied on physical processing rather than the standard, and had not specified ORDER BY, had to revise their code when upgrading to 7.0/2000. Also, there are so many different optimization techniques and access methods that the optimizer can utilize including intra-query-parallelism, enhanced read-ahead and scans, and so on. I'm always reluctant to trust a certain physical processing order that is not documented, especially if it conflicts with the relational model.

* Regarding filters; it's a good point. Paging usually also involves dynamic filters (and also dynamic sorting). For what it's worth, I did want to focus on the paging techniques themselves, and from my teaching experience, an idea is delivered better if you focus on a point. The implementation of dynamic filters and sorting is usually achieved with dynamic SQL. In some cases filtering can be achieved using static queries with COALESCE. Anyway, point taken.

* Regarding the various suggested techniques to calculate row numbers, let me try and summarize my thoughts on the issue: - In SQL Server 2000/SP3, using IDENTITY (the function in a SELECT INTO statement, or the property in an INSERT INTO statement), you are not guaranteed that identity values would represent the sort specified in the ORDER BY clause. No matter how you try to force it, it's simply not guaranteed. - Calculating row numbers using a set based technique (using a subquery, or the suggested join), is very slow. Please refer to the article I wrote about row numbers (doc ID 42302). - In SQL Server 2005, it will actually be guaranteed that IDENTITY values would reflect the sort specified in the ORDER BY clause. The irony is that paging code using IDENTITY currently written for SQL Server 2000 is not guaranteed to work correctly, but it will work correctly in SQL Server 2005. So, you can always claim that your current code is SQL Server 2005 compatible. ;-) Furthermore, for paging purposes you won't need to rely on IDENTITY anymore since SQL Server 2005 has a ROW_NUMBER() function which generates values based on a requested sort, using a single scan of the data.

Regards, Itzik Ben-Gan

BG_SQL

Do we know for sure that there's no difference between SELECT/INTO (with the IDENTITY function) and INSERT/SELECT (with the IDENTITY column property)? I could see SQL Server possibly assigning values with the IDENTITY function prior to sorting, but how could that be true with the IDENTITY column property? It would seem that delaying the assignment of values until the time of row insertions would be safe, unless the ORDER BY is completely ignored.

Anonymous User

We don’t know that there is no difference between SELECT INTO and INSERT SELECT in terms of optimization and IDENTITY assignment. Following your logic, I’d actually expect IDENTITY values to be assigned after the sort, because according to ANSI SQL, the SELECT clause is evaluated before the ORDER BY clause. :-) But that’s not the point; neither SELECT INTO nor INSERT SELECT absolutely guarantees a sort before assigning the identity values. The way I see it, the fact that most (emphasis on most) execution plans do have a sort before identity assignment (for cost reasons) is even worse than the same probability for both cases. When most tests seem to work like you expect, it leads people to trust this functionality, and see how many people have this misconception! In production code, large tables, parallel machines, or any random scenario, it might not work as you expect. We’re talking about a non-ANSI element. And I wish Microsoft had chosen one of two options: disallowing an ORDER BY in an INSERT INTO / SELECT INTO, or ALWAYS guaranteeing a sort before identity assignment, and in SQL Server 2005 they chose the latter. But I’m afraid that in SQL Server 2000, it’s simply not guaranteed.

Regards, Itzik Ben-Gan

BG_SQL

I guess I'm still not sure how one could confidently state that the proper ordering for INSERT/SELECT (with the IDENTITY property on a column) is simply not guaranteed. Do you have a example that demonstrates failure? Have you spoken with Microsoft's SQL Server engineers? Could you describe a sequence of events that would make it fail? As I understand it, the assignment of IDENTITY values in this case happens as the rows are inserted (unless a table lock is used another process could allocate an IDENTITY value in the middle of the expected range). Once rows are inserted it's too late for sorting, so it seems that the ORDER BY must be either applied prior to any insertion or effectively ignored. What am I failing to understand?

Anonymous User

I see now the Compute Scalar operation on the execution plan. I assume you are saying that nothing prevents the Sort operation from moving left of the Compute Scalar operation. I realize that lack of an example of the negative is not proof of the positive, but short of an example causing the Sort operation to move I think only Microsoft could tell us whether it could move. How did you conclude that it could move? What would you consider to be a guarantee that it would never move?

Anonymous User

I have been looking at various paging techniques, and so far have found three that seemed reasonable. The first two are very similar to the 'Navigating to Non-Adjacent Pages' technique described above.

The third technique, which surprisingly has proven to be the quickest under a number of scenarios I have listed below. Simple, but I would never have thought it would be the fastest.

CREATE PROCEDURE PageOrdersTable @PageSize int, @PageNum int AS

DECLARE @pageSizeString varchar(10) DECLARE @rowCountString varchar(10)

SET @pageSizeString = CAST(@PageSize AS varchar(10)) SET @rowCountString = CAST(@PageSize * @PageNum AS varchar(10))

EXEC( 'SELECT OrderID, OrderDate, CustomerID, EmployeeID, ShipVia FROM ( SELECT TOP ' + @pageSizeString + ' * FROM ( SELECT TOP ' + @rowCountString + '* FROM orders ORDER BY orderdate ASC) AS filteredResults ORDER BY orderdate DESC) as pageResults order by orderdate ASC')

asdavey

Article Rating 4 out of 5

Itzik,

When you say that a sort before identity assignment is simply not guaranteed do you mean that it's not explicitly stated in BOL or do you mean that you have inside knowledge about optimizer behavior?

Thanks!

Anonymous User

I’m saying this because I discussed the issue with the developers from the engine/optimizer team. I was explicitly told that they do not guarantee that the ORDER BY would be applied before assigning the IDENTITY values in SQL Server 2000. The issue came up as a result of a case where IDENTITY values did not reflect the ORDER BY clause, and you could see in the plan the compute scalar operator generating the values before the sort. I’m afraid I cannot provide details as I’m under NDA, and the whole discussion was in a private forum. This is not considered a bug, since Microsoft never guaranteed anything about the relationship between ORDER BY and IDENTITY. And this also relates to the risk in relying on a certain “expected” behavior that is not documented in BOL or elsewhere. However, since they are very well aware that so many developers assume IDENTITY values would reflect the sort, and have lots of code relying on that, I was told that they’re going to guarantee this in SQL Server 2005. But again, in SQL Server 2005 you won’t need such a guarantee, since you’ll be able to use the ROW_NUMBER() function.

BG_SQL

See More Comments  1   2 
 
 

ADS BY GOOGLE