The row_number function is an
extremely powerful tool that I use to simplify and optimize solutions to many
problems. Occasionally I needed to calculate nondeterministic row numbers where
order didn’t matter. For
example, suppose you have the following t1 table:
set nocount on;
use tempdb;
go
if object_id('dbo.t1') is not null
drop table dbo.t1;
go
create table dbo.t1(col1 varchar(10) not null);
insert into dbo.t1(col1) values('c');
insert into dbo.t1(col1) values('a');
insert into dbo.t1(col1) values('b');
insert into dbo.t1(col1) values('a');
insert into dbo.t1(col1) values('b');
insert into dbo.t1(col1) values('b');
insert into dbo.t1(col1) values('c');
insert into dbo.t1(col1) values('a');
insert into dbo.t1(col1) values('b');
insert into dbo.t1(col1) values('a');
go
You need to return all rows from t1 along with unique
incrementing row numbers in no particular order. If
order doesn’t matter, of course you can always specify an existing column from
the table in the row_number
function’s order by clause:
select col1, row_number() over(order by col1) as rownum
from dbo.t1;
However, the execution plan for the query would involve
sorting (or an index order scan if an index exists on
the sort column). Here’s the plan you get for the above query:
|--Sequence Project(DEFINE:([Expr1004]=row_number))
|--Compute
Scalar(DEFINE:([Expr1006]=(1)))
|--Segment
|--Sort(ORDER BY:([tempdb].[dbo].[t1].[col1] ASC))
|--Table
Scan(OBJECT:([tempdb].[dbo].[t1]))
If the order of the row numbers doesn’t matter to you, you’d
probably rather not pay the sort penalty. The
problem is that SQL Server doesn’t allow a constant in the row_number
function’s order by clause. Try running
the following query:
select col1, row_number() over(order by 0) as rownum
from dbo.t1;
And you will get the following error:
Msg 5309, Level 16, State 1, Line 1
Windowed functions do not support constants as
ORDER BY clause expressions.
Until recently, in order to avoid sorting I defined a table
expression (CTE or derived table) based on a query
that returns the table rows along with a constant (call it const). In the outer
query I invoked the row_number
function with order by const:
with c as
(
select col1, 0 as const from dbo.t1
)
select col1, row_number() over(order by const) as rownum
from c;
Here, the optimizer is smart enough to realize that sorting
is not required. Here’s the plan I got for this query
(notice there’s no sort operation):
|--Sequence
Project(DEFINE:([Expr1005]=row_number))
|--Compute
Scalar(DEFINE:([Expr1007]=(1)))
|--Segment
|--Table
Scan(OBJECT:([tempdb].[dbo].[t1]))
Even though this technique avoids sorting, it is a bit awkward.
Recently I got a tip from a T. Wong how to
achieve the same thing without the need for a table expression—simply specify
order by (select 0)!
Here’s the solution query with the new technique:
select col1, row_number() over(order by (select 0)) as rownum
from dbo.t1;
It is much more elegant than the previous technique, and
also here the plan shows that the optimizer realized
that sorting is not needed:
|--Sequence
Project(DEFINE:([Expr1006]=row_number))
|--Compute Scalar(DEFINE:([Expr1008]=(1)))
|--Segment
|--Compute
Scalar(DEFINE:([Expr1005]=(0)))
|--Table
Scan(OBJECT:([tempdb].[dbo].[t1]))
Cheers,
BG