October 29, 2009 03:29 PM

Virtual Auxiliary Table of Numbers

Rating: (6)
SQL Server Magazine
InstantDoc ID #103056

An auxiliary table of numbers is a helper table that contains a sequence of integers from 1 and on. It is a very handy helper table that I use to solve many different types of problems. In case you cannot, or do not want to generate a permanent table, you can produce a virtual one on the fly very efficiently. The trick is to use cross joins. You start with a virtual table with two rows:

  WITH

  L0   AS(SELECT 1 AS c UNION ALL SELECT 1)

Then, you perform a cross join between two instances of this table:

  WITH

  L0   AS(SELECT 1 AS c UNION ALL SELECT 1),

  L1   AS(SELECT 1 AS c FROM L0 AS A CROSS JOIN L0 AS B)

Then, you perform a cross join between two instances of the last table:

  WITH

  L0   AS(SELECT 1 AS c UNION ALL SELECT 1),

  L1   AS(SELECT 1 AS c FROM L0 AS A CROSS JOIN L0 AS B),

  L2   AS(SELECT 1 AS c FROM L1 AS A CROSS JOIN L1 AS B)

And after applying such cross joins five time, you have a table with 2^2^5 (4,294,967,296) rows:

  WITH

  L0   AS(SELECT 1 AS c UNION ALL SELECT 1),

  L1   AS(SELECT 1 AS c FROM L0 AS A CROSS JOIN L0 AS B),

  L2   AS(SELECT 1 AS c FROM L1 AS A CROSS JOIN L1 AS B),

  L3   AS(SELECT 1 AS c FROM L2 AS A CROSS JOIN L2 AS B),

  L4   AS(SELECT 1 AS c FROM L3 AS A CROSS JOIN L3 AS B),

  L5   AS(SELECT 1 AS c FROM L4 AS A CROSS JOIN L4 AS B)

To produce the actual sequence of numbers, you can use the ROW_NUMBER function with ORDER BY (SELECT NULL), letting the optimizer know that it doesn’t really need to sort the data.

You can create a table function based on this code, and request a certain number of numbers by passing an input parameter. The tricky part is to come up with a solution that stops processing the Cartesian products as soon as the requested number of numbers was produced, and not always try to generate all four billion of those. Until recently, I used the following solution:

IF OBJECT_ID('dbo.GetNums') IS NOT NULL DROP FUNCTION dbo.GetNums;

GO

CREATE FUNCTION dbo.GetNums(@n AS BIGINT) RETURNS TABLE

AS

RETURN

  WITH

  L0   AS(SELECT 1 AS c UNION ALL SELECT 1),

  L1   AS(SELECT 1 AS c FROM L0 AS A CROSS JOIN L0 AS B),

  L2   AS(SELECT 1 AS c FROM L1 AS A CROSS JOIN L1 AS B),

  L3   AS(SELECT 1 AS c FROM L2 AS A CROSS JOIN L2 AS B),

  L4   AS(SELECT 1 AS c FROM L3 AS A CROSS JOIN L3 AS B),

  L5   AS(SELECT 1 AS c FROM L4 AS A CROSS JOIN L4 AS B),

  Nums AS(SELECT ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) AS n FROM L5)

  SELECT n FROM Nums WHERE n <= @n;

GO

 

To return 10 numbers you would simply pass the value 10 as input:

SELECT n FROM dbo.GetNums(10) AS Nums;

 

This generates the following output:

n

---

1

2

3

4

5

6

7

8

9

10

 

This code finishes fast and does stop processing after 10 rows are produced. If you look at the execution plan produced for this query, you will notice a Top operator that is in charge of stopping the processing at the right point. However, I noticed that in certain cases the optimizer removes the Top operator, and actually tries  to generate all four billion rows, before the filtering part. Here’s one such example:

WITH Primes(p) AS

(

  SELECT 2

  UNION ALL SELECT 3

  UNION ALL SELECT 5

  UNION ALL SELECT 7

)

SELECT *

FROM Primes

  CROSS APPLY dbo.GetNums(Primes.p) AS Nums;

 

Recently I revised the solution to use the TOP option instead of the WHERE filter like so:

IF OBJECT_ID('dbo.GetNums') IS NOT NULL DROP FUNCTION dbo.GetNums;

GO

CREATE FUNCTION dbo.GetNums(@n AS BIGINT) RETURNS TABLE

AS

RETURN

  WITH

  L0   AS(SELECT 1 AS c UNION ALL SELECT 1),

  L1   AS(SELECT 1 AS c FROM L0 AS A CROSS JOIN L0 AS B),

  L2   AS(SELECT 1 AS c FROM L1 AS A CROSS JOIN L1 AS B),

  L3   AS(SELECT 1 AS c FROM L2 AS A CROSS JOIN L2 AS B),

  L4   AS(SELECT 1 AS c FROM L3 AS A CROSS JOIN L3 AS B),

  L5   AS(SELECT 1 AS c FROM L4 AS A CROSS JOIN L4 AS B),

  Nums AS(SELECT ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) AS n FROM L5)

  SELECT TOP (@n) n FROM Nums ORDER BY n;

GO

 

And for this solution the optimizer always seems to use the Top operator and stop processing in the right place. Try the following after recreating the function using the new version:

WITH Primes(p) AS

(

  SELECT 2

  UNION ALL SELECT 3

  UNION ALL SELECT 5

  UNION ALL SELECT 7

)

SELECT *

FROM Primes

  CROSS APPLY dbo.GetNums(Primes.p) AS Nums;

 

And this time it should finish quickly.

Cheers,

BG

Add a Comment

WITH
L0 AS(SELECT 1 AS c UNION ALL SELECT 1 /*repeat x16 or even more, use row constructor in 2008!*/)
L1 AS(SELECT 1 AS c FROM L0 AS A CROSS JOIN L0 AS B),
L2 AS(SELECT 1 AS c FROM L1 AS A CROSS JOIN L1 AS B),
L3 AS(SELECT 1 AS c FROM L2 AS A CROSS JOIN L2 AS B),
Nums AS(SELECT ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) AS n FROM L3)
SELECT TOP (@n) n FROM Nums ORDER BY n;

is a lot faster. I'm not sure where the tradoff between constructor and cross join lies, but it is certainly not at 2 records!

DM11/2/2009 8:14:23 AM


Hi BG,

Could u give us a example which type of problem need to use Virtual Auxiliary Table of Numbers ?

looking forward your Article

[Reply:

There are many: splitting separated lists of values, generating a sequence of date and time values for a time dimension in a data warehouse, unpivoting data, producing copies of rows, generating histograms, and others.
Check out InstantDoc ID #100657 to see the splitting separated lists of values example (http://www.sqlmag.com/Articles/ArticleID/100657/100657.html?Ad=1)

Cheers,
Itzik]

qiang10/30/2009 3:01:55 PM


I've tried the recursive approach you suggest, oddsock and as per article #94376, having the OPTION within the function doesn't work (syntax error) - it'd be a huge pain to have to remember to specify the OPTION on all queries that use GetNums.

William10/30/2009 9:09:30 AM


Nice idea, but it does seem overly complex. This will do the job just as well:

IF OBJECT_ID('dbo.GetNums') IS NOT NULL DROP FUNCTION dbo.GetNums;
GO
CREATE FUNCTION dbo.GetNums(@n AS BIGINT) RETURNS TABLE
AS
RETURN
WITH CTE
AS
(SELECT 1 AS n
UNION ALL
SELECT n + 1
FROM CTE
WHERE n < @n)
SELECT n
FROM CTE
OPTION(MAXRECURSION 0)
G0

[Reply: Hi oddsock,

Besides the fact that you cannot specify the MAXRECURSION hint as part of the inner query (as William mentioned), the recursive solution is very slow compared to the nonrecursive one I showed. It took my solution 3 seconds to generate 10,000,000 numbers (with Discard results after execution turned on), while it took your recursive query 172 seconds to do the same.
As for the complexity of the solution, it’s hidden from the user by encapsulating it in a function. As long as you trust that it works correctly, the use is simple.

Cheers,
Itzik]

CHRIS10/30/2009 3:15:11 AM


You must log on before posting a comment.

Are you a new visitor? Register Here

Rename Virtual Server (Cluster)

I am going to rename a clustered instance (default).1. Change SQL Network Name2. Take offline3. Bring online4. Flush DNS, Cache5. Test failover6. Rebu...222-96209

GOOGLE LINKS
SPONSORED LINKS
FEATURED LINKS