• subscribe
October 29, 2009 12:00 AM

Virtual Auxiliary Table of Numbers

SQL Server Pro
InstantDoc ID #103056

An auxiliary table of numbers is a helper table that contains a sequence of integers from 1 and on. It is a very handy helper table that I use to solve many different types of problems. In case you cannot, or do not want to generate a permanent table, you can produce a virtual one on the fly very efficiently. The trick is to use cross joins. You start with a virtual table with two rows:

  WITH

  L0   AS(SELECT 1 AS c UNION ALL SELECT 1)

Then, you perform a cross join between two instances of this table:

  WITH

  L0   AS(SELECT 1 AS c UNION ALL SELECT 1),

  L1   AS(SELECT 1 AS c FROM L0 AS A CROSS JOIN L0 AS B)

Then, you perform a cross join between two instances of the last table:

  WITH

  L0   AS(SELECT 1 AS c UNION ALL SELECT 1),

  L1   AS(SELECT 1 AS c FROM L0 AS A CROSS JOIN L0 AS B),

  L2   AS(SELECT 1 AS c FROM L1 AS A CROSS JOIN L1 AS B)

And after applying such cross joins five time, you have a table with 2^2^5 (4,294,967,296) rows:

  WITH

  L0   AS(SELECT 1 AS c UNION ALL SELECT 1),

  L1   AS(SELECT 1 AS c FROM L0 AS A CROSS JOIN L0 AS B),

  L2   AS(SELECT 1 AS c FROM L1 AS A CROSS JOIN L1 AS B),

  L3   AS(SELECT 1 AS c FROM L2 AS A CROSS JOIN L2 AS B),

  L4   AS(SELECT 1 AS c FROM L3 AS A CROSS JOIN L3 AS B),

  L5   AS(SELECT 1 AS c FROM L4 AS A CROSS JOIN L4 AS B)

To produce the actual sequence of numbers, you can use the ROW_NUMBER function with ORDER BY (SELECT NULL), letting the optimizer know that it doesn’t really need to sort the data.

You can create a table function based on this code, and request a certain number of numbers by passing an input parameter. The tricky part is to come up with a solution that stops processing the Cartesian products as soon as the requested number of numbers was produced, and not always try to generate all four billion of those. Until recently, I used the following solution:

IF OBJECT_ID('dbo.GetNums') IS NOT NULL DROP FUNCTION dbo.GetNums;

GO

CREATE FUNCTION dbo.GetNums(@n AS BIGINT) RETURNS TABLE

AS

RETURN

  WITH

  L0   AS(SELECT 1 AS c UNION ALL SELECT 1),

  L1   AS(SELECT 1 AS c FROM L0 AS A CROSS JOIN L0 AS B),

  L2   AS(SELECT 1 AS c FROM L1 AS A CROSS JOIN L1 AS B),

  L3   AS(SELECT 1 AS c FROM L2 AS A CROSS JOIN L2 AS B),

  L4   AS(SELECT 1 AS c FROM L3 AS A CROSS JOIN L3 AS B),

  L5   AS(SELECT 1 AS c FROM L4 AS A CROSS JOIN L4 AS B),

  Nums AS(SELECT ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) AS n FROM L5)

  SELECT n FROM Nums WHERE n <= @n;

GO

 

To return 10 numbers you would simply pass the value 10 as input:

SELECT n FROM dbo.GetNums(10) AS Nums;

 

This generates the following output:

n

---

1

2

3

4

5

6

7

8

9

10

 

This code finishes fast and does stop processing after 10 rows are produced. If you look at the execution plan produced for this query, you will notice a Top operator that is in charge of stopping the processing at the right point. However, I noticed that in certain cases the optimizer removes the Top operator, and actually tries  to generate all four billion rows, before the filtering part. Here’s one such example:

WITH Primes(p) AS

(

  SELECT 2

  UNION ALL SELECT 3

  UNION ALL SELECT 5

  UNION ALL SELECT 7

)

SELECT *

FROM Primes

  CROSS APPLY dbo.GetNums(Primes.p) AS Nums;

 

Recently I revised the solution to use the TOP option instead of the WHERE filter like so:

IF OBJECT_ID('dbo.GetNums') IS NOT NULL DROP FUNCTION dbo.GetNums;

GO

CREATE FUNCTION dbo.GetNums(@n AS BIGINT) RETURNS TABLE

AS

RETURN

  WITH

  L0   AS(SELECT 1 AS c UNION ALL SELECT 1),

  L1   AS(SELECT 1 AS c FROM L0 AS A CROSS JOIN L0 AS B),

  L2   AS(SELECT 1 AS c FROM L1 AS A CROSS JOIN L1 AS B),

  L3   AS(SELECT 1 AS c FROM L2 AS A CROSS JOIN L2 AS B),

  L4   AS(SELECT 1 AS c FROM L3 AS A CROSS JOIN L3 AS B),

  L5   AS(SELECT 1 AS c FROM L4 AS A CROSS JOIN L4 AS B),

  Nums AS(SELECT ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) AS n FROM L5)

  SELECT TOP (@n) n FROM Nums ORDER BY n;

GO

 

And for this solution the optimizer always seems to use the Top operator and stop processing in the right place. Try the following after recreating the function using the new version:

WITH Primes(p) AS

(

  SELECT 2

  UNION ALL SELECT 3

  UNION ALL SELECT 5

  UNION ALL SELECT 7

)

SELECT *

FROM Primes

  CROSS APPLY dbo.GetNums(Primes.p) AS Nums;

 

And this time it should finish quickly.

Cheers,

BG



ARTICLE TOOLS

Comments
  • DM
    3 years ago
    Nov 02, 2009

    WITH
    L0 AS(SELECT 1 AS c UNION ALL SELECT 1 /*repeat x16 or even more, use row constructor in 2008!*/)
    L1 AS(SELECT 1 AS c FROM L0 AS A CROSS JOIN L0 AS B),
    L2 AS(SELECT 1 AS c FROM L1 AS A CROSS JOIN L1 AS B),
    L3 AS(SELECT 1 AS c FROM L2 AS A CROSS JOIN L2 AS B),
    Nums AS(SELECT ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) AS n FROM L3)
    SELECT TOP (@n) n FROM Nums ORDER BY n;

    is a lot faster. I'm not sure where the tradoff between constructor and cross join lies, but it is certainly not at 2 records!

  • qiang
    3 years ago
    Oct 30, 2009

    Hi BG,

    Could u give us a example which type of problem need to use Virtual Auxiliary Table of Numbers ?

    looking forward your Article

    [Reply:

    There are many: splitting separated lists of values, generating a sequence of date and time values for a time dimension in a data warehouse, unpivoting data, producing copies of rows, generating histograms, and others.
    Check out InstantDoc ID #100657 to see the splitting separated lists of values example (http://www.sqlmag.com/Articles/ArticleID/100657/100657.html?Ad=1)

    Cheers,
    Itzik]

  • William
    3 years ago
    Oct 30, 2009

    I've tried the recursive approach you suggest, oddsock and as per article #94376, having the OPTION within the function doesn't work (syntax error) - it'd be a huge pain to have to remember to specify the OPTION on all queries that use GetNums.

  • CHRIS
    3 years ago
    Oct 30, 2009

    Nice idea, but it does seem overly complex. This will do the job just as well:

    IF OBJECT_ID('dbo.GetNums') IS NOT NULL DROP FUNCTION dbo.GetNums;
    GO
    CREATE FUNCTION dbo.GetNums(@n AS BIGINT) RETURNS TABLE
    AS
    RETURN
    WITH CTE
    AS
    (SELECT 1 AS n
    UNION ALL
    SELECT n + 1
    FROM CTE
    WHERE n < @n)
    SELECT n
    FROM CTE
    OPTION(MAXRECURSION 0)
    G0

    [Reply: Hi oddsock,

    Besides the fact that you cannot specify the MAXRECURSION hint as part of the inner query (as William mentioned), the recursive solution is very slow compared to the nonrecursive one I showed. It took my solution 3 seconds to generate 10,000,000 numbers (with Discard results after execution turned on), while it took your recursive query 172 seconds to do the same.
    As for the complexity of the solution, it’s hidden from the user by encapsulating it in a function. As long as you trust that it works correctly, the use is simple.

    Cheers,
    Itzik]

You must log on before posting a comment.

Are you a new visitor? Register Here
  • SP1?
    I know there is a SP1 for SQL 2008 R2 available....and there is a "feature pack" as well... ...
  • SQL database mirroring
    I have SQL Server 2008 R2 Enterprise 64bit on Windows 2008 R2 Enterprise 64bit.  Each SQL Server has...
  • Dell Compellent Disk Drive
    Does anybody has experience with Dell Compellent Disk Drive? Basically, this system manages all disk...
  • Sql server performance tuning
    I need to find a tool that help me to optimize sql server,queries,improve the performance and solve ...