• subscribe
December 18, 2006 12:00 AM

Auxiliary Table-of-Numbers Tricks

Use a table-valued function to generate a virtual table of integers
SQL Server Pro
InstantDoc ID #94376
Downloads
94376.zip

Using an auxiliary table of numbers (call it Nums) can be helpful in many T-SQL querying problems. The table is very simple—it’s just one integer column (call it n) containing a sequence of integers from 1 and on (as many as you need). You can use an auxiliary table of numbers to perform tasks such as splitting strings containing arrays of elements, generating histograms, unpivoting data, generating copies, and generating test data. Since this type of table can be so useful, you’d probably want to create one in your database and use it in your queries. However, suppose that you aren’t allowed to create new tables in your database. Or even if you are allowed, you’re looking for a faster way to obtain a table of numbers than querying a real table. So here’s the challenge, which is the focus of this article: Write a table-valued function (call it fn_nums) that accepts as an input parameter the number of rows you want to produce (call it @ max) and returns a table of integers in the range 1 through @max.

The function should return the result as quickly as possible. Here’s an example of how you’d query the function to return 10 numbers:

SELECT n FROM dbo.fn_nums(10)
AS Nums;

(Some of the code in the article and listings wraps to multiple lines because of space constraints.) This query should produce the output that Table 1 shows. To test performance, run the code with the input 10000000. Make sure that when you test performance, you select the SQL Server Management Studio (SSMS) Discard results after execution check box—to do so, click Tools, Options, Query Results, SQL Server, Results to Grid (or Text). Doing so lets you test server-processing time, excluding the time it takes to generate the output. Try to come up with an implementation that runs at around 10 seconds or less for 10 million rows. See if you can work out your own solution before looking at the three solutions I provide. Good luck!

Solution 1: Simple Recursive Query
My first solution is based on a simple recursive query. Listing 1 shows the solution code, including the function’s definition. The code defines a recursive common table expression (CTE) called Nums. The anchor member returns a single row with the value 1 in column n. The recursive member returns a row with the value n + 1 from the previous row if n is smaller than the input parameter @max; otherwise, it returns an empty set (recursion termination check). In short, the recursive member stops as soon as it generates the requested number of rows. The outer query returns the unified set combining the result set returned by the anchor member and all the result sets returned by the recursive member. The recursive member is invoked @max times (the last returning an empty set).

To test the function, run the following query:

SELECT n FROM dbo.fn_nums(10)
 AS Nums;

You’ll get the output that Table 1 shows. Before you try it with 10 million rows, though, I’d like to point out a logical problem with this implementation. To prevent infinite invocations of a recursive member in a recursive CTE, SQL Server limits the number of invocations of the recursive member to 100 by default. The 101st invocation would cause the query to fail at runtime and generate an error. So if you try running this code:

SELECT n FROM dbo.fn_nums(1000)
AS Nums;

You’ll get the error message Msg 530, Level 16, State 1, Line 1, The statement terminated. The maximum recursion 100 has been exhausted before statement completion. SQL Server lets you change the default limit of 100 by specifying a MAXRECURSION hint. Setting MAXRECURSION to 0 means no limit. However, if you try incorporating the MAXRECURSION hint in the query within the function’s definition, as Listing 2 shows, you’ll get the following error: Msg 156, Level 15, State 1, Procedure fn_nums, Line 10, Incorrect syntax near the keyword 'OPTION'.

SQL Server doesn’t let you incorporate query hints within a function’s definition. You’re left with no choice but to specify the hint in the outer query against the function, like this:

SELECT n FROM dbo.fn_nums(1000)
  AS Nums 
OPTION (MAXRECURSION 0);

Of course, doing so is undesirable because you have to know how the function is implemented (i.e., that it uses a recursive query) to realize that you need to specify the hint in the outer query, so you lose some of the important features that encapsulation is supposed to give you. But that’s the reality.

To test the performance of this implementation, first make sure that you select the Discard results after execution check box in SSMS. Then in a new query window, run the following code:

SELECT n FROM
   dbo.fn_nums(10000000) AS Nums
OPTION (MAXRECURSION 0);

In my test system, this query ran for 253 seconds. That’s very slow, not to mention that you also have to specify the hint in the outer query. The reason for the bad performance is mainly the overhead involved with each individual invocation of the recursive member (e.g., spooling and indexing of the intermediate sets). In short, this solution is far from satisfactory, so let’s explore further alternatives.



ARTICLE TOOLS

Comments
  • Diana
    5 years ago
    Mar 06, 2007

    Hello - here is Itzik's correction. We'll get the file fixed.


    The error was: first FROM clause referred to Nums and should refer to SQR.

    Regards,
    Itzik

  • Mark
    5 years ago
    Feb 23, 2007

    I believe there is an error in code listing #3, "Nums" should be "SQR".

    [Itzik: you're right; I'll ask for a fix. Thanks!]

You must log on before posting a comment.

Are you a new visitor? Register Here