Using an auxiliary table of numbers (call it Nums) can be helpful in many T-SQL
querying problems. The table is very simple—it’s just one integer column (call it n) containing a sequence of integers from 1 and on (as many as you need). You can use
an auxiliary table of numbers to perform tasks such as splitting strings containing arrays
of elements, generating histograms, unpivoting data, generating copies, and generating
test data. Since this type of table can be so useful, you’d probably want to create one
in your database and use it in your queries. However, suppose that you aren’t allowed
to create new tables in your database. Or even if you are allowed, you’re looking for a
faster way to obtain a table of numbers than querying a real table. So here’s the challenge, which is the focus of this article: Write a table-valued function (call it fn_nums)
that accepts as an input parameter the number of rows you want to produce (call it @
max) and returns a table of integers in the range 1 through @max.
The function should return the result
as quickly as possible. Here’s an example of
how you’d query the function to return 10
numbers:
SELECT n FROM dbo.fn_nums(10)
AS Nums;
(Some of the code in the article and listings
wraps to multiple lines because of space
constraints.) This query should produce
the output that Table 1 shows. To test
performance, run the code with the input
10000000. Make sure that when you test
performance, you select the SQL Server
Management Studio (SSMS) Discard results
after execution check box—to do so, click
Tools, Options, Query Results, SQL Server,
Results to Grid (or Text). Doing so lets you
test server-processing time, excluding the
time it takes to generate the output. Try to
come up with an implementation that runs
at around 10 seconds or less for 10 million
rows. See if you can work out your own
solution before looking at the three solutions I provide. Good luck!
Solution 1: Simple Recursive
Query
My first solution is based on a simple recursive query. Listing 1 shows the solution code,
including the function’s definition. The code
defines a recursive common table expression
(CTE) called Nums. The anchor member
returns a single row with the value 1 in
column n. The recursive member returns a
row with the value n + 1 from the previous
row if n is smaller than the input parameter
@max; otherwise, it returns an empty set
(recursion termination check). In short, the recursive member stops as soon as it generates the
requested number of rows. The outer query returns the unified set combining the result set returned by the anchor member and all the result sets returned by the recursive member. The recursive member is invoked @max times (the last returning an empty set).
To test the function, run the
following query:
SELECT n FROM dbo.fn_nums(10)
AS Nums;
You’ll get the output that Table 1 shows.
Before you try it with 10 million rows,
though, I’d like to point out a logical
problem with this implementation. To prevent infinite invocations of a recursive
member in a recursive CTE, SQL Server
limits the number of invocations of the
recursive member to 100 by default. The
101st invocation would cause the query to
fail at runtime and generate an error. So if
you try running this code:
SELECT n FROM dbo.fn_nums(1000)
AS Nums;
You’ll get the error message Msg 530, Level
16, State 1, Line 1, The statement terminated.
The maximum recursion 100 has been exhausted
before statement completion. SQL Server lets
you change the default limit of 100 by
specifying a MAXRECURSION hint. Setting MAXRECURSION to 0 means no
limit. However, if you try incorporating the
MAXRECURSION hint in the query
within the function’s definition, as Listing 2 shows, you’ll get the following error: Msg 156,
Level 15, State 1, Procedure fn_nums, Line 10,
Incorrect syntax near the keyword 'OPTION'.
SQL Server doesn’t let you incorporate
query hints within a function’s definition.
You’re left with no choice but to specify the
hint in the outer query against the function,
like this:
SELECT n FROM dbo.fn_nums(1000)
AS Nums
OPTION (MAXRECURSION 0);
Of course, doing so is undesirable because you have to know
how the function is implemented (i.e., that it uses a recursive query) to realize that you
need to specify the hint in the
outer query, so you lose some
of the important features that
encapsulation is supposed to
give you. But that’s the reality.
To test the performance of
this implementation, first make
sure that you select the Discard
results after execution check box in SSMS. Then in a new query window, run
the following code:
SELECT n FROM
dbo.fn_nums(10000000) AS Nums
OPTION (MAXRECURSION 0);
In my test system, this query ran for 253
seconds. That’s very slow, not to mention
that you also have to specify the hint in the
outer query. The reason for the bad performance is mainly the overhead involved with
each individual invocation of the recursive member (e.g., spooling and indexing of the
intermediate sets). In short, this solution is
far from satisfactory, so let’s explore further
alternatives.
Prev. page  
[1]
2
next page