Executive Summary:
CreateIntegersTable is a multistatement table-valued function written for Microsoft SQL Server 2005. You can use this custom tool to create a numbers table that works around nearly all the limitations that traditional techniques for building these tables usually involve. For example, traditional techniques might themselves involve cursors, loops, or identity columns. The techniques might need temporary tables or actual database tables, which could place expensive demands on database server resources, and the techniques usually lack the flexibility for a developer who needs a different kind of numbers table: one that has starting—and maybe ending—negative values or increment values greater than 1. CreateIntegersTable skirts nearly all these drawbacks.
|
In the database world, a numbers table, or tally table, is simply a table of unique integers. These one-column tables usually start with a value of 1, increment by 1, and end at some fixed integer. Developers can often use them to eliminate cursors and loops, parse strings, optimize queries, identify number sequence gaps, and more. A Google search for "tally table" or "numbers table" shows many different ways to build them, but the techniques generally have drawbacks. The techniques might themselves involve cursors, loops, or identity columns; cursors and loops usually run slowly, and identity columns can become a hassle. The techniques might need temporary tables or actual database tables, which could place expensive demands on database server resources, and the techniques lack the flexibility for a developer who needs a different kind of numbers table: one that has starting—and maybe ending—negative values or increment values greater than 1.
To get around almost all these drawbacks, I wrote CreateIntegersTable, the multistatement table-valued function shown in Listing 1. Written for SQL Server 2005, CreateIntegersTable takes input parameters @start_int, @step_int, and @end_int and returns a single-column table variable of type BIGINT. CreateIntegersTable relies on a common table expression (CTE) and simple table inserts. Although it uses one WHILE loop, the loop is executed infrequently—a 60-million–row result set, for example, loops at most 11 times. CreateIntegersTable avoids cursors, temporary tables, database tables, and identity columns. It handles both positive and negative start and end values as well as step values greater than 1. It can even make full use of the BIGINT space, both in the data type of the numbers themselves and in the total number of rows returned. Almost all the variable names in function CreateIntegersTable and ActualEndIntegerCalculator, a related function explained later, include "int," to clarify that these functions deal only with integers and big integers.
Understand It
Although CreateIntegersTable starts with USE [master], the function can go in any database. The script drops the function if it already exists in the target database, then creates a fresh instance of the function. The @start_int and @step_int parameters default to 1.
To simplify the code, the function assumes that the first integer in the integer range is at least 1. The function handles @start_int values greater than 1 later on but a @start_int value less than 1 needs special handling. When the start value is less than 1, the IF test adds ABS(@start_int) to the @end_int value. Because the function builds ranges starting at 1, not 0, it also adds 1 to @end_int to compensate by shifting the end value, as Figure 1 shows.

The IntegersTableFill CTE (callout A) inserts rows into @IntegersTable, the table variable that holds the generated integers, up to a maximum @end_int value of 32,767. Variable @end_int specifies the maximum value in the table; because the function has a default @step_int value of 1, the @end_int value of 32,767 applies to both the maximum value in the table and the number of rows in the table. If the function receives a @step_int value greater than 1, however, the number of rows in the table at this point will be less than 32,767 while the maximum value in the table remains 32,767. Using a CTE is fast, and, for this initial set of rows it has high performance—and the maximum number of rows involved, 32,767, makes the function's other features possible. Parameter @step_int increments the integers correctly. If @end_int is less than 32,767, the CTE WHERE clause uses @end_int instead of 32,767. This behavior improves efficiency, as I explain later.
The CTE OPTION (MAXRECURSION 32767) limit forces an @end_int maximum of 32,767. The MAXRECURSION option allows only 32,767 levels of recursion, which would mean a maximum of 32,767 integers. We could specify (MAXRECURSION 0) or even leave this line out of the function, but then nothing would protect the function from infinite recursion, which would crash it.
The next IF test (callout B) deals with @end_int values that exceed 32,767. Although the IntegersTable CTE that I just described works for values less than or equal to 32,767, numbers larger than 32,767 require something else because of the MAXRECURSION limit. For such end values, the function uses an INSERT INTO with a SELECT (callout E) that doubles the number of integers already in @IntegersTable. In other words, this particular INSERT INTO can handle row insertions only in successive groups of 2n power (2 ^ n) multiples of 65,534 (64KB) rows. Figure 2 shows these calculations.

The function inserts as many integers as we want, up to the BIGINT limit, but starting at integer 32,768, it must do so in steps, using a SELECT statement to double the number of rows already in @IntegersTable before it inserts new rows. Therefore, the function must figure out how many row groups it should handle. The SELECT statement in callout E does just that. Say that @end_int is 5,000,000. From Figure 2, the function should then go up to row group 8, the group that includes row 5,000,000 and ends at row 8,388,352. Based on the way the math works, which I described earlier, using the last number of group 8 as an example, we start with the equation
65534 x 2n = 8388352
and try to get a formula for n, the number of row groups. With natural logs and some algebra, we get
n = (ln(128))/(ln(2))
which yields n = 7. In Figure 2, the row group numbers start at 1, not 0, so add 1 to n, and you get n = 8. A general formula for n would look like this:
n = (ln(x/65534))/(ln(2))