• subscribe
February 19, 2009 12:00 AM

Build the Numbers Table You Need

A custom tool that skirts the limitations that traditional techniques entail
SQL Server Pro
InstantDoc ID #100531
Downloads
100531.zip

To make the numbers work the way we need, round n up to the next integer if it's not an exact integer, and then add 1. The T-SQL statement at callout C does all this; we multiply @end_int by 1.0 because the math works reliably with real numbers. With this expression, for @end_int = 5000000, @int_row_groups = 8. The earlier number shift is important here. If @start_int is less than 1, the natural log function at best would have returned the wrong value and at worst would have crashed. Although this statement returns 0 for @end_int less than 32,768, @end_int will never drop below 32,768 at this point in the function, so we're safe.

The function is almost ready to make the row insertions from integers 32,767 to @end_int. At callout D, it initializes variable @current_row_group to 1 and uses this variable as the loop counter for the WHILE loop. Variable @num_row_groups has the number of row groups the function handles. The WHILE loop loops once for every row group. Before the loop starts its work, variable @last_int has the last, or highest, value already inserted into table variable @IntegersTable. With @last_int_inserted, the loop knows the integer at which it should start inserting rows. Row group 1 is a special case, and @last_int_inserted helps there, too. Row group 1 covers rows 1 to 65,534, but the IntegersTableFill CTE already inserted at most 32,767 rows. Variable @last_int_inserted equals 32,767, so in the loop, the first row group should start at integer 32,768. Variable @last_int_inserted makes this possible. If the loop is at the last row group, it uses parameter @end_int to stop. Otherwise, it just keeps going. Setting @end_int this way makes the function more efficient, as I mentioned earlier. For example, if the user picked 16,776,706 for @end_int, the function would end at row group 10. But if the SELECT statement didn't stop at @end_int, the function would insert integers16,776,70 to 33,553,408 and throw them away later. Variable @end_int prevents this waste.

Now the function uses the INSERT INTO statement (callout E) to insert the integers. Earlier, I mentioned that the INSERT INTO statement doubles the number of integers already in @IntegersTable. Look closely at the SELECT statement to see why. As a separate SELECT, it adds @last_int_inserted (the latest, largest integer in @IntegersTable) and possibly @step_int, to every integer already in @IntegersTable. Then the INSERT INTO statement inserts this new row set into @IntegersTable. Thus, for each pass of the WHILE loop, the largest integer inserted is (@last_int_inserted + @last_int_inserted) plus maybe @step_int, doubling the number of integers in @IntegersTable. Only in the last row group does the last result set integer exceed @end_int. The WHERE clause tests for this condition, optimizing the INSERT statement. The SELECT statement uses @step_int almost exactly the way the IntegersTableFill CTE used it, except that here, it subtracts 1 to account for the fact that it operates on a "base" set of integers that starts at 1, not 0. Finally, the loop increments @current_row_group.

When the loop ends, table variable @IntegersTable has a "raw" list of integers, but input parameter @start_int might differ from 1, which is the value the function assumed, and the function needs to compensate. First, if @start_int is less than 1, the UPDATE statement (callout F) shifts the integers back to that @start_int value. The function recalculates @end_int, and the DELETE statement trims @IntegersTable accordingly. The function inserts everything into @FinishedIntegersTable, which it returns to the T-SQL statement that originally called it, and then it ends.

If @step_int is greater than 1 and @end_int is greater than 32,767, the last integer the CTE inserts could have a value greater than 32,767 and the largest integers in the row groups would exceed the integers in the row groups that Figure 2 shows. If CreateIntegersTable handled this scenario, it might become even more efficient, but it would also become more complex.

Put It to Work

Listing 2 shows how to use CreateIntegersTable. Declare a table variable and run an INSERT INTO statement with a SELECT statement that calls the function and specifies parameters. That's it.

Unfortunately, CreateIntegersTable adds entries to the tempdb log file, and for a very large integer list, this could become a big problem because the log file could use a lot of system resources and affect performance. SQL Server 2005 has no way to prevent this situation; if it did, the function would have even better performance and use even fewer resources. I researched for a solution but found nothing. If you know how to prevent this problem in either SQL Server 2008 or 2005, please send me an email to tell me about it.

Function CreateIntegersTable returns a set of rows that starts at the value of parameter @start_int—each time, every time. However, given specific @start_int, @step_int, and @end_int parameters, the final integer in the row set that CreateIntegersTable returns might differ from the @end_int parameter. This situation would happen because of mod function (i.e., division-remainder) math and could occur when the @step_int parameter is greater than 1. You can use scalar value function ActualEndIntegerCalculator, which Listing 3 shows, to find the actual, largest integer that CreateIntegersTable will return given a specific set of start, step, and end parameters. This function can go in any database, and this example shows how to use it:

SELECT  dbo.ActualEndIntegerCalculator (1455, 45, 22401)

If parameter @start_int is less than or equal to 0, ActualEndIntegerCalculator starts at @start_int and finds the multiple of @step_int closest to @end_int. If @start_int is greater than 0, the calculator function shifts the @start_int and @end_int parameters so that @start_int is 0, calculates @end_int based on this shift, as before, and then adds back @start_int. If @start_int is greater than @end_int, the function returns NULL.

With the CreateIntegersTable and ActualEndIntegerCalculator functions, you have efficient, flexible, high-performance tools to build the numbers tables you want, when you want.



ARTICLE TOOLS

Comments
  • Megan
    3 years ago
    Jun 29, 2009

    Hi JC,

    In response to your comment, Frank Solomon has provided a SQL Server 2000-compatible version of the code presented in this article. To access this code, click the 100531.zip link at the top of the article page under Download the Code.

    Please let me know if you have any questions.

    Thanks!

    Megan Keller

    Associate Editor, SQL Server Magazine

    mkeller@sqlmag.com

  • JC
    3 years ago
    Mar 07, 2009

    Hi. Great idea. At the risk of asking for something for nothing, I was wondering if you could post any suggestions on doing this in a SQL2k compatible version. Our office has -still- not upgraded and I am a bit confused on how to implement this without a CTE.

    TIA,

    ---JC

  • JEFF
    3 years ago
    Mar 01, 2009

    Although I applaud the author for a well written article, there are many ways to effectively generate a list numbers... and recursion isn't one of them.

  • Michael
    3 years ago
    Feb 25, 2009

    I see it's fixed!

  • Anne
    3 years ago
    Feb 19, 2009

    Uh oh, I don't know what happened to the article, but I'll get the problem fixed as soon as possible. Anne Grubb, web editor

You must log on before posting a comment.

Are you a new visitor? Register Here