• subscribe
December 23, 2008 12:00 AM

Handling Arrays as Inputs

Use a T-SQL split function
SQL Server Pro
InstantDoc ID #100657
Downloads
100657.zip

Run the code in Listing 3 to create the usp_getorders procedure, which is based on a call to the function fn_split, as well as a join between the function and the Orders table to return only the qualifying orders. Note the use of a new GUID to inspect the plan caching and reuse behavior of the new solution.

Next, run the following code to test the stored procedure with two different inputs:

EXEC dbo.usp_getorders N'10248,10250,10249';
EXEC dbo.usp_getorders   
N'10260,10270,10265,10290';

Because the query strings for both invocations are the same, the plan produced for the first invocation of the query can be reused for subsequent invocations. You can confirm this by inspecting the plans associated with the query in cache:

SELECT objtype, usecounts, sql
FROM sys.syscacheobjects -- in 2000 query master
   .dbo.syscacheobjects
WHERE SQL LIKE '%89CE9E3A-943A-42F2-AEB6-
    0C924CB18387%'
   AND sql NOT LIKE '%sys%';

Table 2 shows the output of this query.

Note that there’s only one plan in cache, and the usecounts attribute indicates that it was used twice. This implementation has great performance benefits over the dynamic SQL approach that I discussed in “Handling Arrays” (InstantDoc ID 100453). This method optimizes the query only once, then reuses the cached plan. Also, because there’s only one plan in cache, the procedure creates no memory overhead. In addition, because this implementation of the procedure uses a static query, there’s no exposure to SQL injection, and the solution is therefore secure. Finally, because the fn_split function returns the position of the element in the array (pos attribute), you can sort the rows in the output by that position by adding ORDER BY pos to the query.

Choices, Choices
Using a T-SQL split function to handle arrays as inputs is a good solution—the approach is secure and doesn’t negatively affect performance. However, the implementation is unnecessarily complex. Because you should try to develop solutions that are as simple and straightforward as possible, you should consider additional alternatives. Next month I’ll present an approach that uses a CLR version of the split function, as well as an implementation based on SQL Server 2008’s table valued parameters.



ARTICLE TOOLS

Comments
  • Ognjen
    3 years ago
    Dec 04, 2009

    When I insert result of fn_split function in to the temp variable and then join that temp variable with my other two tables I get nested loop join which is optimal for my query.

    Ognjen

    [Reply: Makes sense. :)

    Cheers,
    Itzik]

  • Ognjen
    3 years ago
    Dec 04, 2009

    Itzik,
    Artical is great but I have one question.
    I get wrong estimation of rows when I use fn_split function and because of that I have suboptimal query plan. Fn_split function in my case returns only two rows but estimated number of rows is 1100000 and becaouse of that I get hash join instead of nested loop with two other tables. Do you have some tip how to get a good estimation of rows.

    Thanks,
    Ognjen

    [Reply: An inaccurate estimate could be a result of reuse of a previously cached plan. The estimate is based on compile time. Could also be a matter of referring to a variable who's value cannot be sniffed. In both cases, adding OPTION(RECOMPILE) to the query could help. If that’s not the case and it’s just a matter of inaccurate estimates there’s the option to first insert the result of the split into a temp table and then joining the temp table with the other. This will add some cost to the process due to the use of the temp table, but the estimates should be better, so overall the process could improve.

    Cheers,
    Itzik]

  • Steve
    3 years ago
    Jan 09, 2009

    -- try that again; hopefully my returns will persist this time;

    WITH splitterCte (pos, idx, element) AS (

    SELECT CAST(1 AS INT) pos
    ,CHARINDEX(@sep, @arr+@sep, 1) idx
    ,SUBSTRING(@arr, 0, CHARINDEX(@sep, @arr+@sep, 1)) element

    UNION ALL

    SELECT pos+1 pos
    ,CHARINDEX(@sep, @arr+@sep, idx+1) idx
    ,SUBSTRING(@arr, idx+1, CHARINDEX(@sep, @arr+@sep, (idx+1))-(idx+1)) element
    FROM splitterCte
    WHERE CHARINDEX(@sep, @arr+@sep, idx+1)>0
    )
    SELECT pos, element
    FROM splitterCte OPTION (MAXRECURSION 32767);

  • Steve
    3 years ago
    Jan 09, 2009

    Thank you for the article, enjoyable as always.

    This is a road I, like so many others, been down many times in the past. Using the Nums table is a clever technique for achieving a set based solution that is compatible with SQL8.

    I can understand why reader danoland may be hesitant to create a permanent Nums table, but I also believe that he doesn’t fully comprehend how it is used. The Nums table is used as something of an anchor, something to select against when discovering the starting positions of the array elements; it’s the key to the set based solution. So, it doesn’t matter what the data type of the array is as long as the Nums table is sequential and the string length of the ‘array’ passed is not greater than than MAX(Nums.n).

    I’ve found that if backwards compatibility is not an issue a recursive CTE solution runs significantly faster. One notable caveat is that due to MAXRECURSION the array length is limited to 32,767 items.

    -----------------------------------------------
    WITH splitterCte (pos, idx, element) AS (

    SELECT CAST(1 AS INT) pos
    ,CHARINDEX(@sep, @arr+@sep, 1) idx
    ,SUBSTRING(@arr, 0, CHARINDEX(@sep, @arr+@sep, 1)) element

    UNION ALL

    SELECT pos+1 pos
    ,CHARINDEX(@sep, @arr+@sep, idx+1) idx
    ,SUBSTRING(@arr, idx+1, CHARINDEX(@sep, @arr+@sep, (idx+1))-(idx+1)) element
    FROM splitterCte
    WHERE CHARINDEX(@sep, @arr+@sep, idx+1)>0
    )
    SELECT pos, element
    FROM splitterCte OPTION (MAXRECURSION 32767);
    -----------------------------------------------

    --Steve--

    [Reply: Thanks for the input, Steve. BTW, my friend and fellow MVP Erland Sommarskog did very thorough benchmarks of the various methods and you can find his results here: http://www.sommarskog.se/arrays-in-sql.html. You can see the numbers for the CTE option vs. the numbers table among others.

    Cheers,
    Itzik]

  • Dan
    4 years ago
    Dec 31, 2008

    Thank you for this article - it is very useful for something I am working on right now.

    I don't like having dbo.nums as a separate table. It seems that I would also need a separate tables for strings and dates. Furthermore, I think I would need to make sure that the table is empty before processing.

    I think I would like to create a temporary table or a table variable within the stored procedure that is calling the split function. It seems that the functionality of dbo.nums is temporary and ephemeral, exactly the sort of thing that temporary tables and table variables were designed for. However, I don't understand the performance considerations for such an implementation. I would think that if I put it in a stored procedure, then the query plan would be cached, and the performance wouldn't be too bad.

    [Reply: There are many practical uses for a numbers table, including generating a sequence of date and time values. I would strongly recommend having a permanent one in the database.

    Cheers,
    Itzik]

You must log on before posting a comment.

Are you a new visitor? Register Here