December 23, 2008 10:23 PM

Handling Arrays as Inputs

Use a T-SQL split function
Rating: (0)
SQL Server Magazine
InstantDoc ID #100657
Executive Summary: Learn how to use a T-SQL split function to handle arrays as inputs for Microsoft SQL Server stored procedures.
Although SQL Server stored procedures don’t support using arrays as inputs, DBAs often need to run procedures that can accept an unknown number of input parameters. For example, suppose you need to develop a stored procedure in the Northwind database that acc...

...This article is for paid Professional Members only.

Already a Professional Member? Please log in now:

NOT A PROFESSIONAL MEMBER? YOU CHOOSE:

Professional Membership

Monthly

Annual

VIP Membership

Monthly

Annual

Add a Comment

When I insert result of fn_split function in to the temp variable and then join that temp variable with my other two tables I get nested loop join which is optimal for my query.

Ognjen

[Reply: Makes sense. :)

Cheers,
Itzik]

Ognjen12/4/2009 7:49:10 AM


Itzik,
Artical is great but I have one question.
I get wrong estimation of rows when I use fn_split function and because of that I have suboptimal query plan. Fn_split function in my case returns only two rows but estimated number of rows is 1100000 and becaouse of that I get hash join instead of nested loop with two other tables. Do you have some tip how to get a good estimation of rows.

Thanks,
Ognjen

[Reply: An inaccurate estimate could be a result of reuse of a previously cached plan. The estimate is based on compile time. Could also be a matter of referring to a variable who's value cannot be sniffed. In both cases, adding OPTION(RECOMPILE) to the query could help. If that’s not the case and it’s just a matter of inaccurate estimates there’s the option to first insert the result of the split into a temp table and then joining the temp table with the other. This will add some cost to the process due to the use of the temp table, but the estimates should be better, so overall the process could improve.

Cheers,
Itzik]


Ognjen12/4/2009 3:39:10 AM


-- try that again; hopefully my returns will persist this time;

WITH splitterCte (pos, idx, element) AS (

SELECT CAST(1 AS INT) pos
,CHARINDEX(@sep, @arr+@sep, 1) idx
,SUBSTRING(@arr, 0, CHARINDEX(@sep, @arr+@sep, 1)) element

UNION ALL

SELECT pos+1 pos
,CHARINDEX(@sep, @arr+@sep, idx+1) idx
,SUBSTRING(@arr, idx+1, CHARINDEX(@sep, @arr+@sep, (idx+1))-(idx+1)) element
FROM splitterCte
WHERE CHARINDEX(@sep, @arr+@sep, idx+1)>0
)
SELECT pos, element
FROM splitterCte OPTION (MAXRECURSION 32767);

Steve1/9/2009 12:52:17 PM


Thank you for the article, enjoyable as always.

This is a road I, like so many others, been down many times in the past. Using the Nums table is a clever technique for achieving a set based solution that is compatible with SQL8.

I can understand why reader danoland may be hesitant to create a permanent Nums table, but I also believe that he doesn’t fully comprehend how it is used. The Nums table is used as something of an anchor, something to select against when discovering the starting positions of the array elements; it’s the key to the set based solution. So, it doesn’t matter what the data type of the array is as long as the Nums table is sequential and the string length of the ‘array’ passed is not greater than than MAX(Nums.n).

I’ve found that if backwards compatibility is not an issue a recursive CTE solution runs significantly faster. One notable caveat is that due to MAXRECURSION the array length is limited to 32,767 items.

-----------------------------------------------
WITH splitterCte (pos, idx, element) AS (

SELECT CAST(1 AS INT) pos
,CHARINDEX(@sep, @arr+@sep, 1) idx
,SUBSTRING(@arr, 0, CHARINDEX(@sep, @arr+@sep, 1)) element

UNION ALL

SELECT pos+1 pos
,CHARINDEX(@sep, @arr+@sep, idx+1) idx
,SUBSTRING(@arr, idx+1, CHARINDEX(@sep, @arr+@sep, (idx+1))-(idx+1)) element
FROM splitterCte
WHERE CHARINDEX(@sep, @arr+@sep, idx+1)>0
)
SELECT pos, element
FROM splitterCte OPTION (MAXRECURSION 32767);
-----------------------------------------------

--Steve--

[Reply: Thanks for the input, Steve. BTW, my friend and fellow MVP Erland Sommarskog did very thorough benchmarks of the various methods and you can find his results here: http://www.sommarskog.se/arrays-in-sql.html. You can see the numbers for the CTE option vs. the numbers table among others.

Cheers,
Itzik]

Steve1/9/2009 11:56:39 AM


Thank you for this article - it is very useful for something I am working on right now.

I don't like having dbo.nums as a separate table. It seems that I would also need a separate tables for strings and dates. Furthermore, I think I would need to make sure that the table is empty before processing.

I think I would like to create a temporary table or a table variable within the stored procedure that is calling the split function. It seems that the functionality of dbo.nums is temporary and ephemeral, exactly the sort of thing that temporary tables and table variables were designed for. However, I don't understand the performance considerations for such an implementation. I would think that if I put it in a stored procedure, then the query plan would be cached, and the performance wouldn't be too bad.

[Reply: There are many practical uses for a numbers table, including generating a sequence of date and time values. I would strongly recommend having a permanent one in the database.

Cheers,
Itzik]

Dan12/31/2008 9:04:21 AM


You must log on before posting a comment.

Are you a new visitor? Register Here
GOOGLE LINKS
SPONSORED LINKS
FEATURED LINKS