| Executive Summary: Learn how to use a T-SQL split function to handle arrays as inputs for Microsoft SQL Server stored procedures. |
Although SQL Server stored procedures don’t
support using arrays as inputs, DBAs often
need to run procedures that can accept an
unknown number of input parameters. For example,
suppose you need to develop a stored procedure in
the Northwind database that accepts a list of order
IDs as input and returns all orders from the Orders
table whose keys appear in the input list. A common
solution is to use dynamic SQL; in “Handling Arrays”
(InstantDoc ID 100453), I explain
why this solution isn’t preferable.
Another option is to create a table function that
accepts a string with a separated list of elements as input
(@arr), and possibly a second argument with the separator
character (@sep), and returns a table result with a
row for each element. You can then use such a split function
in your stored procedure and join its output table
with the Orders table to identify the qualifying orders.
In this article I discuss how to use a T-SQL implementation
of the split function solution. Next month
I’ll discuss a Common Language Runtime (CLR)
implementation of the split function approach.
Getting Started
To use the T-SQL split function approach,
first run the code in Listing 1 to create and
populate an auxiliary table of numbers.
Next, run the code in Listing 2 to create
the T-SQL implementation of the
fn_split function. The split logic
implemented by the function’s
query can be divided into three
parts: generating copies, extracting
elements, and calculating position.
Generating copies. The first part of the
query produces a copy of the @arr value for
each element in the array. This action requires
the Nums table that the script in Listing 1
created; as you can see, the query in
Listing 2 is against the Nums table.
The predicate in the WHERE clause
is in charge of producing a match for
each element in the array. In simple
terms, a match is found for each
number n from the Nums table that
represents the character position of
@sep (the list’s separator) in @arr.
For x number of elements, there are
x - 1 separators. In order to obtain
x matches for x elements, you must add a separator in front of the first element. That is,
you need to look for @sep in @sep + @arr instead
of in @arr. In addition, because the Nums table
can have many more numbers than the number
of characters in @arr, you filter only the numbers
that are smaller than or equal to the length of
@arr plus one for the separator you added in front
of the first element. Here’s the complete predicate
in the query’s WHERE clause:
WHERE n <= LEN(@arr) + 1
AND SUBSTRING(@sep + @arr, n, 1) = @sep
This filter gives you a match for each element in the
array; within that match, n represents the position of
@sep in @sep + @arr. Interestingly, in respect to @arr,
n simply represents the position of the character where
the element starts. Next, you need to extract a different
element from each copy of @arr.
Extracting elements. The second part of the script
in Listing 2 extracts elements from each copy of @arr.
The column n in the Nums table represents the character
position where the element of interest starts. Thus, a call
to the SUBSTRING function extracts the element:
SUBSTRING(@arr, n, CHARINDEX(@sep, @arr + @sep,
n) - n)
The SUBSTRING function operates on @arr as the
input string and extracts a substring starting at character
position n. The length to extract is the position
of the next occurrence of @sep in @arr (calculated by
the CHARINDEX function) minus n. Two things to
note about the use of CHARINDEX to calculate the
position of the next occurrence of @sep in @arr are
that (1) in order to ensure that the separator is also
found after the last element, the function looks for
@sep in @arr + @sep rather than just in @arr and
(2) the last argument to the function tells the function
where to start looking for @sep. In my example, you
would tell CHARINDEX to look for the next occurrence
of @sep starting at the beginning of the current
element, which is represented by n; hence n is provided
as the last argument to the function.
Calculating position. The final portion of the script
in Listing 2 calculates the position of the element
within the array. In SQL Server 2000 the solution is
quite tricky. The position of the element starting at
character n can be calculated by counting the number
of occurrences of @sep prior to the nth character (in
the left n - 1 characters) plus 1. To count the number
of occurrences of a character c in a string s, use the
REPLACE function to remove all occurrences of c
from s (call the result s2) by substituting them with
an empty string, then simply subtract the length of
s2 from s. The length of the original string minus the
length of the string after removal of all occurrences
of the separator character gives you the number of
occurrences of the separator character. The original
length of our string of interest is by definition (n - 1),
therefore the complete expression that calculates the
number of occurrences of @sep in the left n - 1 characters
in @arr is:
(n - 1) - LEN(REPLACE(LEFT(@arr, n-1), @sep, N'))
By adding 1 to the expression above, you get the position
of the element within the array. Note that if you’re
working with SQL Server 2005 or later, you can calculate
the position of the element with a simple expression
based on the ROW_NUMBER function:
ROW_NUMBER() OVER(ORDER BY n)
Also, remember that as of SQL Server 2005 you can
work with VARCHAR(MAX) or NVARCHAR(MAX)
as the input parameters to the function and stored
procedure, and thus support very long input strings
(up to 2GB).
Now that you understand the logic behind this
solution, try running the following code to test the
function:
SELECT * FROM dbo.fn_split(N'10248,10250,10249',
N',');
You’ll get the output in Table 1, which includes a row for
each element and indicates its position in the array.
Continue on Page 2