• subscribe
January 20, 2009 12:00 AM

More Options for Handling Arrays as Inputs

Use a CLR split UDF or table-valued parameters
SQL Server Pro
InstantDoc ID #100975
Downloads
100975.zip

Executive Summary:
Use a Common Language Runtime (CLR) split user-defined function (UDF) or SQL Server 2008’s new table-valued parameters to handle arrays as inputs in T-SQL.

SQL Server DBAs need to know how to handle arrays as inputs. One solution is to use dynamic SQL, but as I explain in “Handling Arrays,” this solution has both security and performance problems. Another solution uses a T-SQL user-defined function (UDF) that splits an array into its elements. The function accepts a string with a comma-separated list of values as input and returns a table result with a row for each element. This solution is secure and more efficient than the dynamic SQL approach, but it’s also quite complicated, as I explain in “Handling Arrays as Inputs.”

 

I discuss two additional solutions here. The first approach implements a Common Language Runtime (CLR) UDF that splits an array into its elements. The second technique uses SQL Server 2008’s new table-valued parameters.

 

CLR Split UDF

SQL Server 2005 introduced CLR integration, which lets you develop routines with .NET languages. T-SQL is useful for certain types of tasks, especially tasks that involve data manipulation; however, T-SQL isn’t ideal for tasks such as splitting arrays that involve string manipulation and iterative logic. Procedural languages such as .NET languages are better than T-SQL at performing such tasks. .NET languages typically provide better performance, richer capabilities, and simpler implementation. (For more information about T-SQL vs. CLR, see “CLR or Not CLR: Is That the Question?”)

 

A CLR version of a UDF that splits a string with a comma-separated list of values to the individual elements is simple to implement because the string type supports a method called Split that already implements the split logic. Listing 1 uses C# to define such a UDF. (This listing is courtesy of Dejan Sarka and Steve Kass.) Listing 1’s function header sets the FillRowMethodName attribute to “ArrSplitFillRow.” ArrSplitFillRow is a method that is defined after the fn_SplitCLR function’s definition, and its purpose is to convert the input object to a string. The header also defines an attribute called TableDefinition with the schema of the output table; this attribute is required in the header only if you want to deploy the assembly in the database from Visual Studio as opposed to using manual deployment. The function invokes the built-in Split method of the string type to split the input array.

 

Before you deploy the function in the Northwind database, run the following code to ensure that any existing version of the fn_split UDF is dropped:

 

USE Northwind;
IF OBJECT_ID('dbo.fn_split', 'IF') IS NOT NULL DROP FUNCTION dbo.fn_split;

 

Next, follow the instructions in “5 Steps for Developing and Deploying CLR Code in SQL Server” to deploy the fn_split function from Listing 1 in the Northwind database. Name the project SplitCLR, and specify the C drive as the location for the project folder. Then run the code in Listing 2 to create the usp_getorders procedure. This procedure accepts a string with a comma-separated list of order IDs as input and returns information about the requested orders. It does so by joining the result of the fn_split function with the Orders table. Note that in case the input string contains multiple occurrences of an order ID, the procedure will return multiple occurrences of that order in the output. If you need to return unique occurrences in such a case, add the DISTINCT clause to the SELECT list, or rewrite the query to use the EXISTS or IN predicate.

 

Run the following code to test the stored procedure, executing it with two different arrays:

 

EXEC dbo.usp_getorders N'10248,10250,10249';
EXEC dbo.usp_getorders N'10260,10270,10265,10290';

 

Then query the syscacheobjects view to analyze plan reuse behavior:

 

SELECT objtype, usecounts, sql
FROM sys.syscacheobjects -- in 2000 query master.dbo.syscacheobjects
WHERE SQL LIKE '%89CE9E3A-943A-42F2-AEB6-0C924CB18387%'
  AND sql NOT LIKE '%sys%';

 

You’ll get the output in Table 1, which indicates that the plan was reused.



ARTICLE TOOLS

Comments
  • TIM
    3 years ago
    Mar 09, 2009

    Our top SP behind a retail website takes a delimited string that may contain dupes, which are filtered out. I modified the C# method from this article to utilize the efficiency of a HashTable to prevent duplicate element values.

    However! The performance of the CLR solution (mine or the article's) in production was very poor. The original SP uses a temp table inside a WHILE loop with SUBSTRING to parse each element. The existence of the value is checked before each insert to the temp table (though it may be cheaper to SELECT DISTINCT inside a subquery). This method keeps TEMPDB busy (but well under disk capacity), the CPUs averaging around 8%, the DB around 8K logical page reads/s and procedure cache hit rate near 97%. I tried several varieties of the CLR implementation (some keeping the temp table, others using the UDF inline), but ALL shot the CPUs to 20%+, logical page reads to 25K/s and the cache hit rate down to 82%.

    I'm sure the CLR is more efficient at parsing, but there seems to be a huge overhead when it's invoked several times/sec. Any similar experiences or suggestions?? This is on an 8-proc, 64-bit server.


    [SqlFunction(
    FillRowMethodName = "ArrSplitDFillRow",
    DataAccess = DataAccessKind.None,
    TableDefinition = "pos INT, element NVARCHAR(50)"
    )]
    public static IEnumerable fn_splitDistinct([SqlFacet(MaxSize = -1)]SqlString inpStr, SqlString charSeparator)
    {
    string locStr;
    string[] splitStr;
    char[] locSeparator = new char[1];
    locSeparator[0] = (char)charSeparator.Value[0];

    if (inpStr.IsNull)
    locStr = "";
    else
    locStr = inpStr.Value;

    splitStr = locStr.Split(locSeparator, StringSplitOptions.RemoveEmptyEntries);
    Hashtable SplitString = new Hashtable();

    int i = 1;
    foreach (string s in splitStr) {
    if (!SplitString.ContainsKey(s)) {
    SplitString.Add(s,i);
    ++i;
    }
    }
    return SplitString;
    }

    public static void ArrSplitDFillRow(Object obj, out int pos, out string item) {
    pos = (int)((DictionaryEntry)obj).Value;
    item = (string)((DictionaryEntry)obj).Key;

You must log on before posting a comment.

Are you a new visitor? Register Here