• subscribe
November 24, 2008 12:00 AM

Handling Arrays

Using arrays as input for stored procedures
SQL Server Pro
InstantDoc ID #100453
Downloads
100453.zip

Try injecting code through the revised procedure by running the following command:

EXEC dbo.usp_getorders N'0); RAISERROR( 'This
  could have been much worse than a RAISEERROR
  statement.', 16, 1); --';

This time you get the following error:

Msg 50000, Level 16, State 1, Procedure usp_
  getorders, Line 11
Input may contain SQL injection. Procedure
  aborted.

However, if the input isn’t restricted to digits and commas (e.g., when the elements in the array are supposed to be character strings), you’ll find it difficult to guarantee a validation that leaves absolutely no exposure to SQL injection. (For more information, see SQL Server Books OnlineBOL—“SQL Injection” at msdn.microsoft.com/en-us/library/ms161953.aspx.)

Performance. When a query has a plan in cache, in order for subsequent query invocations to be able to reuse the cached plan, the query strings (that of the query for which the plan was cached and those of the subsequent query invocations) need to be the same. The exception is when SQL Server auto parameterizes the query, but this exception doesn’t apply to our procedure. For each unique input to our procedure, a different query string is constructed, and therefore a new plan is produced and cached. Instead of optimizing the query only once and reusing the cached plan in all subsequent invocations of the procedure, SQL Server will reoptimize almost every time the procedure is invoked, thus generating a large number of execution plans. To demonstrate plan caching and reuse behavior, I’ll use the version of the procedure that you create by running the code in Listing 3.

As you can see in the procedure’s code, I incorporated a GUID as a block comment within the query. Based on this GUID, it will be easy to locate the plans specifically associated with the query of interest in cache. After creating the procedure with the GUID as a comment, run the following code to execute the procedure with two different inputs:

EXEC dbo.usp_getorders N'10248,10250,10249';
EXEC dbo.usp_getorders
  N'10260,10270,10265,10290';

You can locate the plans associated with our procedure and dynamic batch by querying sys.syscache objects (or master.dbo.syscacheobjects, in versions before SQL Server 2005). This object has a row for each plan in cache. You can look for plans in which the SQL text (sql attribute) contains the specific GUID you planted as a comment. Assuming you executed the stored procedure twice with different arguments as I suggested, run the following code to inspect the plans associated with our query in cache:

SELECT objtype, usecounts, sql
FROM sys.syscacheobjects -- in 2000 query master.
  dbo.syscacheobjects
WHERE SQL LIKE '%DDAF1499-A7FF-4A9C-AE83-
  FCE5A33F5151%'
 AND sql NOT LIKE '%sys%';

For each plan in cache, the query returns the object type (Proc for procedure plan, Prepared or Adhoc for a prepared statement such as one prepared by sp_executesql), a count of how many times the plan was used, and the SQL text for which the plan was created. This query returns the output in Table 2. Note that in SQL Server 2008 the objtype attribute will show Adhoc instead of Prepared.

Dynamic SQL operates in its own batch and isn’t considered part of the batch of the calling procedure; therefore, you get separate plans for the procedure (Proc object type) and the dynamic batches (Prepared or Adhoc object type). The procedure plan is just a shell in our case, simply encapsulating a call to the dynamic batch. As you can see, SQL Server did reuse the procedure plan for the second invocation of the procedure. However, you can see that there’s a separate prepared statement plan for each unique dynamic batch, meaning that the actual query plan wasn’t reused. Think of the implications for performance when the procedure is invoked very frequently with different sets of order IDs; consider the cost of all those compilations, and the memory required to hold all the plans.

Note that as of SQL Server 2005, recompiles are done at the statement level rather than the batch/ procedure level; this method enables a RECOMPILE query hint (in addition to the existing RECOMPILE procedure option). When the RECOMPILE query hint is specified, SQL Server doesn’t bother to keep the plan in cache. Because with the dynamic SQL solution chances for plan reuse are very low to begin with, at least you can mitigate the memory pressure issue by adding OPTION(RECOMPILE) at the end of the query. Run the code in Listing 4 to create the revised stored procedure (notice the change in GUID as well).

Then, execute the revised procedure with different inputs:

EXEC dbo.usp_getorders N'10248,10250,10249';
EXEC dbo.usp_getorders N'10260,10270,10265,10290';

Query the cached plans:

SELECT objtype, usecounts, sql
FROM sys.syscacheobjects -- in 2000 query master
  .dbo.syscacheobjects
WHERE SQL LIKE '%995F4DF8-6518-47EE-AB9E-
  2E2876DA1700%'
 AND sql NOT LIKE '%sys%';

You get the output in Table 3, indicating that although the procedure’s plan was reused (the shell for the call to the dynamic batch), the plans for the prepared statements weren’t cached.

Control. To work around the problem of limited control over the order of the rows in the output, you can add an ORDER BY clause to the query to sort the output by existing attributes from the Orders table. However, there’s no straightforward way for you to ask to sort the output by the order of appearance of the order ID in the input array. For example, given the input array 10248,10250,10249, you want the output to be sorted with 10248 first, 10250 second, and 10249 third. Although this task isn’t impossible, solutions based on the existing implementation of the procedure are tricky and inefficient.

Coming Up
Although using dynamic SQL to handle an unknown number of input elements in a stored procedure is a common solution, I don’t recommend it. You should consider a less problematic solution, such as using a T-SQL split function, a CLR split function, or SQL Server 2008’s table valued parameters. I’ll discuss these alternatives in subsequent articles.



ARTICLE TOOLS

Comments
  • ELISABETH
    3 years ago
    Jan 26, 2009

    What about using XML? It's a very clean way of handling array-type data in SQL Server. I've posted a blog article about using XML for this purpose on my company web site at http://techknowsolve.com/blog/?p=13.

    Mary Delamater
    TechKnowSolve
    Portland, Oregon
    http://www.techknowsolve.com

You must log on before posting a comment.

Are you a new visitor? Register Here