• subscribe
June 11, 2009 12:00 AM

CLR-Based Solution for Running Aggregates

Good performance for large partitions
SQL Server Pro
InstantDoc ID #102097

Executive Summary:

If performance is your main goal in running aggregates, you should use a set-based solution with very small partition sizes; otherwise, you should use a Common Language Runtime (CLR)-based solution.


Author’s Note:
I’d like to thank Adam Machanic, Dejan Sarka, and Eladio Rincón for providing preliminary code samples and advice regarding the CLR-based solutions.

In the past couple of months I started a series of articles covering solutions to running aggregates. This series focuses on analyzing performance aspects of the various solutions—in particular their algorithmic complexity. In other words, this series explores the way the solutions scale when different variables such as the number of aggregates, partitions, or rows per partition change. In "Subqueries and Joins for Running Aggregates" I presented the problem and discussed set-based solutions using subqueries and joins, and in "Set-Based vs. Cursor-Based Solutions for Running Aggregates" I covered a solution based on T-SQL cursors. This month I present a Common Language Runtime (CLR)-based solution and explain the circumstances in which it would perform better than the other solutions.

Getting Started
For your convenience, Web Listing 1, Web Listing 2, and Web Listing 3 contain the code necessary to create and populate the Sales table that is used in the article’s examples. Run the code in Web Listing 1 to create the table. Run the code in Web Listing 2 to create the GetNums function that is used to populate the Sales table with sample data. Use the code in Web Listing 3 to populate the Sales table with sample data, adjusting the number of partitions (employees) and number of rows per partition (days per employee) based on your needs.

As a quick reminder, the Sales table contains a row for each employee and date, with the employee ID, sales date, quantity, and value. The running aggregate that is used as the basis for the performance analysis is a running sum of quantity (or value) per employee and date; that is, for each employee and date, calculate the total quantity from the beginning of the employee’s activity until the current date.

CLR-Based Solution
SQL Server 2005 introduced CLR integration, which lets you develop routines such as functions and procedures using .NET code. CLR integration complements T-SQL in areas where it is weak, such as string manipulation, iterative logic, and so on. For the most part, T-SQL is the preferable option when the task at hand involves data manipulation, both in terms of performance and in terms of code complexity. However, there are a few uncommon cases that involve data manipulation tasks in which CLR-based solutions sometimes outperform T-SQL solutions. Running aggregates is such an example. The reasons that in certain cases CLR-based solutions can outperform T-SQL solutions have to do with the way the optimizer currently treats set-based solutions, and because of lack of support for ANSI SQL set-based language elements that lend themselves to better optimization. I’ll present the CLR-based solution and compare its performance to the solutions that I provided in previous columns. In a future article I’ll describe the missing language elements and explain why they have great potential to perform better than any other solution.

Listing 1 shows a CLR-based solution for our problem using C# code. The solution defines a stored procedure called SalesRunningSum that calculates a running sum of quantity for each employee and date. Listing 2 has the Visual Basic version of the solution in case that's your language of preference.

The procedure’s code defines a SqlConnection object called conn that uses the current user’s connection to SQL Server by specifying the option context connection=true. The code then defines a SqlCommand object called comm that uses conn as its connection. The code sets the CommandText property of comm to a query against the Sales table that retrieves the sales rows (employee ID, date, and quantity) ordered by employee ID and date. The code then defines an array called columns containing four SqlMetaData objects with the definitions of the four result columns that the procedure will return in its result set: empid, dt, qty, and sumqty. The code continues by defining a SqlDataRecord object called record based on columns. The code then marks the beginning of the result set that will be streamed to the client by invoking the SqlContext.Pipe.SendResultsStart method based on the record parameter, and opens the connection.



ARTICLE TOOLS

Comments
    There are no comments to display. Be the first one!
You must log on before posting a comment.

Are you a new visitor? Register Here