• subscribe
July 20, 2009 12:00 AM

Using Nested Iterations and the OVER Clause for Running Aggregates

Determine the best solution depending on partition size
SQL Server Pro
InstantDoc ID #102336
Executive Summary:
Until Microsoft SQL Server supports the OVER clause, your best solutions for running aggregates are a set-based solution using subqueries or joins, for very small partition sizes, or a CLR data reader–based solution, for large partitions.


A running aggregate is an aggregate that keeps accumulating values over a sequence—typically temporal, possibly within partitions. A running sum of quantity over time (days) for each employee is an example of a running aggregate. For each employee and date, you’re looking for the total quantity from the beginning of the employee’s activity until the current date.

In previous articles I covered three solutions to running aggregates: a set-based solution using subqueries or joins, a T-SQL cursor-based solution, and a CLR-based solution using a data reader. (See the Learning Path for these and other articles related to running aggregates.) I explained how the solutions scale when you change various aspects, such as the number of aggregations, the number of partitions, and the partition size.

In this article I present two new solutions—one based on nested iterations that you can implement in SQL Server 2008 and 2005, and another that SQL Server doesn’t yet support (as of SQL Server 2008), but hopefully will in the future.

Getting Started
For consistency with my previous articles, I’ll use the same problem and sample data. Web Listing 1, Web Listing 2, and Web Listing 3 contain the code to create the Sales table from my previous examples and populate it with sample data. Run the code in Web Listing 1 to create the Sales table. Run the code in Web Listing 2 to create a table function called GetNums that generates a sequence of integers of a requested size, which is later used to populate the Sales table. Run the code in Web Listing 3 to populate the Sales table with sample data. Set the values of the variables @num_partitions and @rows_per_partition to determine the number of partitions (employees) and partition size, respectively, based on your needs.

My sample problem is to calculate the running total quantity for each employee over dates. That is, for each employee and date, calculate the running total quantity from the beginning of the employee activity until the current date.

Solution Based on Nested Iterations
In my previous articles, the solutions I presented were either purely set-based (using subqueries or joins) or purely cursor-based (using a T-SQL cursor or a .NET data reader). The first solution I present in this article combines iterative and set-based logic. The idea is to iterate through the entries within a partition in sequence order, using a set-based query in each iteration to process the nth entry across all partitions. Listing 1 contains an implementation of this approach using a recursive query.

The code in Listing 1 first generates row numbers for the rows from the Sales table, partitioned by empid, ordered by dt, materializes the sales rows along with the row numbers (rownum column) in a temporary table, and clusters the table by rownum and empid. The code then defines a common table expression (CTE) that calculates the running sum. The CTE’s anchor member processes the first entry for all employees (rownum = 1), and the recursive member processes in each iteration the next entry (previous rownum plus 1) for all employees, adding the current entry’s quantity to the running sum of quantity that was accumulated so far.

Listing 2 has an implementation of the nested iterations using loops instead of recursive queries. You might want to use this solution if you’re working with SQL Server versions prior to 2005.



ARTICLE TOOLS

Comments
    There are no comments to display. Be the first one!
You must log on before posting a comment.

Are you a new visitor? Register Here