DOWNLOAD THE CODE:
Download the Code 39394.zip

To analyze business data, you often need to get aggregate values from database tables. Typically, you can use a GROUP BY clause and an aggregation function such as COUNT() or SUM() to return the information. But what if you need multiple aggregate values for presentation or comparison in one report? For example, your boss might want to track and compare the sales of a product over time, so she requests the total number of orders for today, this month, this year, and last year.

The GROUP BY clause works when you can specify how the values should be grouped (e.g., by month, by day, by total number of orders). But specifying the basis for grouping values is possible only if you can obtain the group values directly from the column data or by applying a standard function or computation to one or more columns in the table. For example, you can apply the YEAR() function to a date column and get values grouped by year, or you can use the LAST_NAME column in the GROUP BY clause to get a grouping of names on the basis of last name.

But in many situations, GROUP BY provides no way to specify the basis for grouping and aggregating values. Say you want to perform quantity aggregation for time intervals such as today and this week. You can't use the GROUP BY clause to define a day and a week in one expression because you can't obtain today and this week by applying any one function or expression to any column data. Alternatively, you might try using a WHERE clause, which would be a slightly better choice because you could use two queries to filter the rowset for the two periods and get the desired aggregate values. However, many reports require tens or even hundreds of aggregates. When you use the WHERE clause for such reports, you have to write many queries and performance decreases. In addition, the process of extracting data from query functions and presenting the data becomes tedious as n queries return n result sets, which you then must typically convert for presentation in one comparison report.

Ideally, you need to be able to calculate multiple aggregations in one query. The more aggregations you can merge into one query, the more speed you gain because you need only one connection and because rowset iteration (reading, processing, or parsing of the table data) happens only once—regardless of the overhead you incur by creating a more complex query, as I'll explain in a moment. I call the technique I use for this kind of calculation inline conditional aggregation. This technique takes extra time for computing conditional expressions during aggregation, but the overall performance gains more than compensate for this added overhead.

The technique uses a conditional expression that you define in a CASE...END expression as the input for an aggregate function such as COUNT() or SUM() in the SELECT statement. The CASE expression lets you specify almost any valid T-SQL expression as a condition. CASE...END in T-SQL is similar to the If statement in general programming languages and to the switch...case statement in C because you use CASE...END for branching into a possible result based on evaluation of Boolean expressions. (For more information about CASE, see the "Transact-SQL" section of SQL Server Books Online—BOL). And by using inline conditional aggregation in one query, you can obtain a horizontal-row format without having to convert all your results from vertical format.

How It Works
Let's look at a simple example of how inline conditional aggregation works. For each row in a database table that a given query accesses, the code passes to the aggregate function the result of the CASE...END expression. For example, the code in Listing 1 shows a CASE...END expression that tells the query to use the value in the quantity column for aggregation only if the order date is August 10, 2003. Otherwise, the expression passes to the aggregate function a value of 0 for that row, as long as the table contains a date column (FLD_ORDER_DATE) and a quantity column (FLD_QTY).

The code in Listing 1 returns the total order quantity for the specified date without using GROUP BY to assist the aggregate function and without using a WHERE clause to filter the rowset from the table. For every row that the query processor iterates through and that reaches the SELECT clause (after being filtered through any WHERE clauses), the code evaluates the CASE...END expression. The SUM() function receives the resulting quantity value in the column only if the row data satisfies the condition. I'll walk through a more complex example in a moment. But first, let's look at some important points to remember when you're using this technique.

Inline conditional aggregation is extensible to multiple aggregates. Because the aggregation, filtering, and grouping happens within the scope of one SELECT statement, you can easily extend a SELECT statement to include more than one inline conditional aggregation column. For example, you can use one query containing multiple conditional aggregate columns to return the quantities for a week, a year, and any other periods you want to see—all at the same time and in separate columns. And you can use any aggregation function or condition as part of the SELECT statement's aggregation column. The conditions go in the CASE...END expression, and the aggregation function takes the entire CASE...END expression as a parameter. I used the SUM() function for the examples in this article. In addition, you can compute and filter within the CASE expression based on any data in the row you're evaluating.

You can combine inline conditional aggregation with the WHERE clause. You use WHERE to filter out rows that aren't relevant in any of the inline aggregations. This filtering makes your solution faster if you're computing multiple aggregates because multiple CASE...END expressions don't have to evaluate irrelevant rows. For example, if you're calculating aggregates for different periods in the current year, you could restrict the rowset to the current-year data by using the WHERE clause to eliminate data for previous years. You could instead use the CASE...END expressions to filter out data you don't need, but that approach is wasteful and slow because it ends up eliminating the rows for each CASE...END expression instead of eliminating them just once through the WHERE clause.

   Prev. page   [1] 2     next page



You must log on before posting a comment.

If you don't have a username & password, please register now.

Reader Comments

How do I add functionality to automatically compute the date range i.e.

SUM (CASE WHEN trans_date BETWEEN GETDATE() and GETDATE()-365 THEN vid.QTY ELSE 0 END)/12

jwfrie2

Article Rating 5 out of 5