You can combine inline conditional aggregation with the GROUP BY clause. Although Listing 1's code works for multiple aggregates without using the GROUP BY clause, the code is also compatible with GROUP BY. Using GROUP BY to further segregate your inline aggregation, as I'll show in a moment, lets you aggregate in two dimensions or aspects at the same time. For example, say you want the total number of orders for today and this week for each product type. You can define the aggregation columns for today and this week by using inline aggregation, then use GROUP BY to further segregate the aggregates according to product typeas long as the date, quantity, and product-type columns are in the table you're working with.
As I mentioned before, inline conditional aggregation produces a table in which aggregation happens in columns such as total order quantity for today or total order quantity this week and expands horizontally as you add more aggregates. However, the GROUP BY clause produces aggregations in rows such as total order quantity for product type C1 and expands vertically. So combining the two techniques can also let you create complete summary tables from one query faster than using multiple queries for the same purpose. When I used this approach to create a production-summary report that showed total production for different product types across multiple time intervals, the query-processing time, including data collection and overall report generation, dropped from 30 seconds to 3 seconds.
A Real-World Example
Now that you understand the basics of inline conditional aggregation, let's look at a real-life example of how to use this technique to create a sales summary report for comparing product sales across time intervals. This example report includes grouping by aspects such as zone and product category. Consider the typical Order table that Figure 1 shows, which has columns for order date, product type, product ID, quantity, and other values. (You can download the script to create and populate Figure 1's table by entering InstantDoc ID 39394 at http://www.sqlmag.com.)
Say you need to produce a daily report that shows in table format the total number of orders your company has received during the following time intervals:
- This day (F_DAY)
- Up to this month in the year (F_UPTO_MONTH)
- This month (F_MONTH)
- Total this year (F_THIS_YR)
- Same period previous year (F_PREV_YR_SAME_PERIOD)
- Total in the previous year (F_PREV_YR_FULL)
The names in parentheses are the column names I used in the downloadable table-creation code. Now, assume that on August 10, 2003, you want to generate a report in the format that Table 1 shows. The table displays the format for a report that will contain data for four product categories: C1, C2, C3, and C4.
To produce the report that Table 1 shows, you need to combine inline conditional aggregation with the GROUP BY clause, as the query in Listing 2 shows. This query returns all the aggregates in a result set in Query Analyzer, which Figure 2, page 36, shows. Listing 2's code doesn't have a WHERE clause, but as I mentioned earlier, you could use a WHERE clause to filter irrelevant rows. Listing 2's query has an aggregate column for each time interval that Table 1 shows. The CASE...END expression evaluates each row of data for each column. The code adds the order quantity to each column's aggregation only if the condition in the CASE...END expression is satisfied. For example, if an order's date appears as August 10, 2003, the first aggregation column expression (F_DAY) is satisfied, the code passes the order quantity to the SUM() function, and that quantity becomes part of the aggregation for F_DAY. Similarly, the code sums the rows that contain orders for August 1 through August 10 as F_MONTH because those values satisfy the date comparison in the CASE...END expression for the F_MONTH column.
Listing 2's query also uses the GROUP BY clause to segregate product type. Understanding the GROUP BY clause's role might be easier if you knew what would happen if the clause wasn't there: The query would return only one row of data, which would include columns that have total order quantities for all the time intervals regardless of product type. In other words, without GROUP BY, you'd have aggregations for one aspecttime intervalsand you'd get one total-order-quantity value per time interval. But when you add the GROUP BY clause, the code further separates the aggregates according to product type, which means you get a total order quantity for product-type C1 for today, total order quantity for product-type C2 for today, and so on. Thus, you can aggregate for time interval and product type at the same time.
Notice that the result set in Figure 2 is laid out exactly like Table 1. I mention this because database developers often display aggregates in rows when they need the values displayed for comparison in column format. To produce the column format, I used an inline aggregation column for each time interval required (e.g., F_DAY, F_UPTO_MONTH, F_MONTH). I defined each interval by using explicit dates in the CASE expression. The GROUP BY clause works as it would for any other aggregation, automatically segregating the aggregation further for each product category.
For this example, I used dates that reflect a financial year beginning April 1 and ending March 31 of the next year. Table 1 shows the exact dates for each interval. For brevity, I won't show how the code calculates these dates, but you can use T-SQL or any other language to calculate the dates. If you pass the date that denotes today to a T-SQL stored procedure, for example, the stored procedure can then compute the remaining required dates. Given one date as a parameter, you can combine the date computation with the inline aggregation query in one stored procedure to generate all the aggregates for the report. Active Server Pages (ASP) developers, for example, can prepare the dates and the entire query by using a scripting language such as VBScript, then execute the query by using ADO in an ASP page. How you calculate the dates is up to you.
Figure 3 shows Listing 2's query plan in Query Analyzer. Query Analyzer lets you see how SQL Server will execute the operations in a query and how much time each operation will take. The query plan for Listing 2's query comprises four major operations. As you can see, most of the query's processing time (76 percent) is spent on the Clustered Index Scan operation, circled in black in Figure 3. This operation takes the most time because it iterates through the entire clustered index and retrieves the relevant rowset in which the remaining operations take place. In contrast, the Sort operation takes 24 percent of the total processing time, and grouping and aggregations take only about 1 percent of the total processing time. When you add inline aggregation to a query, the additional processing falls into this 1 percent time block.
If you compare one query that uses conditional inline aggregation with multiple queries that use GROUP BY and WHERE clauses, the queries' execution plans would be similar; each query would have Clustered Index Scan and Sort operations. But because conditional inline aggregation uses only one query, you reduce the overall processing time dramatically. For example, say one query producing one aggregate takes 3 seconds. A report that needs 10 aggregates would typically require 10 queries, so you'd need 30 seconds for the solution (10 * 3 = 30). But when you use inline aggregation, you can obtain all 10 (or more) aggregates in one query that requires roughly 3 seconds. So the solution now takes one-tenth of the time that the multiquery solution requires. This approximation might vary slightly because of factors such as caching. But in my experience, even when a data table has tens of thousands of rows, using inline conditional aggregation can reduce query time by 80 percent or more.
Inline conditional aggregation has the rare dual advantages of providing tangible gains in performance and reducing code size. This technique also demonstrates that T-SQL provides many ways to solve problems and proves that writing optimized queries isn't just a matter of knowing the SELECT statement's syntax; you also need practice and experience. When you try different methods, you can find the best technique for each situation you face.
End of Article
Prev. page
1
[2]
next page -->