DOWNLOAD THE CODE:
Download the Code 43981.zip

4. Cumulative totals for each employee before cumulative total reaches 500. The fourth problem is a slight variation of the second problem. This time, you want to see each employee's progress up to, but not including, the 500 quantity barrier. You don't want to see any rows in which an employee reached or surpassed 500. Figure 4 shows the desired result. This problem is simple to solve, as the query in Listing 5 shows, because the HAVING filter is so intuitive. The filter HAVING SUM(O2.qty) < 500 returns all rows where the cumulative total is less than 500.

5. Point before cumulative total reaches 500 for each employee. The fifth problem is probably the most complex: From the results of the fourth problem, you want to return only the last row for each employee. In other words, you're after only the cumulative aggregate at the latest point before the employee's total reached 500. Figure 5 shows the desired result. You need to figure out when each employee reached or passed the 500 barrier, then calculate the aggregates up to but not including that month. To achieve this, you can change the JOIN condition slightly and match the rows from O1 and O2 based on a less-than operator (<), leaving off the equal-to operator (=):

ON O2.ordermonth < O1.ordermonth

This filter would match each order month from O1 with all earlier order months from O2. In this way, you can calculate aggregations up to but not including O1's order month and figure out when the employee reached or passed the 500 limit by adding O1's quantity to the total quantity. Take employee 1's data as an example. The filter would match the row for September 1996 in O1 with all earlier months in O2 (July 1996 and August 1996). The total quantity from O2 (121+247=368) is smaller than 500. The total quantity from O2 (121+247=368) plus the current month's quantity from O1 (368+255=623) is greater than or equal to 500. So, the month you're after is the latest month in O2, namely August 1996. Remember that the request was for the month just before the employee reached the 500 barrier.

But the revised JOIN condition introduces a problem. Because you're matching the current month from O1 to earlier months from O2, the first (earliest) row for each employee won't have a match. If an employee's earliest row happens to be the row you're after, you won't get it back in the result. To solve this problem, change the join type to outer join. An outer join will return the earliest row for each employee, with NULLs in the attributes from O2, because it won't find a match for these rows. Use the ISNULL() function to substitute a 0 for the aggregation of an employee's earliest row.

Now that you've determined the join type and JOIN condition, you can work on the HAVING filter. You can use a similar approach to the one you used in the third problem. For that problem, you used the filter

SUM(O2.qty) - O1.qty < 500 
AND SUM(O2.qty) >= 500

However, because you used a "<=" operator in the HAVING clause in that case, the total quantity included the current month, and in this case, it doesn't. Also, you want to substitute a 0 for a NULL result, so you end up with the following filter:

ISNULL(SUM(O2.qty), 0) < 500 AND ISNULL(SUM(O2.qty), 0) + O1.qty >= 500

The total quantity before the current point in time must be smaller than 500, and the total quantity including the current point in time must be greater than or equal to 500. For each employee, you isolate the month in O1 where the employee's order quantity reaches or passes 500—but that isn't what you need. Instead of returning the current month from O1, you can return the latest month from O2, which represents the point just before the order total reached 500. The query in Listing 6 shows the complete solution to problem 5.

Tricky Filters
Some variations of cumulative aggregate problems are fairly simple to solve, but others can be quite tricky. All the solutions rely on accessing two instances of the same table: one "fixed" instance that duplicates the rows for each order month match, and another "running" instance where the rows are aggregated. You aggregate the data by attributes from the fixed side of the join and accumulate attributes from the running side. The toughest part of each problem is figuring out the HAVING filter, which tells SQL Server which groups to return. This article's examples demonstrate the important variations that require tricky filters, which you can use when facing similar problems in your data warehouse.

End of Article

Prev. page     1 [2]     next page -->



You must log on before posting a comment.

If you don't have a username & password, please register now.

Reader Comments

In my understanding the primary index includes an unnecessary field - qty. I can not see the reason and also it makes the index wider...

pkpetroff

Article Rating 4 out of 5

Excellent article

Valdemar from brazil

Anonymous User

Very Intresting article!!!

BTW (for pkpetroff ;-). He have added the "unnecessary" field so he will have "covering Index" with this field...

Karmona,moti

Article Rating 5 out of 5

 
 

ADS BY GOOGLE