DOWNLOAD THE CODE:
Download the Code 97007.zip

The GROUPING_ID Function

As I mentioned earlier, the advantage of the new SQL Server 2008 features related to grouping sets—compared with unifying result sets of multiple GROUP BY queries— is the brevity of code and the efficiency of calculating the aggregates. Because a single query can now handle multiple grouping sets, you might want to easily and efficiently distinguish one grouping set from another in the result set. You might also want to materialize the result set of such a query in a table, then repeatedly query it for a different grouping set in each request. You need an identifier (for each grouping set) that you can use to isolate a grouping set, and that you can also index in the materialized table so that a request for each grouping set will be satisfied efficiently.

The answer lies in the new GROUPING_ID function. The function accepts a set of attributes as input—typically all attributes that participate in any of the grouping sets—and returns an integer bitmap in which each bit represents a different attribute. A bit is turned off if the corresponding attribute is part of the current grouping set and turned on if it isn’t. For example, consider the function GROUPING_ID(e, d, c, b, a). The attribute a is represented by the first bit (1), b by the second bit (2), c by the third bit (4), d by the fourth bit (8), and e by the fifth bit (16). All rows in the result set produced for the grouping set (a, c, e) will have the integer 10 returned from the function. Remember that a bit representing an attribute is turned off when the corresponding attribute is part of the grouping set; hence, the bits representing the attributes a, c, and e are turned off, and the bits representing the missing attributes b (2) and d (8) are turned on. You get 2 + 8 = 10. The following query should help clarify the output of the GROUPING_ID function:

SELECT
GROUPING_ID(e, d, c, b, a) as n,
COALESCE(e, 1) as [16],
COALESCE(d, 1) as [8],
COALESCE(c, 1) as [4],
COALESCE(b, 1) as [2],
COALESCE(a, 1) as [1]
FROM (VALUES(0, 0, 0, 0, 0)) AS D(a, b, c, d, e)
GROUP BY CUBE (a, b, c, d, e)
ORDER BY n;

Web Table 3 shows the output of this query.

The query produces all possible permutations of grouping sets out of five attributes: a, b, c, d, and e. With five attributes, you get 25 = 32 possible grouping sets. Because the source table is a virtual table with only one row (with zeros in all attributes), you get only one row in the result set for each grouping set. I used the COALESCE function to substitute a NULL representing the fact that the attribute isn’t part of the current grouping set with 1. You can see that the GROUPING_ID function returns an integer that is equal to the sum of the values of the bits representing the missing attributes in the current grouping set.

For a more tangible example, run the code in Web Listing 2 to materialize the result set of the query I showed in the previous section, along with the result of the GROUPING_ID function (call it grp_id), and cluster the table by the grp_id value and the attributes. Now, whenever you need to pull all rows produced for a specific grouping set, you simply need to query the table and filter the relevant grp_id value. The table was clustered first by the grp_id value, so the execution plan will be as efficient as it can get (clustered index seek plus partial ordered scan). For example, to pull the grouping set (custid, orderyear, ordermonth), you need to filter grp_id = 9 (non-participating attributes empid – bit 8 plus orderday – 1 = 9):

SELECT *
FROM dbo.MyGroupingSets
WHERE grp_id = 9;

As I mentioned earlier, the query is very efficient, and you get the expected result set, as Web Table 4 shows.

Incremental Updates

You’re probably wondering whether SQL Server 2008 also provides the means to handle incremental updates of the materialized aggregates stored in the MyGroupingSets table—that is, add the delta of changes since the last processing of aggregates to MyGroupingSets without reprocessing the whole thing. SQL Server 2008 does offer that means, as long as the aggregates are additive (e.g., SUM, COUNT, and AVG, which can be calculated based on SUM and COUNT). You can achieve incremental updates by using another new T-SQL feature in SQL Server 2008—the MERGE statement—which I’ll cover in a future article. But how about just a glimpse?

Suppose MyGroupingSets currently holds aggregates for all activity up to and including April 18, 2008. Run the code in Web Listing 3 to simulate new order activity added in April 19, 2008. Then, run the code in Web Listing 4 to apply the incremental load of April 19, 2008, to the MyGrouping- Sets table using the new MERGE statement. Don’t worry about the syntax; remember, this is only a glimpse. The basic concept of the code in Web Listing 4 is to add rows whose keys don’t exist in the target table and update rows whose keys do.

Less Code, Better Performance

SQL Server 2008 introduces several important features related to grouping sets: the GROUPING_SETS, CUBE and ROLLUP subclauses and the GROUPING_ID function. It also introduces an unrelated feature—the MERGE statement—that can help in handling incremental updates of additive aggregates, among other things. With these new features, you can support applications that need to provide analytical capabilities of aggregates with less code and better performance. These new features also let you provide a relational alternative to basic cubes in Analysis Services for customers who prefer to stick to the relational data warehouse/data mart alone. For information about that alternative, see the Web-exclusive sidebar “Grouping Sets in the Relational Database as an Alternative to Cubes in Analysis Services,” InstantDoc ID 97006. (see Associated .Zip File)

End of Article

Prev. page     1 [2]     next page -->



You must log on before posting a comment.

If you don't have a username & password, please register now.