The GROUPING_ID Function
As I mentioned earlier, the advantage of the new SQL
Server 2008 features related to grouping sets—compared
with unifying result sets of multiple GROUP BY queries—
is the brevity of code and the efficiency of calculating
the aggregates. Because a single query can now
handle multiple grouping sets, you might want to easily
and efficiently distinguish one grouping set from another
in the result set. You might also want to materialize the
result set of such a query in a table, then repeatedly query
it for a different grouping set in each request. You need
an identifier (for each grouping set) that you can use to
isolate a grouping set, and that you can also index in the
materialized table so that a request for each grouping set
will be satisfied efficiently.
The answer lies in the new GROUPING_ID function.
The function accepts a set of attributes as input—typically
all attributes that participate in any of the grouping
sets—and returns an integer bitmap in which each bit
represents a different attribute. A bit is turned off if the
corresponding attribute is part of the current grouping
set and turned on if it isn’t. For example, consider the
function GROUPING_ID(e, d, c, b, a). The attribute a is
represented by the first bit (1), b by the second bit (2), c by
the third bit (4), d by the fourth bit (8), and e by the fifth bit (16). All rows in the result set produced for the grouping
set (a, c, e) will have the integer 10 returned from the
function. Remember that a bit representing an attribute
is turned off when the corresponding attribute is part of
the grouping set; hence, the bits representing the attributes
a, c, and e are turned off, and the bits representing the
missing attributes b (2) and d (8) are turned on. You get
2 + 8 = 10. The following query should help clarify the
output of the GROUPING_ID function:
SELECT
GROUPING_ID(e, d, c, b, a) as n,
COALESCE(e, 1) as [16],
COALESCE(d, 1) as [8],
COALESCE(c, 1) as [4],
COALESCE(b, 1) as [2],
COALESCE(a, 1) as [1]
FROM (VALUES(0, 0, 0, 0, 0)) AS D(a, b, c, d, e)
GROUP BY CUBE (a, b, c, d, e)
ORDER BY n;
Web Table 3 shows the output of this query.
The query produces all possible permutations of
grouping sets out of five attributes: a, b, c, d, and e. With
five attributes, you get 25 = 32 possible grouping sets.
Because the source table is a virtual table with only one
row (with zeros in all attributes), you get only one row in
the result set for each grouping set. I used the COALESCE
function to substitute a NULL representing the fact that
the attribute isn’t part of the current grouping set with 1.
You can see that the GROUPING_ID function returns
an integer that is equal to the sum of the values of the
bits representing the missing attributes in the current
grouping set.
For a more tangible example, run the code in Web
Listing 2 to materialize the result set of the query I
showed in the previous section, along with the result of the
GROUPING_ID function (call it grp_id), and cluster the
table by the grp_id value and the attributes. Now, whenever
you need to pull all rows produced for a specific grouping
set, you simply need to query the table and filter the relevant
grp_id value. The table was clustered first by the grp_id
value, so the execution plan will be as efficient as it can
get (clustered index seek plus partial ordered scan). For
example, to pull the grouping set (custid, orderyear, ordermonth),
you need to filter grp_id = 9 (non-participating
attributes empid – bit 8 plus orderday – 1 = 9):
SELECT *
FROM dbo.MyGroupingSets
WHERE grp_id = 9;
As I mentioned earlier, the query is very efficient, and you
get the expected result set, as Web Table 4 shows.
Incremental Updates
You’re probably wondering whether SQL Server 2008 also
provides the means to handle incremental updates of the
materialized aggregates stored in the MyGroupingSets
table—that is, add the delta of changes since the last processing
of aggregates to MyGroupingSets without reprocessing
the whole
thing. SQL Server
2008 does offer that
means, as long as
the aggregates are
additive (e.g., SUM,
COUNT, and AVG,
which can be calculated
based on SUM
and COUNT). You
can achieve incremental
updates by using another new T-SQL feature in
SQL Server 2008—the MERGE statement—which I’ll
cover in a future article. But how about just a glimpse?
Suppose MyGroupingSets currently holds aggregates
for all activity up to and including April 18, 2008. Run
the code in Web Listing 3 to simulate new order activity
added in April 19, 2008. Then, run the code in Web
Listing 4 to apply the incremental load of April 19, 2008,
to the MyGrouping-
Sets table using the
new MERGE statement.
Don’t worry
about the syntax;
remember, this is
only a glimpse. The
basic concept of the
code in Web Listing
4 is to add rows
whose keys don’t
exist in the target
table and update
rows whose keys do.
Less Code,
Better
Performance
SQL Server 2008 introduces several important features
related to grouping sets: the GROUPING_SETS, CUBE
and ROLLUP subclauses and the GROUPING_ID function.
It also introduces an unrelated feature—the MERGE
statement—that can help in handling incremental updates
of additive aggregates, among other things. With these
new features, you can support applications that need to
provide analytical capabilities of aggregates with less code
and better performance. These new features also let you
provide a relational alternative to basic cubes in Analysis
Services for customers who prefer to stick to the relational
data warehouse/data mart alone. For information about
that alternative, see the Web-exclusive sidebar “Grouping
Sets in the Relational Database as an Alternative to Cubes
in Analysis Services,” InstantDoc ID 97006. (see Associated .Zip File)
End of Article
Prev. page
1
[2]
next page -->