Executive Summary:
Multidimensional Expression (MDX) is a query language for OLAP databases. MDX programmers strive to create calculated members that are simple yet powerful and easy to maintain, but sometimes those members have poor query performance. In this article, Paul Goldy describes why performance problems can occur and how to work around them by walking you through an example involving a consulting company that uses headcount as its metric. |
MDX programmers like calculations that are
elegant (i.e., a solution that is simple
yet powerful and easy to maintain) and have good query
performance. However, it’s not uncommon for MDX programmers
to create a calculated member whose business
definition and initial implementation are elegant, only to
find out later that the member is so slow that it’s unusable
in queries against a full data set.
To see why performance problems can occur and ways
to work around those problems, let’s look at an example
involving a consulting company that uses headcount as
its metric because it has no other capital assets to track.
This consulting company tracks its consultant headcount
by tracking the number of hires, terminations, and
transfers. The consultant “inventory” is derived into
[Begin Count] and [End Count]. Nearly all other
metrics, such as full-time equivalent (FTE) utilization
and average cost per FTE, rely on the beginning
and ending headcounts. After providing the business
definition, I’ll show you one solution that is elegant but has
poor performance and another solution that isn’t as elegant
but has acceptable performance.
The business definition is
[End Count] = [Begin Count] +
[Termination Count] +
[Hire Count] + [Transfer Count]
[Begin Count] = [End Count]
of the prior period
In this definition, the [End Count] value of one period
becomes the [Begin Count] value for the next period. For
example, look at the sample data in Figure 1. Week 22’s
[End Count] value is 120, which becomes Week 23’s [Begin
Count] value. The [Termination Count], [Hire Count], and
[Transfer Count] values are loaded measures.
One elegant solution is to take advantage of the Prev-
Member function in MDX. Listing 1 shows the [Begin
Count] definition for this solution. If you use this definition
with the data in Figure 1, Week 24’s [Begin Count]
value of 130 is Week 23’s [End Count] value. Similarly,
Week 23’s [Begin Count] value of 120 is Week 22’s [End
Count] value. The same pattern would be followed if there
were more data.
Callout A in Listing 1 highlights the [End Count]
definition for solution 1. Referencing [Begin Count] within
the definition of [End Count] introduces a recursive definition
in which every [End Count] calculation uses [Begin
Count], which in turn references [End Count].
If you’re familiar with recursive solutions (e.g., Tower of
Hanoi), you likely noticed that the [Begin Count] definition
doesn’t include an endpoint to end the recursion. Recursive
solutions typically have a check that tells the recursion to
quit and return. Instead of an explicit exit criterion, the
[Begin Count] and [End Count] definitions in Listing 1
rely on an implied recursion endpoint from Microsoft
SQL Server 2005 Analysis Services. The implied recursion
endpoint comes from the code
[Calendars].[Calendars].
CurrentMember.PrevMember
The PrevMember function eventually references a date
outside the cube space. When the tuple
([Calendars].[Calendars].
CurrentMember.PrevMember,
Measures.[End Count])
references a [Calendars].[Calendars] hierarchy member
outside the cube space, the [Calendars].[Calendars].CurrentMember.
PrevMember returns NULL, which ends
the recursion.
For example, suppose we have only five consecutive
weeks of data, starting with Week 20, for a division in
the consulting company. In Week 20, 100 employees were
transferred into the division and 5 new employees were
hired. Week 20’s [Begin Count] value is NULL because it
references PrevMember, which doesn’t exist. If we could
view the [Begin Count] definition while Week 20 is the
CurrentMember, the tuple would look like
([Calendars].[Calendars].
[Week 20].PrevMember,
Measures.[End Count])
There is no previous member for Week 20, which results in
a NULL value for the tuple from Analysis Services.
The performance of [Begin Count] can get very slow,
even when there’s only a small
number of members to iterate
through. For instance, suppose
we have 12 years of data
and a calendar hierarchy of
year-quarter-month-day, which
means we have 4,380 (12 × 365)
day-level members. On a two-processor
2GB development
server, solution 1 can take up
to 30 minutes to return [Begin
Count], as defined in Listing
1, when Calendar.Current-
Member is a day-level member.
A modified version of solution
1, solution 2 takes advantage
of the non-additive nature
of [Begin Count]. In solution 2,
[Begin Count] is the same value
for the first member in a period,
regardless of which level the
member belongs to in the calendar
hierarchy. For example,
as Figure 2 shows, if [Begin
Count] is 150 at the year level
(2006), [Begin Count] is the
same value at the quarter level
(Q1-2006), month
level (Jan-06), and
day level (1-Jan-06).
In Solution 2,
a member's parent
has the same [Begin
Count] value as the member when the member is the
first sibling of the parent. The [Begin Count] definition
is shown in callout A in Listing 2. This definition
includes an IIF statement, which works as follows:
If Calendar.CurrentMember is the first sibling, then
we use [Begin Count] with Calendar.CurrentMember
.Parent. Otherwise, we use [End Count] with Calendar
.CurrentMember.PrevMember.
Solution 2's [End Count] is also defined with an IIF
function, as Listing 2 shows. Here's how that IIF function
works: If Calendar.CurrentMember.Parent is the highest
valid point in the calendar hierarchy, then we add up
the hire, termination, and transfer counts related to the
[Fiscal] member, and that result becomes the [End Count]
value. This is the end point of the recursion. Otherwise, we
add up the begin, termination, hire, and transfer counts,
and that result becomes the [End Count] value. The reference
to [Begin Count] forces us to iterate (via recursion)
back to the [Begin Count] definition.
Using the technique of looking for the parents of
first siblings lets us evaluate [Begin Count] with far fewer
recursive calls than if we use the recursive technique in
solution 1. The reduction in the number of recursive calls
is what makes solution 2 perform significantly better than
solution 1.
—Paul Goldy, Technical Analyst, Wirestone