Queries and Bottlenecks
Analysis Services responds to queries with varying performance, depending on the complexity of the query. For example, a MOLAP cube that you create by accepting the default partition in Analysis Manager would respond to a simple query like the one that Listing 1 shows by returning roughly 2000 records in only 5 seconds.
If your queries basically ask only for pre-aggregates in a few records or columns, any MOLAP cube with any percentage of aggregationeven as little as 5 percentwill perform well. However, for a query like the one that Listing 2 shows, which involves six calculated members, a 30 percentaggregated, single-partition MOLAP cube would take 52 seconds to return just 331 street intersections. These disparate results suggest that performance bottlenecks don't depend on the size of the result set or on the percentage of aggregation in the cube. In fact, in my experience, any aggregations beyond 30 percent are a wasteyou get no better performance for your effort. For simple queries, you don't need high aggregation. For complex queries, high aggregation won't help. Performance bottlenecks in Analysis Services typically come from calculated members that scan for multiple tuples and aggregate them on the fly.
The City of Portland traffic engineers I work with typically ask ad hoc questions that are nonhierarchical along the Time dimension. For example, an engineer might ask me to calculate the total number of accidents during the past 3 years, the past 5 years, or any combination of years between 1985 and 2001. I can't simply aggregate years by creating a new level above the Year level in the Time dimension; the new level would satisfy only one combination of years. This limitation means all queries that involve a combination of years have to use calculated members to perform aggregations for the specified years.
Listing 2's query returns accident counts along the Time, Occupant_Severity, and Streets dimension members. Figure 2 shows the members of the Time and Occupant_Severity dimensions. Listing 2's query uses six calculated membersAccident_Count, Fatal, Injury_A, Injury_B, Injury_C, and PDO (Property Damage Only)to sum the accidents in the years 1998, 1999, and 2000 for each of the five members of the Occupant_ Severity dimension. The query asks for a sorted and filtered result set of accident counts for each street intersection ([Street].[Street_List]) in each of these six calculated members. To contrast with the performance of such on-the-fly aggregation, I've included Listing 3, page 30, which accesses only pre-aggregations and doesn't include calculated members. I used Listing 2 and Listing 3 as the benchmarks for my cube partitioning tests, which I discuss in a moment.
When you need to improve the performance of queries that involve calculated members, cube design is important. In my experience, the most important aspect of cube design isn't how much memory you have, the number of disks or CPU threads you have, whether you use unique integers for member keys, or even whether you use the Usage-Based Optimization Wizard, but how you partition the cube.
Partitioning is slicing a cube along a tuple such as ([Occupant_Severity].[Fatal], [Time].[2000]), which specifies a member from each dimension. For any dimension that you don't specify in this tuple, the partition includes the entire dimension. Analysis Services keeps in the cube structure a direct pointer or index to the partition for that tuple. Whenever a query references that tuple or a subset of it, Analysis Services can get to the corresponding partition without scanning the entire cube. You can partition a cube in a nearly infinite number of ways, and Analysis Services supports as many partitions as you practically need for a cube. But without a clear rule for creating partitions, you could create too many cuts or wrong cuts on a cube and end up with worse performance than you'd get with one default partition.
Prev. page
1
[2]
3
next page