DOWNLOAD THE CODE:
Download the Code 41230.zip

Analysis Services is a high-performance, multidimensional query engine for processing analytical and statistical queries, which a relational SQL engine doesn't handle well. When such queries are simple or have pre-aggregations, Analysis Services can make your job easier. But when queries become complex, Analysis Services can bog down. For example, an SQL SELECT statement that includes a GROUP BY clause and aggregate functions can take as long as a few minutes—or more. You can retrieve the same result set in just a few seconds if you execute an MDX statement against an Analysis Services Multidimensional OLAP (MOLAP) cube. You perform this workaround by passing an MDX query from SQL Server to a linked Analysis Server by using the OPENQUERY function in an SQL SELECT statement, as SQL Server Books Online (BOL) describes. Analysis Services then precalculates the necessary aggregations during the processing and creation of the MOLAP cube so that the results are completely or partially available before a user asks for them.

However, precalculating every imaginable aggregation is impossible; even a completely processed MOLAP cube can't precalculate aggregations such as those in calculated cells, calculated members, custom rollup formulas, custom member formulas, FILTER statements, and ORDER statements. If you're used to the performance you get when you retrieve only precalculated aggregations, the performance you get from an MDX query that involves these kinds of runtime calculations might seem unbearably slow. The problem might occur not because Analysis Services can't handle runtime calculations efficiently but because your MOLAP cube's design isn't optimal.

In my work building and maintaining a data warehouse for the city of Portland, Oregon, I optimize Analysis Services so that traffic engineers can quickly access a variety of statistics about traffic accidents in the city. Through many experiments, I've discovered that an important key to MOLAP optimization is cube partitioning. In this article, I explore and compare various MOLAP cube-partitioning strategies and their effects on query performance. Then, I suggest some simple guidelines for partition design.

Traffic-Accident Data Warehouse
My study of query performance is based on my work with a real dimensional data warehouse that maintains traffic-accident history. When I conducted this study, the traffic-accident data warehouse contained 17 years of data (1985 through 2001) and documented about 250,000 unique incidents. The complex part of this data warehouse is not its relatively small fact table but its many dimensions, which the snowflake schema in Figure 1 shows. Portland's traffic engineers look for the street intersections that have the highest number of incidents. Then, they search for clues about which factors might cause a high number of crashes and what makes some accidents more severe than others. They look at a total of 14 factors (which are the data warehouse's dimensions) including time, light, weather, traffic control, vehicle, and severity of occupant injuries. Among the dimensions, the Streets dimension (STREET_DIM) is the largest; it records roughly 30,000 street intersections in the Portland area. The total number of source records to build aggregations on is the result of a multi-way join of 14 one-to-many (1:M) or many-to-many (M:N) relationships from the fact table to the dimension tables. The Accident data warehouse contains only one measure: the distinct accident count (Incident_Count). A distinct count prevents the possibility of counting the same accident multiple times in a M:N relationship.

Fortunately, the Streets dimension isn't too large to use MOLAP cube storage, which provides the best query performance. Analysis Services defines a huge dimension as one that contains more than approximately 10 million members. Analysis Services supports huge dimensions only with Hybrid OLAP (HOLAP) or Relational OLAP (ROLAP) cubes.

   Prev. page   [1] 2 3     next page
 
 

ADS BY GOOGLE