To verify the fact that high-end servers can't guarantee better performance for Listing 2 and Listing 3, I reproduced a similar result by using a larger version of the Foodmart 2000 sample database that comes with Analysis Services installation. I named this expanded database Foodmart_85_99. It's an exact mirror of Foodmart 2000, except that I expanded the fact table and Time dimension by replicating their original records 14 times, each time changing the Year member from 1997 (the original value) to 1985 through 1999. I also created a new query, which Listing B shows, which is similar to the structure of the query in Listing 2 but targets the sales cube. I then created a new 30 percent-aggregated MOLAP cube called Sales_25 (in addition to the default cube Sales) that has partitions along data slices that Listing B references-for example, ([1995], [Bachelors Degree]). The results in Figure C again confirm that partitioning along queried data slices (as in the Sales_25 cube) generally improves performance. However, high-end servers still don't perform as well as low-end servers.
These results defy a couple of common beliefs: first, that more CPUs provide better parallelism for cubes that have multiple partitions and second, that more memory buffers more data, producing faster computation. The results of my test appear to show that more memory and more CPUs have no effect on the queries in Listing 2, Listing 3, and Listing B. Instead, one fast CPU outperforms multiple CPUs. Because low-end servers performed better in all of the above tests, I decided to use only the Low3 machine in the rest of my studies.
Table B shows the effect of varying aggregation levels. The response time stays at 52 seconds and 27 seconds for Listing 2 and Listing 3, respectively, except at 100 percent aggregation. This result supports my hypothesis that for complex queries (such as the one in Listing 2) or queries that return many records or columns (as Listing 3's query does), increasing the aggregation level won't improve performance. Since the aggregation level isn't a factor, I decided to fix the aggregation level at a common value-30 percent-for the rest of my tests.
A partition-design question that no one has ever answered is Should you keep an existing partition when you further divide it into new ones, or should you discard it and replace it with new divisions? Keeping the original partition would mean an overlap-duplication of data among the slices. The same problem can happen when you're creating new partitions. For example, when you create new partitions along Time dimension members, you might be tempted to specify these data slices: ([Time].[All Time]), ([Time].[1997]), ([Time].[1998]). You might think that the ([Time].[All Time]) partition includes only the [All Time] member as it appears in Figure A. But specifying the [All] member is exactly the same as leaving that dimension blank-which includes the entire dimension in the slice. You can verify this behavior by looking at the slice column in the Meta Data tab of the partition in Analysis Manager. So in this example, the ([All Time]) partition would overlap with the data in the ([1997]) and ([1998]) partitions. Overlapping partitions not only wastes storage space, it also significantly degrades performance, as Table C shows. The addition of the [All Occupant_Severity] partition overlaps with all of its descendants' partitions-for example, the ([Fatal]) partition-and makes Listing 2's query twice as slow.
I made sure that the six cubes I designed for my tests contained no overlapping partitions. Table A shows the performance of these cubes on server Low3, with the addition of cache performance. If you have enough memory to maintain warm caches for all your queries, partitioning has no effect on performance, regardless of the level of aggregation in your cube. Unfortunately, most of us must work with cold caches, so partitioning is a good choice. For Listing 2, the PartitionYear_Severity7 cube's response time is reduced to 9 seconds because of the partitioning I used and is nearly six times as fast as the default cube (Default1). More important, it's nearly as fast as carrying a warm server cache.
When I applied Analysis Services Usage-Based Optimization to add more aggregations for Listing 2 and Listing 3's queries, the performance of Listing 3 on every cube improved slightly, as Table D shows. However, partitioning appears to still have a negative effect on Listing 3. Interestingly, Usage-Based Optimization has no effect on Listing 2's performance on any cube, possibly because the aggregation of the three year members requires the use of calculated members-you can't preprocess any of the aggregation. Therefore, Usage-Based Optimization couldn't add any more aggregations for Listing 2.
End of Article
Prev. page
1
[2]
next page -->