• subscribe
July 14, 2000 02:48 PM

Optimizing Cube Performance with OLAP Services

SQL Server Pro
InstantDoc ID #9140

Processing times for each storage type. We first determined the differences in cube processing times for the three storage modes. To obtain processing comparisons, we processed identically structured cubes with different data storage methods and aggregation levels. Table 3 presents detailed processing time results. When you use Microsoft OLAP Services to create cubes, your two most important design decisions are the storage mode and the level of aggregation. Table 4 contains descriptions for the storage modes in OLAP Services. In OLAP Services, aggregations are precalculated sums of fact table data from various levels of each dimension. Developers use these aggregations to answer queries and create additional aggregates. When choosing how much aggregation to include in a cube, you must balance the demands of storage space and query response time. At one extreme, precalculating all possible aggregates greatly increases a database's storage requirements. At the other extreme, calculating all aggregates at query time slows query responses.

The tests revealed significant differences in processing times across the three storage types and for various levels of aggregation within each type. ROLAP cubes at 0 percent aggregation had the fastest processing times. ROLAP achieves this speed because OLAP Services doesn't copy the fact and dimension tables for ROLAP cubes, and OLAP Services doesn't calculate aggregations at 0 percent aggregation. However, as the aggregation level increases, processing ROLAP cubes takes longer than processing MOLAP and HOLAP cubes.

We recorded some surprising results at higher aggregation levels. The difference between processing times for MOLAP and HOLAP cubes at 30 and 60 percent aggregation wasn't substantial, but the processing time increased substantially between 60 and 90 percent for MOLAP and HOLAP cubes. For ROLAP cubes, we measured an exponential increase in processing time between 60 and 90 percent aggregation.

Disk space requirements for each storage type. Table 5 presents disk space requirements for the three processing modes. We found that the storage type and aggregation level significantly affect storage requirements.

We determined that MOLAP cubes require more space than HOLAP or ROLAP cubes because OLAP Services copies the original fact and dimension tables into the OLAP database for MOLAP cubes. Although the MOLAP cube takes more space, it no longer requires the star schema when queried. However, HOLAP cubes use the least amount of space because OLAP Services doesn't copy the original fact and dimension tables into the OLAP database, and aggregates are stored in the OLAP database in an efficient multidimensional format.

Disk space usage for all storage types doesn't increase significantly between 0 and 60 percent aggregation but jumps substantially for all types as the aggregation level reaches 90 percent. The increase for ROLAP is especially significant.

Disk space requirements for MOLAP cubes compared with the star schema. Table 6 compares the space that MOLAP cubes require with the size of the original RDBMS's star schema fact table and the dimension tables. Our tests determined that building the aggregations typically required only 20 percent more space.

Table 6 shows that rather than producing a data explosion, OLAP Services actually performs data compression while building OLAP cubes. Surprisingly, even at an aggregation level of 90 percent, the compression level was nearly 79 percent. Your results might differ because the additional space that is required to build MOLAP cubes varies with the number of levels in the dimension, the number of measures, and the type of data.

MDX vs. SQL Queries
To test query response times, we ran sets of similar queries in different environments: SQL queries against SQL Server, and MDX queries against the MOLAP cube stored on the OLAP server. Although you can't exactly compare execution times for SQL and MDX queries, the results clearly show that using MDX for OLAP Services can significantly boost query performance because OLAP Services precalculates and stores aggregates in the OLAP cubes.

For questions 2, 3, 4, 6, and 7, responses for the MDX query ranged within several seconds, whereas responses for the SQL query were within a hundred seconds. For questions 1 and 5, MDX queries took somewhat longer to execute, probably because these queries reference a distinct count calculated measure in the cube. (A distinct count calculated measure determines, for each member of a dimension, how many distinct, lowest-level members of another dimension share rows in the fact table. If a member occurs multiple times, it is counted only once.)

Queries 1 and 5 use a distinct count to determine the number of unique households that generated economic income for a given product. The MDX statement in the MDX query then uses this measure to calculate an average economic income. Listing 1 shows a sample.

In this SQL Server 7.0 test scenario, we used an MDX expression to implement a distinct count with a calculated member, which can greatly affect the performance of queries that use the distinct count calculated member. Because calculated members are not preaggregated, OLAP Services must perform these calculations at query execution time. Additionally, OLAP Services must read all lowest-level members to calculate the distinct count. MDX queries 1 and 5 reference a distinct count of the HouseholdDim dimension, which contains 200,000 members. Accessing this number of members, as well as computing the distinct count, can explain the increased query response time with a cold cache. (A cold cache contains either no data or no relevant data.)

In SQL Server 2000, distinct count is implemented as a standard aggregate function defined for a measure, such as sum, count, min, or max. Implementing a distinct count in SQL Server 2000 yields much better performance than implementing a calculated measure, and the distinct count performs at levels similar to the other aggregate functions. We tested processing times by executing a set of queries that included both SQL queries on SQL Server and MDX queries on OLAP Services using MOLAP cube design. Table 7 displays the results. Listing 2 shows an MDX query that is typical of those we used in the testing. In this example, we formatted the MDX query to answer the business question "What is the economic income for first quarters of 1996 and 1997, and how does it compare for these two periods by each customer segment/product?" Table 8 shows the results. Despite its complexity, the query executed in less than 1 second.

MDX query execution times for MOLAP, ROLAP, and HOLAP. Having established the relative processing times for MDX versus SQL queries, we compared the execution times for the three OLAP storage modes. To conduct the test, we developed MDX queries for the seven identified business questions. We executed the queries against various levels of aggregation by using MOLAP, ROLAP, and HOLAP. We also conducted tests against cold and warm caches. (A warm cache contains relevant data from previously executed queries.) We created a cold cache condition by restarting the server before executing the query. To achieve a warm cache, we executed a query and then immediately executed again.

The average query processing time for all storage types changes significantly for warm versus cold caches. Also, after the first execution, subsequent queries take significantly less time because the query data is already in cache.

You would expect the tests that you execute against a cold cache to show that ROLAP and HOLAP processed queries more quickly as the aggregation level increased from 0 to 30 to 60 percent. But as the level of aggregation increased from 60 to 90 percent for ROLAP and HOLAP, queries didn't perform significantly better. MOLAP was fastest against a cold cache, though increasing the levels of aggregation for MOLAP had little effect on performance.

The warm cache tests produced significantly different results. Whereas query times improved substantially for all storage types as the aggregation level increased from 0 to 30 percent, increasing aggregation levels from 30 to 60 to 90 percent had little effect on total query performance.

CPU usage during MDX querying. In addition to execution time, CPU use is often crucial for large databases. We monitored the CPU on each server, and Table 9 shows the results. Our tests show that MOLAP does most of its processing on the machine holding OLAP Services, making little or no use of the machine holding the SQL Server RDBMS data mart. The small amount of usage in Table 9 could have resulted from server noise.

However, ROLAP and HOLAP use more processor time from the RDBMS server and a small amount from the OLAP Server machine, perhaps because our test queries involved a distinct count. (Calculating a distinct count requires OLAP Services to read all lowest-level household dimension member information from the RDBMS.) Also, additional system activity might have occurred on the RDBMS during the test.

Performance Optimization Tips
During the study, we experimented with various database components and characteristics and uncovered some techniques that significantly improve OLAP cube processing time and query times. We first recommend using a dimensional schema for the data mart. A star schema model dovetails nicely with the dimensional OLAP cube design and performance.

You can also improve cube processing time by using keys and indexes in different ways. We recommend that, besides associating a key with each dimension and fact table in the data mart, you declare foreign key relationships between fact and dimension tables, create a composite index on all foreign keys in the fact table, and create indexes on individual foreign keys.

Use the Optimize Schema feature in the OLAP Services Cube editor to minimize the number of joins that OLAP Services performs while processing a cube. Specify the maximum process buffer size in the OLAP Server Properties dialog box.

You can optimize MDX query execution times in three ways. First, put OLAP Services (the OLAP cubes) and SQL Server (the data mart) on separate machines, and be sure that the OLAP machine has adequate memory for OLAP Services. Second, if you use large data sets, partitioning can significantly improve query execution results. Finally, our tests show that MOLAP produces the best query performance.

The aggregation level you select affects cube processing time and query performance. Monitor your system's performance and adjust aggregation levels accordingly. We suggest that you start at about 25 percent, then increase the aggregations as needed. If you need to build additional aggregations, use the Usage-Based Optimization Wizard, which helps you create additional cube partitions based on the history of queries you previously sent to the cube.

By carefully applying the knowledge and techniques presented here to create and use OLAP cubes, you can build a database of almost any size that doesn't explode when you add data, that can be maintained efficiently, and that users can query quickly to produce meaningful results. (For more information about evaluating your database, see the sidebar "Using the FASMI Test.") In short, you can achieve the success that has until now largely eluded large-scale database designers.



ARTICLE TOOLS

Comments
  • Veara
    5 years ago
    Feb 07, 2007

    well written

  • suhas alatekar
    8 years ago
    Jun 10, 2004

    Exllent comparisons & useful for me to complete P G engineering work
    will be grateful if u help me by sending performance of multidimensional databases on various platforms

You must log on before posting a comment.

Are you a new visitor? Register Here