These test results reveal concrete steps you can take to improve multidimensional database performance
For many IT organizations, especially those with larger databases, optimizing database performance is a mystery, a discipline that relies more on anecdotal remedies than on proven methodologies. We recently conducted an extensive series of performance optimization tests against an internal large-scale database; our goal was to develop reliable methodologies for improving OLAP cube performance. We designed the tests to evaluate various optimization methods, and our results reveal concrete steps you can take to enhance the performance of multidimensional databases.
When creating cubes with SQL Server 7.0 OLAP Services, your two most important design decisions are the storage mode and the aggregation level. For designers making basic choices about data warehouse design, Microsoft SQL Server 7.0 documentation provides adequate guidelines and explains the trade-offs. But little of the available information is based on direct experience of how storage modes and aggregation levels affect processing and query performance.
We describe the tests we conducted to isolate problems with cube performance. We review the study constraints, explain how we constructed the OLAP cubes, and examine the query process. Our analysis of cube processing times and cube storage, including a comparison of MDX language and SQL queries, illustrates the range of possible methods. Finally, we give you performance optimization tips derived from the tests. We have fully described our testing and our results in the white paper "Optimizing OLAP Performance for Very Large Databases" (which you can find at http://www.unisys.com/sql7/).
The Challenge of Cube Performance
We began by setting up representative SQL Server and OLAP databases as test beds. We identified sample business questions to run against an existing very large database (VLDB), then isolated the tables and fields that contained the information.
Next, we created and populated a SQL Server data mart based on a star schema design. We then used three OLAP storage modes and multiple aggregation levels to create various representative OLAP cubes. Finally, we created SQL and MDX queries based on the business questions and ran the queries to compare the data mart with the OLAP cubes.
During the study, we investigated several variables that influence performance optimization, including different storage modes and various aggregation levels. We've included comparative test results for cube processing times, disk space requirements, CPU usage on a relational database management system (RDBMS) server versus OLAP Server while querying, MDX query execution times, query execution times with warm and cold caches, disk space required to build OLAP cubes versus the star schema data mart, and MDX versus SQL query execution times.
Study Details
For its ongoing research and development efforts, Unisys maintains the Unisys e-@ction Customer Profitability SQL Server 7.0 VLDB for retail banking data. At more than 2TB, this is the largest commercial database deployed with SQL Server 7.0 in an enterprise-class environment using Windows NT. The database contains data representing 2 years of transactions by 4 million banking customers with 10 million accounts.
We conducted our study by using the bank VLDB's development version database (approximately 130GB) and designed a star schema for the business process represented in the OLAP cubes. This model let users query a cube based on a bank account profitability fact table containing approximately 13 million records, covering information about economic income earned from January 1996 through December 1997 (in 24 periods) from various products and customer segments.
The test system hardware consisted of two similarly configured Unisys e-@ction ES5045 R servers, with four Pentium III Xeon 450MHz CPUs. We configured each server with a 512MB cache, 8GB of RAM, and fibre channel data storage OSM7700 and RAID 10 volume (5 mirrored sets and striped data across those sets). We connected the servers through a 100MB Ethernet network.
One server housed the relational database and the star schema called VLDBMart. We configured the second server to store the OLAP cubes and multidimensional data.
Setting Up the Study
We took several steps to build our test configuration. These steps include defining questions and describing bank data, creating a data mart, populating the data mart, and building OLAP cubes.
Define questions and describe bank data. First, we defined the appropriate and relevant business questions, including economic issues such as income, spread, and moving averages for various combinations of time periods across products and customer segments. We chose seven questions that this type of customer profitability system would typically answer, such as "What is the average economic income per household for the past 2 years (1996 and 1997) from each of the products?" Next, we examined the existing VLDB entity relationship diagram (ERD) to determine which tables held the information required to answer the questions.
Create a data mart. We then created a data mart with the table information required to answer the profitability questions. We chose a star schema because this design closely matched the design of the OLAP cubes we planned to construct.
To construct the required data mart, we defined a fact table called Account_Prof_Fact to hold the data from the account profitability tables in the banking database and to contain data for all 24 periods in the study (January 1996 through December 1997). The fact table records account profitability information such as economic income and expenses for various products at a monthly snapshot level. This snapshot contains monthly totals for all measures. Having defined the fact table, we identified five relevant dimensions: product, time, region, household, and customer segments. Figure 1 shows the resulting star schema.
Populate the data mart. After completing the data mart design, we used Data Transformation Services (DTS) to populate the Account_prof_fact fact table and the associated dimension tables in the data mart. After we populated the fact table, it held approximately 13 million records. Table 1 shows the records and size for the fact and dimension tables in the data mart.
Build OLAP cubes. To construct the OLAP cubes, we created a multidimensional OLAP database, which we named AccountProfitabilityOLAPDatabase. Within that database, we generated 12 cubes with identical structure but varying storage types and aggregation levels.
We chose aggregation levels of 0, 30, 60, and 90 percent for the cubes because these levels represent the range used in most applications. The aggregation percentage represents the expected improvement in query performance compared with performance without precalculated aggregations.
Figure 2 shows the cube structure for one of the cubes we designed, AccountProfitabilityCubeM90. Next, we developed the fact measures for the cubes. To accurately measure each account's true profitability, we defined eight fact measures for the cubes. To complete the cube structures, we selected five cube dimensions corresponding to the dimension tables selected for the data mart's star schema. Table 2 provides details about the defined OLAP dimensions.
Cube Processing and Storage Results
Before testing our queries, we compared processing times and disk space requirements for each storage typei.e., Multidimensional OLAP (MOLAP), Hybrid OLAP (HOLAP), and Relational OLAP (ROLAP). And we compared disk space requirements for MOLAP cubes with space requirements for the star schema.
Prev. page  
[1]
2
next page