Aggregation Answers
A primary reason for developing cubes is to create pre-calculated aggregations. Aggregations are summarizations of data at any level higher than the grain, which is the lowest level of detail in the fact table. If you record sales per customer per product per day, the total sales per customer for a day or month or quarter would each represent different aggregations (e.g., total sales for Customer X by day or total sales for Customer X by quarter). In addition, the total sales for all customers for a day or month by product subcategory or product category also represent aggregations.
Considering the variety of attributes and levels of hierarchical information available, one of the most straightforward ways to enhance cube performance is to pre-calculate and store these aggregations. However, cubes built in BIDS don’t include any higher-level aggregations by default. Unfortunately, some businesses go through the entire deployment process without ever creating any aggregations. The Aggregation Wizard organizes aggregations based on either a percentage of total aggregation, storage size, or simply the point at which the designer tells the wizard to stop. The wizard’s approach is the easiest way to develop aggregations; however, the wizard assumes that all queries are equally likely.
One common approach to handling aggregations is to let the wizard create a low percentage of aggregations, such as 20 percent. Then, after the cube is running in production, you can log queries and later run through the Usage Based Optimization (UBO) wizard. The UBO wizard looks at the actual queries and designs aggregations that support them, meaning that aggregations become far more tuned to actual usage patterns. Note that for large cubes, the UBO in SSAS 2005 didn’t always provide the best aggregations; the SSAS 2008 UBO does a much better job.
Another problem you want to avoid is having too many aggregations. Queries can actually suffer as the engine searches through a host of aggregations to try to determine which is best at fulfilling a particular question. Some DBAs use the Aggregation Wizard and set it to 100 percent, which tends to slightly improve performance while dramatically increasing physical storage requirements and processing time. Additionally, some database pros fashion custom aggregations because the wizard, even at 100 percent, doesn’t create every possible aggregation. SSAS 2005 includes a sample application that provided for the tuning of aggregations, and a similar tool is built into BIDS 2008. Some database pros erroneously assume that if one aggregation is good, thousands must be better, and so they produce every possible aggregation. Not only does this require vast amounts of storage, but it can also stretch processing times into hours or even days.
Processing Options
Suppose you need to keep the cube available at all times, but it sometimes goes offline for processing. To minimize processing time, you can create attribute relationships and tune the number of aggregations to improve query times while still allowing for processing in the allotted time window.
Commonsense tips for solving the problem include reducing the number of attributes in the dimensions and partitioning the data. You can also tune specific processing settings. The Microsoft SQL Server 2005 Analysis Services Performance Guide (which you can download at http://www.microsoft.com/sqlserver/2005/en/us/analysis-services.aspx) covers many server settings you can tune, such as ThreadPool\Process\MaxThreads and OLAP\Process\BufferRecordLimit. You can also examine a simple setting in BIDS and SQL Server Management Studio, which is available via the Process dialog box. A Change Settings button just above the Run button opens the Change Settings dialog box that Figure 6 shows. The default Processing Order is Parallel, and Maximum parallel tasks is set to Let the server decide. Changing this value can result in better (or worse) performance. Although a number of values populate the list, you can also manually enter a value, using a trial-and-error approach.
Data Source Login Hassles
Developers often create data sources that connect to the database with Windows authentication but set the impersonation information to a service account. Such an account might not have the appropriate permissions to access the data, especially when the project is moved from development to test to production. Worse, data connections are sometimes made with hard-coded usernames and passwords in them, which works well until the password changes. Normally this happens after midnight on a weekend, causing a developer to spend hours updating all the information.
The best practice for connecting to data sources is to use Windows authentication, then impersonate using a domain account created specifically for accessing the source data. This account should have read-only access to the data, except when using data-mining models or writeback.
Data-Delivery Snafus
Perhaps the most common problem with the use of SSAS isn’t a specific design issue involving the cubes but rather the inability to provide the proper tool mix needed to deliver data from the cubes to the organization. No single tool is right for everyone; typical solutions include at least a reporting tool (such as SQL Server Reporting Services) and an analytics tool (such as Microsoft Excel or Microsoft ProClarity Desktop Professional). Many solutions also include a scorecard tool (such as Microsoft Office PerformancePoint Server 2007). Organizations pursuing business intelligence projects should determine how to deliver data and to whom it should go before designing the relational data warehouse and cubes. Knowing how you deliver data may drive some cube design decisions involving perspectives, key performance indicators, and more.
Cubing for Keeps
I’ve seen the problems detailed in this article regularly in a variety of companies with varying levels of experience in creating cubes with SSAS. It’s clear that dimensions cause the greatest number of issues—failing to create attribute relationships correctly, neglecting to hide attributes used in hierarchies, and improperly setting attribute keys. Cubes also lead to hassles when you relate measure groups and dimensions, deal with unrelated dimensions, and create aggregations. Finally, many IT organizations don’t optimize settings for processing and query performance and don’t provide end users with adequate tools to let their enterprises take full advantage of cube data. If you follow the recommendations in this article, you have a better chance of creating a usable and high-performance set of cubes.