SideBar    Modeling Myths

Managing Changes in Dimension Attributes
Even though we think of dimension attributes as fixed properties (e.g.,a product's color), some attribute values can change over time. We call this Slowly Changing Dimensions. Dimensional modeling offers two basic techniques for tracking attribute changes in dimensions. The first, a Type 1 change, is to simply overwrite the old value with the new one. The second way to track attribute changes, a Type 2 change, is a bit more complex. When a Type 2 attribute changes, you create a new dimension row with the new value and a new surrogate key, and mark the old row as no longer in effect. Going forward, the fact table uses the new surrogate key to link new fact measurements to the dimension row that contains the new attribute value. Fact rows created before the change will continue to join to the old dimension row with the old key and return the old attribute value. With a Type 2 change, all measurements before the change will be associated with the old value, and all measurements after the change will be associated with the new value.

We often see the technical people on BI teams dismiss Type 2 dimension-change management as too complicated. Fortunately,SQL Server 2005 Integration Services-(SSIS) makes managing Type 2 dimensions relatively easy. It's essential that business users—not developers—decide which attributes must have historical tracking. For some analyses, it's vital for the business user to know an attribute's value at the time an event happened rather than the value that the attribute has today.

Bringing Together Facts and Dimensions
A business process dimensional model brings together the facts and dimensions for a given business process. For example, Figure 1's OrderLineItems fact table involves three dimensions: a dimension for the customer who placed the order, the order date, and the product the customer ordered. Each of these dimensions is independent of the business process. For example, a product can exist in inventory even if it never sells. Figure 3 shows the business process dimensional model for the OrderLineItems business process from Figure 1. (You might also see the term star schema used to describe the business process dimensional model; the terms are essentially interchangeable.)

The OrderDate and RequestedDate dimensions in Figure 3 are actually the same underlying dimension. We call this reuse of the same dimension in one model dimension role-playing. The underlying Date dimension exists in almost all business process dimensional models and can be reused through views, aliases, or physical copies of the base table. The Date dimension supports analyses that trend across dates or compare time periods (i.e., nearly all business analyses). Analysis Services 2005 calls this a Time dimension and provides a wizard and MDX functions to help you create date-based calculations.

Although a business-process dimensional model such as the orders model in Figure 3 is powerful from an analytic perspective, this power is dramatically multiplied when you extend it across the enterprise. Many dimensions are involved in multiple business processes. For example, the Product dimension at a company such as AdventureWorks Cycles is tied not only to orders, but also to manufacturing, inventory, shipping, and returns. These separate business-process dimensional models must use the same product dimension with the same keys to reliably support true, cross-business-process analysis. If your company's inventory managers want to calculate inventory turns, they sum the total sales quantity by product from orders, sum the inventory quantities at the end of the period by product from inventory, and divide the two to get inventory turns by product.This type of analysis works only if the two business processes use the exact same product dimension and the same keys; that is, they use a conformed dimension. Conformed dimensions are the cornerstone of the enterprise-enabled BI system.

Getting all the people in an enterprise to agree on conformed dimensions is an organizational challenge for the data modeler and data steward.The organization must agree on how to define its product,customer,employee, or facility dimension tables. The definition includes the list of attributes, attribute names, hierarchies, and the business rules needed to define and derive each attribute in the table. Getting this enterprise-wide agreement can be hard, political work, and the effort grows with the number of employees and divisions in your organization, but it's not optional. Creating conformed dimensions is the only way to ensure that you're comparing apples to apples (assuming you are selling apples).

The idea of reusing dimensions across multiple business processes leads us to the concept of the data warehouse bus matrix, which is a useful way to map out conformed dimensions and the business processes they support. The classic bus matrix has dimensions across the columns and business processes down the rows. Table 1 shows an abbreviated version of the AdventureWorks Cycles bus matrix, which gives us the sense that the Orders business process is more complicated than we've been led to believe. The bus matrix includes a Promotion dimension and an Employee dimension in addition to the Date, Product, and Customer dimensions we've already seen.

Ultimately, the bus matrix is the enterprise-BI data roadmap. You implement the BI system one business process at a time. As you implement a business process, build conformed dimensions that will support other business processes as they come online. In Analysis Services 2005, you see the same concept as the data warehouse bus matrix in the Dimension Usage tab of the Cube Designer.

A Dimensional Approach for SQL Server 2005
Given that SQL Server is so dimensionally oriented, what should your data warehouse and BI data architecture look like? We recommend creating a three-tiered data solution like the one that Figure 4 shows. This solution includes a staging or extraction, transformation, and loading (ETL) database; a relational dimensional database; and an OLAP cube in Analysis Services.

The ETL database is the construction site for the BI system. Business users generally aren't allowed access to this first tier. Within the ETL database, you manage slowly changing dimensions, assign surrogate keys, integrate data from multiple sources, and deal with business rules, data quality checks, and complex transformations. The underlying data model of the ETL database depends on many factors, including data volumes, referential integrity issues, data quality, the source system data model, and source system extract resources. For example, the ETL database might contain normalized versions of the dimension attributes and hierarchies to support the construction of the relational dimensional database and the Analysis Services database. Specifically, when Analysis Services builds a dimension, it issues a SELECT DISTINCT query against every attribute. Running these queries can be time-consuming for a large dimension table with dozens of attributes. This re-normalization is called snowflaking, and although it can be useful for maintenance and data load purposes, we encourage you to always present properly formed, denormalized dimensions to your users.

The second tier, the Relational Database in Figure 4, is a user-accessible, relational, atomic-level database built on dimensional models. This tier serves as the data foundation and is the cleansed, aligned, validated, atomic-level detail system of records for the data warehouse and BI system. The relational platform lets you store and manage the full data set. It also provides a level of ad hoc flexibility for certain kinds of queries (e.g., joins to new tables not in the original model) that are difficult to do in Analysis Services.

For the third tier, you build an OLAP cube in Analysis Services that replicates the relational dimensional database and functions as the primary user-query engine. Our practice is to build this tier as a MOLAP cube.The Analysis Services layer provides consistent performance by pre-aggregating, indexing, and compressing the data and delivers enhanced analytic functionality by using the MDX query language. Keeping the detailed data foundation in relational form lets you manage, redesign, and rebuild the Analysis Services layer when the need arises.

Prev. page     1 [2] 3     next page



You must log on before posting a comment.

If you don't have a username & password, please register now.

 
 

ADS BY GOOGLE