SQL Server 2005, and especially Analysis Services 2005, has embraced industry-standard dimensional modeling like a long-lost friend. When you build Analysis Services databases in SQL Server 2000, you have to compromise in creating the dimensional model. Specifically, you have to create multiple OLAP virtual dimensions and use member properties to capture attributes that don't fit into hierarchical dimensions. SQL Server 2005 Analysis Services uses attribute-based dimensions, so every attribute in a dimension is automatically treated as a standalone hierarchy. Now, you can use the metadata layer that defines Analysis Services cubes (the Unified Dimensional Model—UDM) to define one customer dimension instead of a half-dozen artificially separate dimensions. One benefit of this deeper dimensional support is that data warehouse and BI system developers no longer need to convert standard dimensional-modeling techniques to the limited world view of previous SQL Server versions. Instead, you can build dimensions that realistically represent how your business operates and can change as your business changes. In this article, we define dimensional models, describe the basic elements and techniques that support them, and offer a dimensionally based data architecture for your data warehouse and BI system.
Facts and Dimensions
What do we mean by dimensional modeling? Basically, a dimensional model contains two entity types: facts and dimensions. We keep the measurements of a given business process in entities called fact tables in the relational world, or measure groups in Analysis Services. In its classic form, a relational fact table contains foreign keys and numeric fact fields that measure the business process. Essentially, a fact table is a normalized structure for efficient storage and fast access of large, relational data sets.
For example, many organizations have a business process called Orders that tracks customer requests for goods or services. Figure 1 shows a simplified OrderLineItems fact table model based on the Adventure-Works Cycles sample database in SQL Server 2005. The three non-key numeric fields in the fact table are the quantities, dollar values, and shipping costs of goods ordered: the measurements of the Orders business process. The foreign keys in the OrderLineItems fact table link to the second type of dimensional entity, dimension tables, which we'll discuss in a bit.
The level of detail in the fact table is called its grain. The granularity of the OrderLineItems fact table in Figure 1 is at the order line-item level.The finer the grain, or the lower the level of detail, the more flexibility your model will have. You can always roll the data up to a higher level if needed, but you can never drill down if you don't have the detail in your initial database. Our team designs fact tables at the lowest level of detail available, the atomic level. Depending on your organization's data volumes and your users' performance needs, you might not be able to provide atomic-level granularity, but make sure business users understand what they're losing in terms of flexibility.
You'll most likely end up with fact tables at different levels of granularity in your BI environment. For example, in an Orders Forecast table, you wouldn't want the same level of granularity as the OrderLineItems fact table. Forcasting orders by day, customer, and product is just too much work. Instead, the Orders Forecast table would probably be at the month and product level, with a customer level possibly thrown in for large customer accounts or businesses that have relatively few customers. The OrderLineItems table in Figure 1 has data that could be at two different levels. The OrderQuantity and OrderDollar values are at the line-item level, but the Shipping-Amount might be generated at the Orders table level. In Figure 1, we've allocated shipping costs to the line-item level to avoid the need for two fact tables. Such an allocation is generally a good solution as long as the business users support the allocation rules. In some cases, it doesn't make business sense to allocate items this way, and instead we use fact tables at two levels of granularity to describe a given business process.
One final note about fact tables: Notice that the OrderNumber and LineItemNumber in Figure 1 are not foreign keys for separate dimensions. Their purpose is to tie together the rows of each order. Because all the descriptive attributes of an order reside in their respective dimensions, Order-Number and LineItemNumber become degenerate dimensions (or fact dimensions in Analysis Services).
A quick glance at Figure 1's example data rows reveals that the fact table is meaningless by itself—it has no business context. Any given business process has several relatively independent entities or objects that participate in the process. In dimensional modeling, these objects are called dimensions and the properties of a dimension are called attributes. Attributes describe the members of the dimension and provide a rich business context for analytic purposes. For example, the AdventureWorks Product dimension might have several standalone attributes such as color, size, and weight. Most dimensions also have several attributes that relate to other attributes in a hierarchical fashion.The product dimension might have a hierarchy in which products are members of a group and groups are members of a category. Figure 2 shows a simplified Product dimension and some example rows for Adventure WorksCycles.
In the transaction system, attributes and hierarchies are often fully normalized into separate tables that use foreign keys to link to products. We build dimension tables on the relational platform by denormalizing the descriptive attributes and hierarchies into one table for each dimension. These denormalized dimensions still present the same information and relationships as the normalized model; nothing is lost except complexity.
Even with a simple product dimension like the one in Figure 2, it's easy to imagine how the attributes will be used as row and column headers in a query or report. For example, a user request to see order dollars by color and size would reveal order patterns between size and color. All of the natural ways end users describe their business should find their way into dimension attributes.
Figure 2's Product dimension seems to have two keys: ProductKey and ProductBusinessKey. ProductKey is the dimension table's primary key and is a surrogate key, usually an integer, which is added as part of the process of creating the dimension table. ProductBusinessKey is the key from the transaction system. Although including both keys might seem redundant, there are several compelling reasons to assign and manage your own surrogate keys:
- They help isolate the data warehouse and BI system from operational changes such as a merger, acquisition, or system migration.
- You can add dimension rows for values such as Not Applicable or Date TBD that aren't in the transaction system.
- Surrogate keys let you integrate data from multiple sources that might have different business keys for the same entity, such as a customer.
- In some cases, the simple integer surrogate key provides better join performance than a more complex alphanumeric key.
If those reasons aren't enough, you must use a surrogate key if you're planning to track changes in attribute values over time.
Prev. page  
[1]
2
3
next page