Executive Summary:
Dimensional modeling is made up of both logical and physical modeling. Measures are the core of the dimensional model and are data elements that can be summed, averaged, or mathematically manipulated. Fact tables are at the center of the star schema (which is a physically implemented dimensional model), and data marts are made up of multiple fact tables.


In "Discover the Star Schema," July 2007, InstantDoc ID 96112, I reviewed the basics of dimensional modeling. You can use dimensional modeling to create an enterprise data warehouse that makes it easier for business intelligence (BI) end users to access and understand data. Now let's examine the differences between entity relationship (ER) and dimensional models, the components that make up the dimensional model, and why those pieces are designed the way they are.

Dimensional vs. ER Modeling
Dimensional modeling is a logical design technique. Unlike ER modeling, which consists of conceptual, logical, and physical data modeling, dimensional modeling is made up of only logical and physical modeling.

There are sharp contrasts between ER and dimensional modeling. ER modeling is a design discipline that seeks to represent business rules as highly detailed relationships between business elements that are materialized as database tables. You can extrapolate the business rules from the types and cardinalities of the relationships in an ER model. The primary goal of ER modeling is to remove all non-key data redundancy.

Dimensional modeling, however, seeks to represent data in a logical, understandable manner. In dimensional modeling, you can control data redundancy by conforming dimension and fact tables. A table that's been conformed can be used in more than one dimensional data model and is the same no matter how it's used. The relationships in a dimensional model don't represent business rules; instead, they're navigational paths used to help write reports or create graphs.

Many data modeling software packages support dimensional modeling. Some even let you generate SQL Data Definition Language (DDL) scripts so that all you have to do to create a data warehouse/data mart is run those scripts.

Determining the Facts
In the world of BI, data elements that can be summed, averaged, or otherwise mathematically manipulated are called measures. A dimensional model is designed to deliver numbers to BI users, and measures are the core of the model. In Figure 1, the fact table Reseller_Sales contains both measures and keys. Measures are data elements such as OrderQuantity, DiscountAmount, and TaxAmount that are used to determine all types of statistical information, such as the percentage of sales that were discounted or the total sales tax collected per region. Like foreign keys in an ER diagram (ERD), keys in a dimensional model are expressions of a relationship to a dimension table. In contrast to foreign keys in an ERD, however, the relationship isn't intended to enforce referential integrity; after all, the data has already been through the extraction, transformation, and loading (ETL) process, having been scrubbed and validated before being loaded into the data warehouse. Instead, the relationship's function is to associate keys in the fact table with the expanded definitions, which are found in the dimension tables.

In a data warehouse environment, you rarely retrieve just one record. Typically, hundreds, thousands, or even millions of records are retrieved in a single query, and the most logical thing to do with record sets that large is to perform a mathematical process on their data. That's why the section of a fact table that contains the numeric and additive columns is arguably the most important section.

You can start your dimensional design by reviewing the ER model of the transactional data source (the operational database). You can usually identify potential fact tables by locating the associative tables that represent the many-to-many (M:N) relationship in an ER model. An ER model will break down into multiple dimensional diagrams: The number of diagrams is determined by the number of functions in the organization and the organization's BI reporting needs.

To create a dimensional model from an ER model, first separate the ER model into its discrete business processes. Each business process can be expressed as a data mart—a modular, highly focused, richly detailed, incrementally designed component of the enterprise data warehouse. Although there are many ways to approach and implement an enterprise data warehouse, the data mart approach lets you tackle the data warehouse project business-process-by-business-process and produce deliverables for your user community.

If this is your first excursion into dimensional modeling, start with a single-source data mart; don't try to tackle multiple-source data marts until you've acquired the skill set necessary to design a complex data warehouse. Examples of single-source data marts include retail sales, purchase orders, shipments, and payments. An example of a multiple-source data mart is customer profitability, which combines revenue and costs that often come from separate transactional sources (e.g., the sales database and the inventory database, respectively).

For each preliminary data mart, identify the M:N relationships from the transactional model that are comprised of numeric and additive non-key data. These relationships will most likely be the associative tables in the ER model. Designate these as fact tables. It's not unusual for a large ER model to produce a dimensional model that has from 10 to 25—or even more—fact tables.

   Prev. page   [1] 2     next page



You must log on before posting a comment.

If you don't have a username & password, please register now.

Reader Comments

great clarity

jjderoo

Article Rating 4 out of 5