Executive Summary:
Dimensional modeling is made up of both logical and physical modeling. Measures are the core of the dimensional model and are data elements that can be summed, averaged, or mathematically manipulated. Fact tables are at the center of the star schema (which is a physically implemented dimensional model), and data marts are made up of multiple fact tables.
|
In "Discover the Star Schema," July 2007, InstantDoc ID 96112, I reviewed the
basics of dimensional modeling. You can use dimensional modeling to create an
enterprise data warehouse that makes it easier for business intelligence (BI)
end users to access and understand data. Now let's examine the differences between
entity relationship (ER) and dimensional models, the components that make up
the dimensional model, and why those pieces are designed the way they are.
Dimensional vs. ER Modeling
Dimensional modeling is a logical design technique. Unlike ER modeling, which
consists of conceptual, logical, and physical data modeling, dimensional modeling
is made up of only logical and physical modeling.
There are sharp contrasts between ER and dimensional modeling. ER modeling
is a design discipline that seeks to represent business rules as highly detailed
relationships between business elements that are materialized as database tables.
You can extrapolate the business rules from the types and cardinalities of the
relationships in an ER model. The primary goal of ER modeling is to remove all
non-key data redundancy.
Dimensional modeling, however, seeks to represent data in a logical, understandable
manner. In dimensional modeling, you can control data redundancy by conforming
dimension and fact tables. A table that's been conformed can be used in more
than one dimensional data model and is the same no matter how it's used. The
relationships in a dimensional model don't represent business rules; instead,
they're navigational paths used to help write reports or create graphs.
Many data modeling software packages support dimensional modeling. Some even
let you generate SQL Data Definition Language (DDL) scripts so that all you
have to do to create a data warehouse/data mart is run those scripts.
Determining the Facts
In the world of BI, data elements that can be summed, averaged, or otherwise
mathematically manipulated are called measures. A dimensional model is
designed to deliver numbers to BI users, and measures are the core of the model.
In Figure 1, the fact table Reseller_Sales
contains both measures and keys. Measures are data elements such as OrderQuantity,
DiscountAmount, and TaxAmount that are used to determine all types of statistical
information, such as the percentage of sales that were discounted or the total
sales tax collected per region. Like foreign keys in an ER diagram (ERD), keys
in a dimensional model are expressions of a relationship to a dimension table.
In contrast to foreign keys in an ERD, however, the relationship isn't intended
to enforce referential integrity; after all, the data has already been through
the extraction, transformation, and loading (ETL) process, having been scrubbed
and validated before being loaded into the data warehouse. Instead, the relationship's
function is to associate keys in the fact table with the expanded definitions,
which are found in the dimension tables.
In a data warehouse environment, you
rarely retrieve just one record. Typically,
hundreds, thousands, or even millions of
records are retrieved in a single query, and
the most logical thing to do with record
sets that large is to perform a mathematical process on their data. That's why the
section of a fact table that contains the
numeric and additive columns is arguably
the most important section.
You can start your dimensional design by reviewing the ER model of the transactional
data source (the operational database). You can usually identify potential fact
tables by locating the associative tables that represent the many-to-many (M:N)
relationship in an ER model. An ER model will break down into multiple dimensional
diagrams: The number of diagrams is determined by the number of functions in
the organization and the organization's BI reporting needs.
To create a dimensional model from an ER model, first separate the ER model
into its discrete business processes. Each business process can be expressed
as a data mart—a modular, highly focused, richly detailed, incrementally
designed component of the enterprise data warehouse. Although there are many
ways to approach and implement an enterprise data warehouse, the data mart approach
lets you tackle the data warehouse project business-process-by-business-process
and produce deliverables for your user community.
If this is your first excursion into dimensional modeling, start with a single-source
data mart; don't try to tackle multiple-source data marts until you've acquired
the skill set necessary to design a complex data warehouse. Examples of single-source
data marts include retail sales, purchase orders, shipments, and payments. An
example of a multiple-source data mart is customer profitability, which combines
revenue and costs that often come from separate transactional sources (e.g.,
the sales database and the inventory database, respectively).
For each preliminary data mart, identify the M:N relationships from the transactional
model that are comprised of numeric and additive non-key data. These relationships
will most likely be the associative tables in the ER model. Designate these
as fact tables. It's not unusual for a large ER model to produce a dimensional
model that has from 10 to 25—or even more—fact tables.
Prev. page  
[1]
2
next page