Executive Summary:
A bus architecture is composed of a set of tightly integrated data marts that get their power from conformed dimensions and fact tables. A conformed dimension is defined and implemented one time, so that it means the same thing everywhere it's used. Fact tables that should be conformed include those that derive revenue, profit, standard prices, and standard costs.
|
If you’ve been reading my articles about data warehousing
the past few months, you know something
about facts and dimensions and how they form a
dimensional model. But you’re probably wondering how you
can create a data warehouse from facts and dimensions.
I believe it was Ralph Kimball (or someone in his
organization) who coined the phrase bus architecture for
the enterprise data warehouse. Microsoft uses the term
Unified Dimensional Model (UDM) for a concept that’s
similar to the bus architecture, but the UDM goes beyond
the star schema.
A bus architecture uses top-down planning and a grid
of business functions and dimensions to deliver a set of
tightly integrated data marts. To get a better idea of how
a bus architecture works, picture an electrical bus—a
structure that everything else connects to and draws power
from—such as a big electrical cable.
The data warehouse bus architecture is composed of a
set of tightly integrated data marts that draw their “power”
from a common set of conformed dimensions and facts.
A dimension table is the “lookup” table of a dimensional
model; it contains textual data that decodes an identifier
in associated fact tables. A conformed dimension is defined
and implemented one time and used throughout the multiple
star schemas that make up the enterprise data mart.
Dimensions define the who, what, where, when, why, and
how of a situation, and are laid out for the benefit of business
users. Figure 1, is an example of a portion of
a bus architecture. It contains two fact tables (i.e., RETAIL
SALES, SHIP FROM MANUFACTURING) that might
be created from the manufacturing value chain, and the
conformed dimensions that modify both fact tables.
To conform a dimension, every stakeholder must agree
on a common definition for the dimension, so that the
dimension means the same thing no matter where it’s used.
For example, instead of having a time dimension whose
granularity is expressed in weeks, quarters, and years for
some dimensional models and in days, months, quarters,
and years for other models, you would have one conformed
dimension that’s used wherever a time dimension
is required in the enterprise data warehouse. Keep in mind
that the conformed dimension’s structure must reflect the
finest grain that might be needed for any fact table.
So how do you associate a dimension to a fact table
more than once, such as in Arrive Time and Depart
Time in Figure 1? The answer is by using views. Make an
Arrive Time view that contains as many or as few of the
attributes in the TIME dimension as necessary to express
arrival times for the SHIP FROM MANUFACTURING
fact table. You can see that I’ve used views of dimensions
(i.e., Depart Time, Arrive Time, To Location, From Location,
Buyer, Seller) quite extensively in Figure 1. When a
single dimension needs to appear several times in a fact
table, we use data warehouse roles.
Once the dimensions have been conformed, you
can conform the fact tables, which you’ll want to do
immediately after conforming the dimensions while your
understanding of the meaning and use of these fact
tables is fresh. Conforming fact tables makes more sense
if you picture a manufacturing value chain. For example,
say you want to write a report that flows down from the
manufacturing data mart to the retail point of sale (POS)
data mart. A fact table stores measures; the best way to
measure product at the manufacturing point is by cases
shipped, and the best way to measure product at the POS
is by units scanned (aka sold). How do you reconcile these
two seemingly incompatible measures?
To do so, conform the fact tables by selecting and
storing the finest grain of a measure. If the POS fact
table measures product by individual unit, then the
manufacturing fact table should also measure product by
individual unit, as well as by cases shipped, which Figure
1 shows. It’s OK to store multiple measures of a product
in the base-level fact table. Note that you can always
preaggregate the fact tables by rolling the measures into
sales by day, sales by week, and cases shipped per year
per warehouse, which creates additional, summarized fact
tables in the process. The granularity of the base-level fact tables should be consistent across the entire value chain,
so that your report can flow freely and easily down the
line, producing relevant, accurate results. You can select
the granularity of the fact tables by matching the finest
granularity of the associated dimensions that are part
of the star schema. If the finest granularity of the Time
dimension is an individual day, then the granularity of the
associated base-level fact table(s) should also be the day.
Should every fact table be conformed? It depends. You
might have a fact table that isn’t part of a value chain;
in that case, there’s nothing to conform the fact table to
except its associated dimensions. Typically, the types of
fact tables that should be conformed are those that can
be used to derive revenue, profit, standard prices, and
standard costs.
When you’re conforming fact tables, a fact has to be the
same everywhere it appears. If you have a fact called net
quarterly revenue, the same equation and algorithm must be
used to produce the numbered results everywhere it’s used.
If you can’t conform net quarterly revenue across the board,
then you need to name the variations accordingly, such as
fiscal year net quarterly revenue, calendar year net quarterly
revenue, and academic year net quarterly revenue.
Why Use a Bus Architecture?
There are many reasons to use a bus architecture. Here a
just a few of those reasons.
Efficiency. A single copy of a dimension (e.g., time,
geography) involves less maintenance—for example, you
won’t have to synchronize changes to multiple time dimension
tables.
Consistency. By definition, a conformed dimension
means the same thing everywhere it’s used.
Ease of use. Once you get used to what the conformed
dimension looks like, how it’s structured, what it represents,
and its content, writing code against it is almost effortless.
Expandability. A conformed dimension decodes every
fact table in the enterprise data warehouse that it’s associated
with. When a new fact table is introduced to the data
warehouse, linking the fact table to the existing conformed
dimensions to form a new cube is straightforward and
expands and extends the reporting capabilities of the
enterprise data warehouse.
Standards enforcement. Once everyone has come to
consensus on the architecture,
meaning, and data content of a
conformed dimension, the rest
of the organization will have
to use the design, and through
frequent use these conformed
dimensions will become the de
facto standard.
Reporting enhancement.
When you write reports against
a data warehouse that’s built
on a bus architecture, you can
readily access manufacturing,
inventory, distribution, and
sales data and know that you’re
getting the same information
across the board.
Tightly Integrated
Data Marts
Conformed dimensions and
facts are the backbone architecture
of the enterprise data
warehouse. When you build
an enterprise data warehouse
of separately implemented but
tightly integrated data marts,
you can refactor each data mart
as necessary. You can also add
a new data mart to the “bus”
when it’s needed, knowing that
it will work with all the preexisting
data marts.