Executive Summary:
When many same-type events are in a single fact table, you have an accumulating fact table. Dimensional role-playing is when a single dimensions needs to be part of a fact table multiple times. The views (dimensional roles) serve as virtual dimensions.
|
Role-playing isn’t limited to the Broadway stage;
data warehouse dimensions can also assume
roles. Dimensional roles can be used not only
for precision when you’re defining the purpose or function
of a set of rows from a dimension, but also to easily
maintain the data because there’s only one table to manage
instead of many tables to synchronize.
Dimensional role-playing occurs when a single dimension
needs to be part of the same fact table many times.
For example, dates and times appear in most types of
analysis because business activities happen in a timeframe
and objects exist in time. Time is almost always used when
calculating and evaluating measures in a fact table. Each
event constitutes a link to the time dimension. For example,
a customer order was taken on a specific date; the order was
picked and packed at the warehouse on one or more dates;
the order was shipped to the customer on a date; the order
has a delivery date; the order’s payment was received on a
date; and sometimes, the order was returned on a date and
a refund was issued on a date. All of these dates need to be
recorded, stored, and analyzed because they all play a role
in the retail sale life cycle.
Defining Roles
When many same-type events, such as dates, are present in
a single fact table, you have an accumulating snapshot fact
table. (Of course, there are other types of fact tables, which
I’ll discuss in upcoming articles.) Figure 1 shows
the Shipping star schema. The fact table (i.e., factShipping)
is an accumulating snapshot; it contains dates for
multiple events, such as scheduled and actual departure
and arrival dates and times, where each shipment starts
and ends, and the buyer and the seller. We could record
a lot more discreet date and time events in this fact table,
but no others are necessary for this example.
This star schema isn’t a cube; it’s a set of relational tables
that can be managed and manipulated using SQL Server
Enterprise Manager or SQL Server Management Studio
(SSMS). Clearly, it isn’t the OLTP tables that are used to initially
capture the retail events because OLTP tables are more
highly normalized than the tables in Figure 1. Also, the
join or associative table, which is the OLTP counterpart to
the OLAP fact table, would typically have fewer attributes
than this fact table. These tables are containers that hold
the source data for the OLAP Services/Analysis Services
multidimensional databases (cubes). Another name for a
multidimensional database is the staging database or data
store. The staging database receives verified and validated
data via the extraction, transformation, and loading (ETL)
processes from various external data sources, including
OLTP databases. (This staging database is also an excellent
reporting database because the veracity of its data content
and the logical orderliness of its architecture make sense
to business users. Business users armed with Microsoft
Excel can build ad hoc reports without having to rely on
overworked business intelligence (BI) programmers and
analysts.)
Take a look at the views in Figure 1, which you
can identify because view names start with “v” (e.g.,
vSeller). (Note that in Figure 1, dimension table names
start with “dim” and fact table names start with “fact.”)
I’ve built views on each of the three dimensions, Time,
Person, and Location. In this star schema, I’m linking the
views rather than using the
dimensions to link to the
fact table. Now it’s easy to
visualize that each shipment
has a buyer and a seller, a
starting and an ending location,
and multiple date and
time events associated with
each record in the fact table.
The views (aka roles) serve
as virtual dimensions.
When creating a dimensional
role, you can filter
data. For example, if the
company won’t ship orders on weekends or holidays, it’s easy to constrain the shipping
date views with the following selection criteria:
select …
from dimTIME
where DayNumberOfWeek between 2 and 6
and Holiday = 0
Or you can separate buyers from all other people when creating
the vBuyer view by running the following command:
create view vBuyer
AS
select …
from dimPerson
where PersonType = ‘buyer’
Supertypes and Subtypes
If something seems familiar about this discussion of
dimensional roles, the reason might be that I’ve discussed
the concepts of supertypes and subtypes in previous columns.
Dimensional role-playing combines data subtyping
and filtering into one operation. For example, in Figure
1, dimPerson is a supertype and vBuyer and vSeller are
virtual subtype entities. Using the WHERE criterion, the
WHERE clause of the query that creates the subsets of
data for each of the views acts as a data filter and establishes
the subtype, which then ensures that only buyers are
listed in the vBuyer dimensional view and only sellers are
listed in the vSeller dimensional view.
Each of the roles that these dimensions play in the
accumulating snapshot fact table is a little different,
and it’s important to relabel the view dimension and its
attributes to better express the role meaning. I’ve done
that in Figure 1: For example, the time dimension has
morphed into vScheduledShipDate, and TimeKEY has
become SchedShipDateKEY. There’s little chance that a
business user would use vScheduledShipDate under the
assumption that it represented arrival times. Remember,
one characteristic of a data warehouse is ease of understanding,
and simple naming conventions make the roles
easy to understand.
The second characteristic of a data warehouse is
performance. As anyone who works in data warehousing
will tell you, proper indexing is crucial to good performance,
so you should consider indexing views. Because
of the simplicity of the dimensional table structure—a
surrogate key, a flattened set of attributes—the dimensions
lend themselves nicely to serving as base tables for
indexed views. Also, the indexed view is dynamically
updated as the base table changes, so you only have to
manage the dimension table when you’re executing ETL
operations.
Dimensional
Role-Playing
If you’re still having trouble understanding
how dimensional roleplaying
works, visualize the last
time you flew in an airplane. There
was a scheduled departure date/
time, an actual departure date/time,
a scheduled arrival date/time, and
an actual arrival date/time. Then
there was the time on the ground
at the point of departure (calculated
as the time interval between
when the jet pulled away from
the gate until the wheels were off
the ground), the time spent in the
air traveling from the departure
airport to the arrival airport, and
the time on the ground at the
point of arrival (calculated as the
time interval between touch down
and when the jet docked at the
gate). These times are critical for
conventional airline flight analysis,
and each would have an entry in
a fact table that would be used
to calculate airline flight on-time
percentages.
End of Article