The star schema design that Figure
1 shows has several notable characteristics:
Every table in the star schema has an identity primary key, which prevents
quibbling about natural primary keys versus surrogate primary keys. In a data
warehouse, the primary key assignment goes to the surrogate; if you have a
natural key that you want to retain for querying, you designate it as an alternate
key.
Most of the columns in both the dimension tables and the fact table are
nullable. Only the primary and alternate keys are mandatory, and only the
primary key is unique. You need to assume that data will be loaded into this
data warehouse structure from various sources, even from multiple generations
of operational databases. Therefore, the constraints that you would normally
apply to enforce business rules in the transactional database must be relaxed
in the data warehouse. Most of the columns must be nullable because you might
not have data for them.
Relationships are optional on the parent side (i.e., zero-to-one to zero-to-many).
If you have no control over the source data's referential integrity, you must
allow for orphan records in the fact table and in lower levels of a dimensional
hierarchy.
Most columns in the fact table (i.e., the measures) are numbers. The fact
table is the focus of a BI investigation, and BI analysts are looking for
numbers and facts.
Data redundancy is rampant throughout the star schema design; this redundancy
is necessary for the data warehouse to reach an acceptable level of performance.
The amount of data in a data warehouse is typically enormous compared with
the amount of data in a transactional database. When you write a T-SQL query
against the data warehouse's star schema, the redundancy minimizes the number
of joins required to return the data, yielding much better performance than
if you issued the same query against the source transactional database.
Schema design for a data warehouse need not be much different than schema design
for a conventional transactional database. Because the data warehouse is an
historical archive, you can retain some semblance of normalcy in the data warehouse
schema design. In addition, you can create summary tables or columns, and you'll
certainly want to add timestamps and identity values to individual records in
each of a relational data warehouse's tables. A benefit of retaining a relational
or near-relational data warehouse is that the warehouse is a readily available
reporting database that you can use conventional T-SQL to query. With no programming
learning curve and no special tool requirements, the data warehouse is an immediate
ROI.
LEARNING PATH SQL SERVER
MAGAZINE RESOURCES: "OLAP: Resistance Is Futile!"
InstantDoc ID 5103 "Solutions by Design: Adding a Questionnaire
to a Design," InstantDoc ID 7449 "Step-by-Step Data Warehousing,"
InstantDoc ID 8047
If you don't have a username & password, please
register now.
Reader Comments
a concise and excellent read!
pconnellan@nyc.rr.com- July 03, 2007
Article Rating 5 out of 5
Dear Reader, Thanks for the feedback. Hope you'll find the whole series useful! Please let us know. Your feedback helps us direct our content to your needs. Diana - Technical Editor, SQL Server Magazine
DianaMay- July 05, 2007
Article Rating 5 out of 5
All these advertisements are really really annoying. I feel like navigating through a minefield to finish reading an article. Its not like this is a free website or something in order for me to put up with this annoyance. Could you please address this issue.
Quite a few of your articles are thorough, professional and informative. I really enjoy reading your magazine articles.
Thanks, Bala
bala_tek@yahoo.com- December 17, 2007
Article Rating 4 out of 5
Dear Bala, Thanks for taking the time to give your feedback. I'll be passing this on to our sales team... Diana May Sr Technical Editor SQL Server Magazine
DianaMay- December 21, 2007
Article Rating 4 out of 5
I agree with Bala. The articles are great but the website could use some user-friend techniques. Keep writing Michelle.