[Author's Note: If you've been reading SQL Server Magazine since the beginning,
you know that my column has focused mainly on design fundamentals for transactional databases. This month, I switch gears to cover design fundamentals for data
warehouses and data warehousing. In subsequent columns, I'll cover topics such as
dimensional table physical design, fact table physical design, fact table partitioning
design, summary table design, and slowly changing dimension design. In addition,
I'll discuss indexing techniques that are optimized for use with data warehouses and
partitioned tables, and I'll look at how to gather requirements for data warehouse
design, which can differ from the requirements-gathering techniques used for transactional database design.]
The data warehouse is one of the foundational structures of a business intelligence
(BI) solution. Like transactional databases, data warehouses require a schema
design. The most basic schema design for a data warehouse is a star schema.
If you want to create multidimensional cubes for BI analysis, using a star schema
for the data warehouse is a good solution. In this article, I explain why the
star schema is preferable over other schema designs, and I use an example star
schema to illustrate this design's benefits. (To learn more about BI and data
warehouses, see the sidebar "Data Warehousing: The Foundation
of BI". For more information about data warehouses and star schemas, see
the Learning Path.)
Several reasons exist for using a star schema rather than a conventional normalized
design. First, you must use a star schema if you want to build and use OLAP
cubes. The cube dimensions are the axes of analysis—the "by" items (e.g.,
by time period, by product line, by region). The fact table defines the cube
and its purpose; you analyze the facts by or through the different dimensions.
And, perhaps most importantly, the star schema provides fast response time when
implemented as an OLAP cube.
Another reason for using a star schema for data warehousing is that the star
schema parallels the way that people tend to think about and use data. No one
except data modelers, DBAs, and some database programmers think of data the
way it's structured in a transactional database. The star schema, when implemented
as an OLAP cube, lets both developers and end users more easily understand and
navigate the metadata. In addition, you can modify and build upon a star schema
as your organization's BI needs expand. Unlike with conventional transactional
database schemas, you don't have to worry about storing a non-key attribute
only one time. And last but not least, the star schema broadens your choice
of front-end BI tools because some tools work only against OLAP cubes.
Figure 1 shows an example of a star schema
that's modeled after the AdventureWorksDW sample data warehouse that ships with
SQL Server 2005. The schema features only one fact table, Reseller_Sales. A
fact table is a collection of keys and measures. The keys relate each row in
the fact table to an associated row in a dimension table. As in a transactional
database schema design, the primary key of the Product dimension becomes a foreign
key in the Reseller_ Sales fact table. The measures (effectively, anything that's
not a key column) are the operational data that the statisticians have been
waiting for and are all nicely packaged and ready for analysis.
The example star schema in Figure 1 is meant
to support decision-making and BI software tools. If you implemented the schema,
it would be populated from comparable tables and columns in the transactional
version of the AdventureWorks database.
The schema dimensions (i.e., Time,
Product, Reseller, and Sales_Territory)
can be mapped to tables or views in
the AdventureWorks database, which
facilitates transferring the operational
data into the data warehouse and ultimately into the Reseller_Sales cube.
Each dimension is an axis of analysis in
the cube, so you could analyze the data
in the cube by month, by region, or by
business type.
Notice the simple dimensional hierarchy in this schema—from Product_Category
to Product_Subcategory to Product. This structure reduces redundancy and makes
the star schema a snowflake schema—albeit a rather lopsided snowflake
in this case. You can add as many dimensions as necessary to the star/snowflake
schema. You can also implement more complicated structures, such as a geography
dimension that is parent to both Reseller and Sales_Territory—but I'll
save that discussion for when I cover dimensional table design.
Prev. page  
[1]
2
next page