SideBar    Data Warehousing: The Foundation of BI

[Author's Note: If you've been reading SQL Server Magazine since the beginning, you know that my column has focused mainly on design fundamentals for transactional databases. This month, I switch gears to cover design fundamentals for data warehouses and data warehousing. In subsequent columns, I'll cover topics such as dimensional table physical design, fact table physical design, fact table partitioning design, summary table design, and slowly changing dimension design. In addition, I'll discuss indexing techniques that are optimized for use with data warehouses and partitioned tables, and I'll look at how to gather requirements for data warehouse design, which can differ from the requirements-gathering techniques used for transactional database design.]

The data warehouse is one of the foundational structures of a business intelligence (BI) solution. Like transactional databases, data warehouses require a schema design. The most basic schema design for a data warehouse is a star schema. If you want to create multidimensional cubes for BI analysis, using a star schema for the data warehouse is a good solution. In this article, I explain why the star schema is preferable over other schema designs, and I use an example star schema to illustrate this design's benefits. (To learn more about BI and data warehouses, see the sidebar "Data Warehousing: The Foundation of BI". For more information about data warehouses and star schemas, see the Learning Path.)

Several reasons exist for using a star schema rather than a conventional normalized design. First, you must use a star schema if you want to build and use OLAP cubes. The cube dimensions are the axes of analysis—the "by" items (e.g., by time period, by product line, by region). The fact table defines the cube and its purpose; you analyze the facts by or through the different dimensions. And, perhaps most importantly, the star schema provides fast response time when implemented as an OLAP cube.

Another reason for using a star schema for data warehousing is that the star schema parallels the way that people tend to think about and use data. No one except data modelers, DBAs, and some database programmers think of data the way it's structured in a transactional database. The star schema, when implemented as an OLAP cube, lets both developers and end users more easily understand and navigate the metadata. In addition, you can modify and build upon a star schema as your organization's BI needs expand. Unlike with conventional transactional database schemas, you don't have to worry about storing a non-key attribute only one time. And last but not least, the star schema broadens your choice of front-end BI tools because some tools work only against OLAP cubes.

Figure 1 shows an example of a star schema that's modeled after the AdventureWorksDW sample data warehouse that ships with SQL Server 2005. The schema features only one fact table, Reseller_Sales. A fact table is a collection of keys and measures. The keys relate each row in the fact table to an associated row in a dimension table. As in a transactional database schema design, the primary key of the Product dimension becomes a foreign key in the Reseller_ Sales fact table. The measures (effectively, anything that's not a key column) are the operational data that the statisticians have been waiting for and are all nicely packaged and ready for analysis.

The example star schema in Figure 1 is meant to support decision-making and BI software tools. If you implemented the schema, it would be populated from comparable tables and columns in the transactional version of the AdventureWorks database.

The schema dimensions (i.e., Time, Product, Reseller, and Sales_Territory) can be mapped to tables or views in the AdventureWorks database, which facilitates transferring the operational data into the data warehouse and ultimately into the Reseller_Sales cube. Each dimension is an axis of analysis in the cube, so you could analyze the data in the cube by month, by region, or by business type.

Notice the simple dimensional hierarchy in this schema—from Product_Category to Product_Subcategory to Product. This structure reduces redundancy and makes the star schema a snowflake schema—albeit a rather lopsided snowflake in this case. You can add as many dimensions as necessary to the star/snowflake schema. You can also implement more complicated structures, such as a geography dimension that is parent to both Reseller and Sales_Territory—but I'll save that discussion for when I cover dimensional table design.

   Prev. page   [1] 2     next page



You must log on before posting a comment.

If you don't have a username & password, please register now.

Reader Comments

a concise and excellent read!

pconnellan@nyc.rr.com

Article Rating 5 out of 5

Dear Reader, Thanks for the feedback. Hope you'll find the whole series useful! Please let us know. Your feedback helps us direct our content to your needs. Diana - Technical Editor, SQL Server Magazine

DianaMay

Article Rating 5 out of 5

All these advertisements are really really annoying. I feel like navigating through a minefield to finish reading an article. Its not like this is a free website or something in order for me to put up with this annoyance. Could you please address this issue.

Quite a few of your articles are thorough, professional and informative. I really enjoy reading your magazine articles.

Thanks, Bala

bala_tek@yahoo.com

Article Rating 4 out of 5

Dear Bala, Thanks for taking the time to give your feedback. I'll be passing this on to our sales team... Diana May Sr Technical Editor SQL Server Magazine

DianaMay

Article Rating 4 out of 5