In "SSIS Novices' Guide to Data Warehouses: Moving Data Into the Data Warehouse", I showed you the basic structure of a data
warehouse whose databases contain sets of tables that store raw, staged, and dimensionally modeled data. These tables are referred to as the Raw
tables, Stage tables, and Dimensional tables, respectively. I also showed you how to create a SQL Server Integration Services (SSIS) package called the
Raw package. This package is used to move a near exact copy of your source data from an external location (probably the transactional database and
server) to the Raw tables.
The next step in building a data warehouse is moving the data from the Raw tables to the Stage tables, which is referred to as staging the data. During
the staging step, many changes are made to both the data's structure and content. The changes are driven by business rules and dimensional
architectural needs.
When explaining the kind of changes that take place in the staging step, I like to refer to something called the Five F Words. Although some aspects of
data warehousing can be frustrating, none of these F words will get you in trouble at work. In fact, only one has four letters, and that word is Flag.
As Figure 1 shows, the Five F Words are Flatten, Fix, Flag, Filter, and Figure. In this article, I'll cover Flatten. I like to start with this F Word
because flattening data is often the hardest concept for new warehouse designers to grasp, especially if they come from a transactional database design
background.

Figure 1: The Five F Words
Flattening 101
Designing a transactional database is all about getting it into third normal form (3NF). Normalization becomes second nature to database developers,
which is why data warehouse design often feels very foreign to database developers who are building their first data warehouse. Many tables within
warehouses are denormalized or flattened when compared to their transactional system counterparts. Transactional systems need to load new data quickly
and can return report data slowly, as reporting is a secondary, less crucial activity. Warehouses can load data slowly (often in the wee hours of the
morning), but should return report and ad-hoc query data quickly, as reporting is the primary function of a data warehouse. Denormalization or
flattening of data enables fast reporting.
To denormalize a database, you combine one or more tables into a single table. In a standard, no-frills dimensional database design, you flatten the
various tables of the transactional system so that relationships between the tables form a star pattern. This is referred to as a star schema. (Other
schemas exist, but I recommend that new designers always start with the star schema.)
As a general rule, transactions and events from the transactional database become facts, whereas lookup tables and profile data become dimensions. The fact table becomes the center of the dimensional model; each dimension hangs off the fact table, just one key relationship
away from the center of the model.
Knowing which tables need to be flattened is the essence of dimensional database design. New data warehouse developers shouldn't expect to get it right
the first time, every time. But take heart in the idea that it will get easier the more you do it.
Flattening StateUBags' Transactional System
In "SSIS Novices' Guide to Data Warehouses: Moving Data Into the Data Warehouse," I presented a sample scenario in which the StateUBags website sells
backpacks and messenger bags to college students. As Figure 2 shows, StateUBags' transactional database system uses an Order table to track sales. The
Order table rows have child records in the Order Detail table.

Figure 2: Transactional database model
To move from a fully normalized model to a denormalized dimensional model, you typically need to combine, add, and remove tables. For this example, you
need to combine the Brand, Line, Product, and Manufacturer tables into a single denormalized table, as Figure 3 shows. You then link this new dimension
to the Fact Orders table through the Product ID. Similarly, you need to combine the Color and Color Group tables into a single denormalized table,
which links to the Fact Orders table through the Color ID.

Figure 3: Dimensional database model
To demonstrate how this will create a denormalized structure, suppose that product "A" is always the color "Daffodil" and the Color Group for
"Daffodil" products is always "Happy Colors." In a normalized structure, product "A" would be defined in the product table with a list of attributes.
One of these attributes would be a link to the "Daffodil" value in the Color table. A link in the Color table would then note that "Daffodil" is in the
Color Group named "Happy Colors."
If you have 1,000 transactions in which product "A" was sold, all 1,000 rows would point back to a single row in the Color table, which would in turn
point to a single row in the Color Group table. The word "Daffodil" would be stored in the database just once in that single row in the Color table and
the label "Happy Colors" would also appear just once in a single row in the Color Group table. When reports were run from the normalized database for
product "A", the word "Daffodil" and the label "Happy Colors" would be assigned back to product "A" through a lookup process.