Understandability and Performance
The dimensional model, which becomes a star schema when physically implemented, is a lean, mean performance machine. When faced with a star schema, the database optimizer can depend on heuristics (i.e., rules) rather than on more time-consuming cost-based algorithms to resolve a query. First, the optimizer can constrain the dimension tables referenced in the query. Then, using the filtered key values from the dimension tables, the optimizer can resolve the many-way join to the fact table in a single sort-merge pass.

Note that as of right now, I haven't been able to confirm that the SQL Server 2005 Analysis Services (SSAS) optimizer uses heuristics rather than a cost-based analysis. However, I do know that SSAS uses thin-client architecture to minimize the load on the client computer, ensuring that SSAS will have scalable performance. The SSAS calculation engine is entirely server-based, so all queries are resolved on the server, optimizing the use of the corporate network. Each query, regardless of its complexity, requires just one round trip between the client and the server. For more information about dimensional modeling and SSAS, see "Dimensional Modeling Basics," April 2006, InstantDoc ID 49365.

The dimensional model is extensible and expandable; it can accommodate changes in user behavior, new data elements, and new ways of analyzing the data. You can easily change a fact table in place by executing the ALTER TABLE command—the table doesn't have to be dropped and recreated and the data doesn't have to be reloaded. You can add new (and unanticipated) columns to the fact table as long as they're consistent with the fact table's existing grain (as I explain later).

Going with the Grain
The grain of a fact table is the level of detail that the table captures. When designing a fact table, choose the grain very carefully—don't aggregate (i.e., summarize) prematurely. For example, a fact table whose records contain weekly sales summaries rather than individual sale amounts is a coarse-grained, aggregated fact table that doesn't support reports on individual sales.

If you aggregate prematurely, you won't be able to gracefully accommodate new sources of data. Design your table to have the finest possible granularity of data: the finer the granularity, the more robust the design. As you can see, the granularity of Reseller_Sales (the fact table in Figure 1) is at the order line item (SalesOrderLineNbr). That's the lowest and best grain for the data in a sales fact table. From this point, you can easily aggregate the fine-grained data into coarser measures, such as product or weekly sales, as required by the demands of the report-writing end users. Once you've captured the data at its finest grain, you can easily add new attributes to both the fact and the dimension tables.

Finely grained data consists of unsummarized detail records from an operational data source. Individual line items in a sales transaction, individual deposits and withdrawals in a banking transaction, individual line items on a shipping invoice, and individual attendance at an event can all be considered fine-grained data.

For performance reasons, you might find it useful to aggregate fact table records, and that's fine. Just remember that summarized or aggregated versions of the individual fact records are separate records that need to be stored in different fact tables that have a coarser grain. It's important not to mix detail and summary records in the same table: Store them separately. You'll be rewarded when you start using BI query and reporting tools that combine the facts along various dimensions.

Data Marts and Fact Tables
A single fact table does not a data mart make—at least, not usually. Most data marts consist of multiple coordinated fact tables that have similar structures because they're all derived from one business supply chain or value chain. An example of a manufacturing supply chain would be ingredient purchasing, ingredient inventory, bill of materials, manufacturing process control and costs, packaging, and finished goods inventory. Many companies have software systems that manage the flow of control through the supply chain. Usually, there's a data source for each step of the supply chain. Each data source can translate into a set of data marts and has at least one fact table. After the manufacturing step, the flow of control is called a value chain. An example of a retail value chain is ship from manufacturing, reseller inventory, reseller shipments, retail inventory, and retail sales.

The purpose of the data warehouse is to provide BI end users with a single source of information regarding supply and value chains so they can follow the data as it flows through the business cycle from beginning to end.

When doing dimensional designs, it helps to understand what kinds of information you can request from the data. For example, from fine-grained fact tables such as Reseller_Sales, you can analyze behavior and frequencies and perform behavior counts, such as the number of times a customer places an order on the date of a storewide inventory sale. You can do time-of-day analysis (e.g., determine whether online sales increase during the lunch hour) and queue analysis (e.g., determine highway capacity and traffic flow, packet traffic on IP networks, and customer movement within an online store). You might be able to do some sequential behavior analysis (e.g., does action A always result in action B?), which could in turn warn you of possible fraud or customer cancellation intent (e.g., this reseller is planning to drop out of your company's partner program). Finally, every retail data mart should lend itself to basket analysis (e.g., do people really buy more beer if it's placed near the baby diapers?) and missing-basket analysis (e.g., what didn't work the way we thought it would and why?). These are all valuable measures that can be derived only with the assistance of a fine-grained fact table containing records at the individual transaction level.

Building a User-Friendly Data Structure
A dimensional model lets you build a user-friendly data structure that makes data access intuitive. The fact table is the core of the dimensional model and contains keys from the dimension tables and the raw numbers that BI users will turn into information. Constructing the fact table so that it contains the finest grain of transactional data available will let you expand and extend your data warehouse as necessary. The dimensional model conforms easily as you coalesce multiple data marts into an enterprise data warehouse and supports a very wide variety of analytical questions that will help you determine your next move in the business world.

End of Article

Prev. page     1 [2]     next page -->



You must log on before posting a comment.

If you don't have a username & password, please register now.

Reader Comments

great clarity

jjderoo

Article Rating 4 out of 5

Very elucidative article, specially for a beginner like me. I will surely read the rest of the articles.

Keep Writing Michelle. Marcos Galvani

MarcosGalvani

Article Rating 5 out of 5