Step 4: Construct a Conceptual Data Model
After identifying the business processes, you can create a conceptual model of the data. You determine the subjects that will be expressed as fact tables and the dimensions that will relate to the facts. Clearly identify the key performance indicators for each business process, and decide the format to store the facts in. Because the facts will ultimately be aggregated together to form OLAP cubes, the data needs to be in a consistent unit of measure. The process might seem simple, but it isn't. For example, if the organization is international and stores monetary sums, you need to choose a currency. Then you need to determine when you'll convert other currencies to the chosen currency and what rate of exchange you'll use. You might even need to track currency-exchange rates as a separate factor.

Now you need to relate the dimensions to the key performance indicators. Each row in the fact table is generated by the interaction of specific entities. To add a fact, you need to populate all the dimensions and correlate their activities. Many data systems, particularly older legacy data systems, have incomplete data. You need to correct this deficiency before you can use the facts in the warehouse. After making the corrections, you can construct the dimension and fact tables. The fact table's primary key is a composite key made from a foreign key of each of the dimension tables.

Data warehouse structures are difficult to populate and maintain, and they take a long time to construct. Careful planning in the beginning can save you hours or days of restructuring.

Step 5: Locate Data Sources and Plan Data Transformations
Now that you know what you need, you have to get it. You need to identify where the critical information is and how to move it into the data warehouse structure. For example, most of our example company's data comes from three sources. The company has a custom in-house application for tracking training sales. A CRM package tracks the sales-force activities, and a custom time-reporting system keeps track of time.

You need to move the data into a consolidated, consistent data structure. A difficult task is correlating information between the in-house CRM and time-reporting databases. The systems don't share information such as employee numbers, customer numbers, or project numbers. In this phase of the design, you need to plan how to reconcile data in the separate databases so that information can be correlated as it is copied into the data warehouse tables.

You'll also need to scrub the data. In online transaction processing (OLTP) systems, data-entry personnel often leave fields blank. The information missing from these fields, however, is often crucial for providing an accurate data analysis. Make sure the source data is complete before you use it. You can sometimes complete the information programmatically at the source. You can extract ZIP codes from city and state data, or get special pricing considerations from another data source. Sometimes, though, completion requires pulling files and entering missing data by hand. The cost of fixing bad data can make the system cost-prohibitive, so you need to determine the most cost-effective means of correcting the data and then forecast those costs as part of the system cost. Make corrections to the data at the source so that reports generated from the data warehouse agree with any corresponding reports generated at the source.

You'll need to transform the data as you move it from one data structure to another. Some transformations are simple mappings to database columns with different names. Some might involve converting the data storage type. Some transformations are unit-of-measure conversions (pounds to kilograms, centimeters to inches), and some are summarizations of data (e.g., how many total seats sold in a class per company, rather than each student's name). And some transformations require complex programs that apply sophisticated algorithms to determine the values. So you need to select the right tools (e.g., Data Transformation Services—DTS—running ActiveX scripts, or third-party tools) to perform these transformations. Base your decision mainly on cost, including the cost of training or hiring people to use the tools, and the cost of maintaining the tools.

You also need to plan when data movement will occur. While the system is accessing the data sources, the performance of those databases will decline precipitously. Schedule the data extraction to minimize its impact on system users (e.g., over a weekend).

Step 6: Set Tracking Duration
Data warehouse structures consume a large amount of storage space, so you need to determine how to archive the data as time goes on. But because data warehouses track performance over time, the data should be available virtually forever. So, how do you reconcile these goals?

The data warehouse is set to retain data at various levels of detail, or granularity. This granularity must be consistent throughout one data structure, but different data structures with different grains can be related through shared dimensions. As data ages, you can summarize and store it with less detail in another structure. You could store the data at the day grain for the first 2 years, then move it to another structure. The second structure might use a week grain to save space. Data might stay there for another 3 to 5 years, then move to a third structure where the grain is monthly. By planning these stages in advance, you can design analysis tools to work with the changing grains based on the age of the data. Then if older historical data is imported, it can be transformed directly into the proper format.

Step 7: Implement the Plan
After you've developed the plan, it provides a viable basis for estimating work and scheduling the project. The scope of data warehouse projects is large, so phased delivery schedules are important for keeping the project on track. We've found that an effective strategy is to plan the entire warehouse, then implement a part as a data mart to demonstrate what the system is capable of doing. As you complete the parts, they fit together like pieces of a jigsaw puzzle. Each new set of data structures adds to the capabilities of the previous structures, bringing value to the system.

Data warehouse systems provide decision-makers consolidated, consistent historical data about their organization's activities. With careful planning, the system can provide vital information on how factors interrelate to help or harm the organization. A solid plan can contain costs and make this powerful tool a reality.

End of Article

Prev. page     1 [2]     next page -->



You must log on before posting a comment.

If you don't have a username & password, please register now.

Reader Comments

There should have been mention of metadata and metadata repository, and mapping of source to target and its significance in a datawarehouse/datamart in this article. Also multi-dimensional cubes, relational, star schema, snowflakes, and the hybrid DWs should have been discussed.

J Zaman

quite informative but i need the to get the basics on DWH

sam king'oo

 
 

ADS BY GOOGLE