DOWNLOAD THE CODE:
Download the Code 25544.zip

Demystify slowly changing dimensions with the Data Driven Query task in SQL Server 2000

About 3 years ago, I participated in a data warehouse course for trainers. Stewart McCloud, one of Microsoft's Data Transformation Services (DTS) architects, gave a session about the Data Driven Query (DDQ) task in SQL Server 7.0. I was fascinated with DTS in general and the powerful DDQ task specifically. SQL Server 2000 enhanced DTS's capabilities with a wealth of new tasks, but from my experience, the DDQ task remains one of the toughest tasks for students to learn.

One of the main uses of the DDQ task is implementing a slowly changing dimensions solution in a data warehouse environment. The DDQ task can examine data retrieved from a source system and decide, at runtime, which action to take at the destination system based on the input. The phrase slowly changing dimensions refers to various changes that attributes of dimensions undergo infrequently compared to changes that the fact table's measures undergo. Examples of dimension attributes that might change slowly include a company name, an employee's supervisor, or the volume of a product's package. For more information about the theoretical aspects of slowly changing dimensions, see two books that Ralph Kimball coauthored: The Data Warehouse Toolkit (Wiley, 2002) and The Data Warehouse Lifecycle Toolkit (Wiley, 1998).

In this article, I won't cover the theoretical aspects of slowly changing dimensions in depth. Instead, I look at the technical and practical implementations of slowly changing dimensions and discuss the DDQ task and some other new DTS features in SQL Server 2000. I'm assuming that you're familiar with DTS package creation, the Transform Data task, DTS flow elements, basic ActiveX transformations, the ADO Recordset object, Visual Basic (VB) scripts, and writing T-SQL queries and stored procedures.

To handle slowly changing dimensions, you can use any of the three classic techniques, imaginatively named Type 1, Type 2, and Type 3. Handling changes to different attributes or changes to the same attribute in different scenarios might demand different techniques. As an example, let's examine how you can use different techniques to deal with a change in an attribute: a product's package volume.

Suppose a soft-drink company wants to promote a certain product. The company increases the drink's package volume by 10 percent, thus providing more liquid for the same price. Let's look at each of the three slowly changing dimensions techniques and their implications in this situation.

Type 1. Type 1 handling overwrites an old attribute with a new one. This type is usually used for correcting historical errors or when you don't care about losing historical data. If you overwrite the product's old package volume with the new one, you won't be able to correctly calculate volumes of historical product sales. This type of change is technically the simplest to implement.

Type 2. With Type 2 handling, you create a new dimension row with a new dimension key but also keep the old dimension row, with the old dimension key. Note that you'd probably want to keep two keys in your data warehouse: the application key and the dimension key. The application key is the online transactional processing (OLTP) environment's production key, which is unchanged; the dimension key is a surrogate key that correlates between rows in the fact table and rows in the dimension table. The dimension key gets a new value.

You also need to keep track of the effective date of each dimension key. You might store a range of effective dates as two values—from_date and to_date—for simpler data manipulation. Type 2 handling lets you correctly calculate volumes of liquid in historical sales, for example, but might require you to maintain several keys per product. Let's call the application key in our example productid_app and the data warehouse­specific dimension key productid_key.

   Prev. page   [1] 2 3 4 5     next page



You must log on before posting a comment.

If you don't have a username & password, please register now.

Reader Comments

Nice article, but... In real life, the OLTP system is a 'given', which means that your are never allowed to create triggers and extra logtables on the production server. In real life, the OLTP system is often running on a different platform than Win/SQL. In real life, dimension tables tend to be quite large (+1 Mln records for a customer dimension is not exceptional). So in this situation, you cannot determine whether a source record is an insert, delete or update from a log table but first you have to match it against the target table. It would be interesting to see how you think DTS can help here. For instance, recently I had to develop a solution using DTS to keep Navision C/Side transaction tables (GL postings) in sync with a 'replica' on SQL Server. The only (?) way to accomplish this was using an ActiveX Script task.

Jos van Dongen

Excelent article. One of the best that I have ever read in SQL Server Magazine.

Anonymous User

Article Rating 5 out of 5