Demystify slowly changing dimensions with the Data Driven Query task in SQL Server 2000
About 3 years ago, I participated in a data warehouse course for trainers. Stewart McCloud, one of Microsoft's Data Transformation Services (DTS) architects, gave a session about the Data Driven Query (DDQ) task in SQL Server 7.0. I was fascinated with DTS in general and the powerful DDQ task specifically. SQL Server 2000 enhanced DTS's capabilities with a wealth of new tasks, but from my experience, the DDQ task remains one of the toughest tasks for students to learn.
One of the main uses of the DDQ task is implementing a slowly changing dimensions solution in a data warehouse environment. The DDQ task can examine data retrieved from a source system and decide, at runtime, which action to take at the destination system based on the input. The phrase slowly changing dimensions refers to various changes that attributes of dimensions undergo infrequently compared to changes that the fact table's measures undergo. Examples of dimension attributes that might change slowly include a company name, an employee's supervisor, or the volume of a product's package. For more information about the theoretical aspects of slowly changing dimensions, see two books that Ralph Kimball coauthored: The Data Warehouse Toolkit (Wiley, 2002) and The Data Warehouse Lifecycle Toolkit (Wiley, 1998).
In this article, I won't cover the theoretical aspects of slowly changing dimensions in depth. Instead, I look at the technical and practical implementations of slowly changing dimensions and discuss the DDQ task and some other new DTS features in SQL Server 2000. I'm assuming that you're familiar with DTS package creation, the Transform Data task, DTS flow elements, basic ActiveX transformations, the ADO Recordset object, Visual Basic (VB) scripts, and writing T-SQL queries and stored procedures.
To handle slowly changing dimensions, you can use any of the three classic techniques, imaginatively named Type 1, Type 2, and Type 3. Handling changes to different attributes or changes to the same attribute in different scenarios might demand different techniques. As an example, let's examine how you can use different techniques to deal with a change in an attribute: a product's package volume.
Suppose a soft-drink company wants to promote a certain product. The company increases the drink's package volume by 10 percent, thus providing more liquid for the same price. Let's look at each of the three slowly changing dimensions techniques and their implications in this situation.
Type 1. Type 1 handling overwrites an old attribute with a new one. This type is usually used for correcting historical errors or when you don't care about losing historical data. If you overwrite the product's old package volume with the new one, you won't be able to correctly calculate volumes of historical product sales. This type of change is technically the simplest to implement.
Type 2. With Type 2 handling, you create a new dimension row with a new dimension key but also keep the old dimension row, with the old dimension key. Note that you'd probably want to keep two keys in your data warehouse: the application key and the dimension key. The application key is the online transactional processing (OLTP) environment's production key, which is unchanged; the dimension key is a surrogate key that correlates between rows in the fact table and rows in the dimension table. The dimension key gets a new value.
You also need to keep track of the effective date of each dimension key. You might store a range of effective dates as two values—from_date and to_date—for simpler data manipulation. Type 2 handling lets you correctly calculate volumes of liquid in historical sales, for example, but might require you to maintain several keys per product. Let's call the application key in our example productid_app and the data warehousespecific dimension key productid_key.
Prev. page  
[1]
2
3
4
5
next page