DOWNLOAD THE CODE:
Download the Code 25544.zip

Type 3. Type 3 handling keeps track of attribute changes in the same dimension row. The key remains the same as in Type 1, yet you keep track of history as in Type 2—almost. The number of changes you can track is finite. You predetermine the number of changes or versions that you want to track—let's say three—and duplicate the column that many times. You also store an effective date for each attribute. Each time a product attribute changes, you "push" to the right the existing versions of the attribute and effective dates in first in/first out (FIFO) order, storing the new values in the columns that hold the attribute's current version. This type is suitable when you want to keep track of historical changes but you care about accurately tracking only the recent changes. It has the benefit of keeping only one key per instance of an entity (as opposed to keeping several dimension keys as in Type 2) and also of maintaining a history to a certain extent. In the example, we'll track three versions of the product's package volume changes.

A Slowly Changing Dimensions Solution
Now let's look at the practical implementation of a slowly changing dimensions solution—that is, a way to keep track of information as it changes slowly in your data warehouse. Say your source OLTP system has a Products table, and you want to implement a Products dimension table in your destination data warehouse. You also want to implement a refresh process—the process that actually makes the changes to the dimension attributes—that uses whichever of the three slowly changing dimensions techniques that best suits your needs. For demonstration purposes, we'll name the destination dimension tables Products_type1, Products_type2, and Products_type3 and implement three processes that use the three slowly changing dimensions techniques. We'll implement the refresh process through a DTS package, using the DDQ task as the package's main tool. A database called Source represents the OLTP system, and a database called Destination is the destination data warehouse. You can create these databases in the same SQL Server installation, on two instances of SQL Server on the same machine, or on separate machines.

First, you need to prepare the source OLTP system's infrastructure. Connect to the source server and run the script in Listing 1 to create the Source database and the Products table. To track the changes, you can create a log table that records all transactions issued against the Products table, including all information that the refresh process requires. You can think of the log table as a kind of custom transaction log. Later, we'll use the log table as the source for the refresh process.

Next, making sure you're still connected to the source server, run Listing 2's script to create the Prod_log table. The lsn column maintains log serial numbers, which are automatically generated consecutive values that represent the chronological order of transactions. The lsn values will determine the order in which the refresh process processes the transactions. It's important to process the transactions in the destination system in the same order as they occurred in the source system. The log_date column stores the effective dates, and the tran_type column stores the transaction type: I for INSERT, D for DELETE, and U for UPDATE. The productid, productname, and package columns store the product's original attributes. Namechg and packagechg are bit columns that store 1 if the column they represent has changed and 0 if it hasn't.

Next, you create a set of triggers on the source Products table that record the transactions in the Prod_log table. The INSERT and DELETE triggers are fairly straightforward; they record all the data from the inserted and deleted tables, respectively, plus an I or a D as the tran_type in the Prod_log table. While connected to the source database, run the script that Listing 3 shows to create the INSERT and DELETE triggers. Note that SQL Server uses the IDENTITY column attribute and GETDATE() default value, respectively, to generate the lsn and log_date values. The namechg and packagechg values are irrelevant when you're inserting or deleting values, so they get NULLs.

The UPDATE trigger is more complex. First, it checks whether an attempt to change the product ID occurred, and if so, it rolls back the transaction. (A well-designed OLTP system shouldn't allow a change to the product ID once it's set.) The UPDATE trigger also needs to compare the old image of the data (the deleted table) with the new image (the inserted table) to make sure that the product's name or package changed and to determine the new values of the namechg and packagechg columns. Using the UPDATE() function to determine whether those columns changed isn't a good idea for two reasons. First, your trigger should handle multirow updates, but product attributes in some rows might change while other products' attributes remain the same. Second, the UPDATE() function returns TRUE when you specify a column in an UPDATE statement's SET clause, regardless of whether the column value actually changed. You can run the script that Listing 4 shows to create an UPDATE trigger on the Products table.

To test your triggers, run the following INSERT statements against the Products table:

INSERT INTO Products(productid, productname, package)
  VALUES(101, 'prod1', 10)
INSERT INTO Products(productid, productname, package)
  VALUES(102, 'prod2', 10)
INSERT INTO Products(productid, productname, package)
  VALUES(103, 'prod3', 10)

Wait a moment to simulate a period of time passing before changes are made, or you might not be able to see the difference in the effective dates of the attribute changes. Then run the following UPDATE and DELETE statements:

UPDATE Products
  SET package = 20
WHERE productid = 103
DELETE FROM Products
WHERE productid = 102

You can issue the query

SELECT * FROM Prod_log ORDER BY lsn

to verify that the triggers did their job correctly. Table 1 shows the correct contents of the Prod_log table after these modifications.

At this point, you're done with the source system infrastructure. The recording process runs automatically, so you don't need to worry about it. Everything else, including the creation of the dimension tables and the refresh process, happens in the destination data warehouse and in the DTS Package Designer.

Prev. page     1 [2] 3 4 5     next page



You must log on before posting a comment.

If you don't have a username & password, please register now.

Reader Comments

Nice article, but... In real life, the OLTP system is a 'given', which means that your are never allowed to create triggers and extra logtables on the production server. In real life, the OLTP system is often running on a different platform than Win/SQL. In real life, dimension tables tend to be quite large (+1 Mln records for a customer dimension is not exceptional). So in this situation, you cannot determine whether a source record is an insert, delete or update from a log table but first you have to match it against the target table. It would be interesting to see how you think DTS can help here. For instance, recently I had to develop a solution using DTS to keep Navision C/Side transaction tables (GL postings) in sync with a 'replica' on SQL Server. The only (?) way to accomplish this was using an ActiveX Script task.

Jos van Dongen

Excelent article. One of the best that I have ever read in SQL Server Magazine.

Anonymous User

Article Rating 5 out of 5

 
 

ADS BY GOOGLE