DOWNLOAD THE CODE:
Download the Code 25544.zip

Destination Data Warehouse
Next, you need to connect to the destination data warehouse and create three versions of the Products dimension table so that you can practice the three handling techniques. After connecting to the destination server, run the script in Listing 5 to create the Destination database and three versions of the Products dimension table.

The Products_type1 table is the simplest. It holds the product ID in the productid_app column and a surrogate key in the productid_key column. Note that in a Type 1 implementation, you don't necessarily need a surrogate key, but holding such a key in your table and using it as the correlating key in the fact table is a good practice for a couple of reasons. First, you accommodate future changes that might require a more complex type of handling. Second, the application key, which holds the original key from the source system, might be large, whereas the surrogate dimension key is of an integer data type, which is small. Keep in mind that the dimension key will be used as a foreign key in all rows of the fact table, which usually has a huge number of rows—in some systems, tens or even hundreds of millions. The second argument for using a surrogate key doesn't apply in the soft-drink scenario because the sample application key—the product ID—is also of an integer data type, but the first argument alone justifies adding a surrogate key.

Also note that uniqueness is enforced in both keys because you don't keep several versions of a product in a Type 1 implementation. The other columns in the Products_type1 table are productname, package, and discontinued—a bit column that shows 0 (false) if the product is active in the production system and 1 (true) if it was deleted from the production system. For suggestions about tracking discontinued products, see the sidebar "Discontinued Products."

The Products_type2 table holds the effective_date and to_date columns in addition to the columns that Products_type1 holds. Uniqueness isn't enforced on the productid_app column because, by definition, in a Type 2 implementation you can have several versions of products with one application key and different surrogate keys.

The Products_type3 table holds three versions of a product in the same row. Notice that the package and effective date columns are duplicated three times. You don't need separate effective_date and to_date columns because one version's "effective date" is the previous version's "to date." A Type 3 implementation has one row for each product, so you should enforce the uniqueness of both the production key and the surrogate key.

At this point, you're done implementing the infrastructure in the destination data warehouse. You can now move to the main task: creating the DTS package that implements the refresh process.

The Refresh Process
In the refresh package, you'll use several important DTS features, some of which are new to SQL Server 2000. You create a package that looks like Figure 1 by following the process I describe next.

Set the source and destination connection properties. Open the Properties dialog box of the connection called Source OLTP, then set the Server property to your source server and the Database property to the Source database. Open the Properties window of the connection called Destination DW, and set the Server property to your destination server and the Database property to the Destination database.

Make all the data-modification tasks part of one transaction. Making all tasks that modify data part of the same transaction ensures the atomicity of the refresh process. If a step fails, the package will roll back all activity. Only if the last step—the Execute SQL task called Clear Handled Trans—completes successfully will the transaction commit. Right-click each task (except the Connections and the Dynamic Properties tasks) and choose Workflow, Workflow Properties. Click the Options tab. Make sure that Join transaction if present and Rollback transaction on failure are selected for all tasks and that Commit transaction on successful completion of this step is selected only in the Execute SQL task. Also, make sure that the Microsoft Distributed Transaction Coordinator (MS DTC) service is active. This service manages the DTS transactions.

Create a global variable called max_lsn. Next, you create a global variable to store the current maximum lsn in the Prod_log table. The refresh processes that the DDQ tasks implement will go through the Prod_log table, reading all the transactions and performing modifications at the destination dimension tables. After the refresh processes are finished, the Execute SQL task will delete the transactions from the Prod_log table. Between the time the refresh processes finish and the time the Execute SQL task performs its cleanup, new transactions can enter the Prod_log table. Storing the current maximum lsn in a global variable before the refresh processes begin lets you limit the transactions that the refresh processes handle and that the Execute SQL task deletes, thus avoiding deletion of unprocessed transactions. To create the variable, right-click an empty area in the package, choose Package Properties, then click the Global Variables tab. Create a global variable called max_lsn that has an integer data type and a zero default value.

Use the Dynamic Properties task to store the maximum lsn in the global variable. The Dynamic Properties task is a cool new DTS feature in SQL Server 2000 that lets you dynamically change properties in the package during its run. You can use an .ini file, a query, a global variable, an environment variable, a constant, or a data file as the source for the changed property value. In this case, we'll use a query to retrieve the maximum lsn from Prod_log and store it in the max_lsn global variable. Open the Properties dialog box of the Dynamic Properties task called Store Max LSN in Var and click Add. Choose the max_lsn global variable from the left pane and click Set. Choose Query in the Source property, choose Source OLTP in the Connection property, and type the following query in the Query box:

SELECT MAX(lsn) FROM Prod_log

Click OK twice.

Delete the processed transactions from Prod_log. The next step is to write, in the Execute SQL task called Clear Handled Trans, a DELETE query to delete all the transactions that the refresh processes handled. Open the Properties dialog box of the Execute SQL task and make sure that Source OLTP is selected in the Existing Connection property. Enter the following query in the SQL statement box:

DELETE FROM Prod_log
WHERE lsn <= ?
Prev. page     1 2 [3] 4 5     next page



You must log on before posting a comment.

If you don't have a username & password, please register now.

Reader Comments

Nice article, but... In real life, the OLTP system is a 'given', which means that your are never allowed to create triggers and extra logtables on the production server. In real life, the OLTP system is often running on a different platform than Win/SQL. In real life, dimension tables tend to be quite large (+1 Mln records for a customer dimension is not exceptional). So in this situation, you cannot determine whether a source record is an insert, delete or update from a log table but first you have to match it against the target table. It would be interesting to see how you think DTS can help here. For instance, recently I had to develop a solution using DTS to keep Navision C/Side transaction tables (GL postings) in sync with a 'replica' on SQL Server. The only (?) way to accomplish this was using an ActiveX Script task.

Jos van Dongen

Excelent article. One of the best that I have ever read in SQL Server Magazine.

Anonymous User

Article Rating 5 out of 5

 
 

ADS BY GOOGLE