Task 8: Setting the database properties. Set Database Properties is an ActiveX Script task that initializes database level settings by calling the sp_dboption stored procedure. One of the database options settings worth noting here is bulkcopy, which the ActiveX script code sets to true. Bulkcopy's true setting lets the data load faster because it means that SQL Server doesn't log row-insert operations. However, be aware that for nonlogged bulk load to work, your database settings must meet additional conditions. These conditions are well documented in SQL Server Books Online (BOL).
Task 9: Initializing FoodMart's connections. The Initialize Food-Mart Connections task initializes FoodMart's SQL Server OLE DB connection and the parameters required for the Execute Package task. Figure 11 shows the General tab in the Dynamic Properties Task Properties window. You've already set the OLE DB properties, so let's set a task parameter. Clicking Edit on the General tab and highlighting the PackageGuid destination property opens the Package Properties window, which Figure 12 shows. In this window, you can select the task, the PackageID, and the PackageID's default value. Once again, the Dynamic Properties task gives you maximum flexibility for configuring a property at runtime, a capability that's vital when you move a package between environments.
Task 10: Creating tables. After you choose the package properties, you can create 24 database tables and populate them. Note the size of the FoodMart databasetoo large to use Access's Upsizing Wizard. FoodMart holds enough data to warrant the explicit creation of the database schema to optimize the final database size. The script that creates the tables is available online at http//:www.sqlmag.com (for download instructions, see "More on the Web," page 54). Also available online is a readme.txt file that tells you how to run the Create Foodmart 2000 package. Your next step is to run the initial load.
Task 11: Moving data from Access to SQL Server. Many ETL projects are complex enough to warrant the separation of logic into multiple packages. When you use SQL Server 7.0, linking these multiple packages together in a workflow is a challenge. The technique commonly usedcreating an EXECUTE process command, then using the dtsrun command-line interfaceis a cumbersome solution. In addition, in SQL Server 7.0 you can't set runtime parameters.
SQL Server 2000 addresses both shortcomings with a new task, the Execute Package task. You use this task to invoke the DTS package that moves data from Access to SQL Server. I examine the package in more detail later in this article. First, let's look at the General tab in the Execute Package Task Properties window, which Figure 13 shows. Task 9 sets the key values for this package at runtime. The window in Figure 12 displays the available properties. Be aware that for all tasks, the minimum properties you need to set are the PackageName, the package FileName, and the PackageGuid so you can dynamically set the package properties to work correctly at runtime.
The Execute Package task incorporates another valuable feature: You can initialize the called package from the task in the Execute Package Task Properties window. To initialize the called package, you can choose either the Inner Package Global Variables tab or the Outer Package Global Variables tab, which Figure 14 shows. For this example, I used Outer Package Global Variables to initialize global variables of the same name within the called package. Figure 15, page 54, shows the called package that you use to copy the data from Access to SQL Server. This package uses a technique similar to the initialization technique that the main package uses. After the initialization task completes, each of the 24 transformation tasks fire and complete independently of one another.
In each transformation, you map the source to the destination column. DTS refers to this action as the data pump. Figure 16, page 54, shows the transformations for the account table in the Transform Data Task Properties window, Transformations tab. You can set one transformation for the entire row, as Figure 16 shows, or map the table column-to-column, as Figure 17, page 54, shows. You might expect that minimizing the number of transformations would significantly speed up the copy task's performance. However, my SQL Server Profiler tests showed that the timing results are similar for both packages. One of the test runs revealed that both techniques use the BULK INSERT command to transfer information to SQL Server. BULK INSERT used as a default command is another new SQL Server 2000 feature. When you use BULK INSERT capabilities, you can greatly improve execution time for your transformation tasks. However, this performance gain comes at a cost: Inserting data in bulk mode doesn't work with the new SQL Server 2000 logging features.
To understand the problem, let's look at Figure 18, page 55, which shows the Options tab for one of the transformations. Note that the Use fast load option is enabled by default for a copy transformation. Disabling this feature changes the method of loading the destination data rows from a nonlogged, bulk-load interface to a logged interface. The quick Profiler timing tests I ran on my machine show that the task runtime is more than 10 times longer when you disable Use fast load. However, when you run a transformation with Use fast load enabled, you can't take advantage of one of the new SQL Server 2000 logging features, which lets you save copies of all rows that fail during the transformation. This logging feature is valuable because it lets you log and later process all failing rows for a particular transformation. ETL processing often requires you to make choicesand a trade-off accompanies every choice. Here, you must decide between set-based processing and row-based processing when you build your transformations. Set-based processing usually provides better performance, whereas row-based processing gives you more flexibility. I use set-based processing in the next two tasks, in which I cleanse the data and create primary keys and referential integrity.
Prev. page
1
2
[3]
4
next page