DOWNLOAD THE CODE:
Download the Code 5105.zip

DTS in SQL Server 7.0 helps you overcome troublesome warehousing tasks

Data warehousing means more than having a large database; it also includes the process you use to manage your data warehouses and data marts. To discuss this process, we'll organize data warehousing into three tasks: data acquisition, data transformation, and data presentation. Data acquisition is moving data from anywhere to your warehouse. After you acquire the data, the second task is data transformation: You scrub and validate the acquired data and then map the source data to a uniform set of values (i.e., you transform the data). You can add data to your warehouse only after you validate, transform, and verify the integrity of the data. The third task, data presentation, entails getting the cleansed data from the warehouse to users so they can easily access it.

Data Transformation Services (DTS) provides the framework and tools for developing, automating, and managing the first two tasks. In this article, we'll discuss how DTS carries out data acquisition and data transformation.

Data Acquisition
Without DTS, transferring data from the source systems into the warehouse is laborious because source data is often in different formats and runs on different platforms. You can acquire data in two ways: Purchase an expensive third-party tool that provides native drivers for accessing each target system and a scheduling utility to coordinate the transfers. Or you can resort to the lowest-common denominator by extracting data as an ASCII text file and using batch processing to import it into the target system. Most organizations choose the second option. Although this option appears simple, by the time you design, develop, and deploy the solution, it is as complex, difficult to maintain, and costly as the first option.

DTS uses OLE DB to tackle data acquisition. OLE DB is Microsoft's solution to enable Universal Data Access (UDA). OLE DB goes beyond ODBC's capability to access only relational data. OLE DB defines a set of COM interfaces that let you access and manipulate any data type. Acting as an OLE DB consumer, DTS extracts data from any data source that acts as an OLE DB provider (i.e., offers a native OLE DB interface) or any data source that offers an ODBC interface. For example, a company hosts its inventory tracking system in a DB2 database running on a mainframe, hosts its accounting system in an Oracle database running under UNIX, and stores its human resources data in a SQL Server database. The DTS Package Designer creates an OLE DB connection to each data source to pull all this information into a SQL Server warehouse. As part of the transformation process, DTS later maps the source data fields to destination data fields through the data source connections.

Data Transformation
In data transformation, you scrub, validate, aggregate, and summarize data after you acquire it and before it reaches its target storage location. Without DTS, this process consists of several steps. First, you must establish an intermediate holding area to store the source data, for example, staging tables in the database or storing text files on the file system. Second, you must develop applications to scan, analyze, and validate the intermediate data. Finally, you must develop additional routines to move the cleansed data from the holding area into the warehouse. Although these steps are necessary to ensure the integrity of the data, they increase complexity and add points of failure. The higher level of complexity and increased risk of failure result in higher manageability costs.

DTS' name comes from the data transformation task. DTS uses the Data Pump to simplify the data transformation process and reduce the number of failure points. The DTS Data Pump is a high-speed, in-process COM server that exposes the data stream for direct manipulation by any COM-compliant language, as Figure 1 shows. The Data Pump programmatically exposes the data stream for manipulation via ActiveX scripting languages. Developers can then validate, manipulate, aggregate, and redirect information.

Beyond the Data Pump, DTS exposes a complete set of COM interfaces that let developers create, maintain, and execute packages and let packages interact with external applications. This capability lets developers use any COM-compliant language to develop highly customized applications that use DTS' built-in functionality.

Workflow Support
The data acquisition task requires a means to schedule and monitor data extractions. In addition, the data transformation task requires flow control. What happens when a job fails? How do you automate your process and coordinate parallel and serial steps? How do you set up conditional execution (e.g., run job B only if job A's return code is 40)? Answers to these questions are as important as the data extraction and transformation jobs.

DTS uses built-in scheduling and workflow functionality to handle such scenarios. Each package contains one or more steps that define the flow and execution of tasks within the package. In addition, DTS uses precedence constraints to further define the workflow. Figure 2 shows an example of data transformation steps executed serially and in parallel based on precedence. Figure 3 shows data transformation steps executed conditionally. DTS' ability to control and manage its processing fills a void that looms over most enterprise warehousing projects.

Putting DTS to Work
To illustrate how DTS works, we'll use the Package Designer to script a package that simulates bringing multiple data sources together into one data store. This process is a common warehousing scenario.

Suppose a large corporation needs to combine the employee data from its two subsidiary companies into one data mart and refresh that data weekly. We'll use the Employee tables from SQL Server 7.0's Northwind and pubs sample databases to simulate the subsidiary companies. We'll create a new database called HRMart to represent the parent company's data mart. Listing 1 details the HRMart schema.

Connections
We must create a new DTS package to begin the database assignment. A package is a self-contained description of all the tasks required to complete an import, export, or transformation process. Enterprise Manager, which you see in Screen 1, provides the graphical interface for DTS package creation, management, and execution. To create a new package, we right-click the Data Transformation Packages folder, and click New Package. This action opens the DTS Package Designer, which provides tools to create and edit packages.

Next, we must define the source and destination data stores for our transformation. DTS, acting as an OLE DB consumer, uses connections to define the OLE DB data provider that will furnish the data for processing. Within the Package Designer, the Data toolbar and Data menu display the available connection types. To add a connection to the package, drag it from the Data toolbar to the design window. Or, you can right-click in the design window and click Add Connection. After you add a connection to the package, you must configure it before the package can use it. Configuring connection properties is similar to configuring an ODBC connection, as Screen 2 shows. For the first part of our example, we'll create two connections, one for the pubs data and another for the HRMart data.

Transformations
Now that we have established the data providers, we have to define the source and target data sets. You define the data sets in the Transform Data task. This task creates the Data Pump that will control the data stream and apply any transformations. To add this task, we must first select both the source and target connections. Then, we click the Transform Data icon (the arrow pointing to the right) or select Add Transform from the Workflow menu.

You must pay close attention to the order in which you select the connections when you add the Transform Data task. After you select the connections, you can't reverse the source and target connection without deleting and re-adding the task. To be safe, always select your source connection first and then your destination connection.

After we add the Transform Data task, we have to define the data sets. We right-click the task and select Properties. This action opens the Data Transformation Properties dialog box, which Screen 3 shows. On the Source tab, we define a query to return the employee data we need. To define the query, we directly enter our query in the SQL Query window or click Build Query to use the Query Designer. As the data source, we can use any valid SQL query such as a simple SELECT, VIEW, or stored procedure. No matter which method we use, we check our query for typos by clicking Parse Query. We're working with one source table; consequently, we could have selected the Table Name option and included the entire table in the query. After we define our query, we click Preview to view the Preview Data dialog box, which Screen 4 shows.

Next, we define the destination. For our example, the target table in the Table Name text box is HRMart.dbo.Employee. We don't make any changes to the Destination tab. To create a new destination table, we can click Create New. By default, DTS provides a destination table definition that matches the result set our source query defined.

   Prev. page   [1] 2     next page



You must log on before posting a comment.

If you don't have a username & password, please register now.

Reader Comments

I could not find Listing1 or Listing2 from the article.

Lisa Pai

Nice step-by-step initiation to DTS. I used DTS in mane cases and it demonstrated to be a useful tool. Thanks to SQLMag to leave this article on-line.

Marcos Galvani MCDBA(SQL2k) MCTS (SQL2k3) marcosgalvani@gmail.com

MarcosGalvani

Article Rating 3 out of 5

Thanks, Marcos. As you mentioned, we do have online archives for SQL Server Magazine articles, going back to the first issue--March 1999. I often see reader comments on older articles and am glad to know that you and other readers find them useful.

AnneG_editor

Article Rating 5 out of 5