Getting fast, functional performance is as easy as C++
Whether you use wizards to develop a simple package or you need the flexibility of a custom package, Data Transformation Services (DTS) lets you tailor solutions to the situation. In "Constructing DTS Custom Tasks" (September 1999), we examined how custom tasks increase a package's execution speed and help reduce its complexity by providing a handy container for utility tasks. This month, we expand on the extensibility theme by examining the role and development of custom transformations.
Like custom tasks, DTS custom transformations offer a significant reduction in execution speed and can be completely customized. Using an extension to Visual C++ that we created (and that you can reuse), we build a custom transformation.
In this example, we use the Active Template Library to build a transformation that generates Extensible Markup Language (XML) documents from database tables. (XML is an important technology for data interoperability, and its presence is growing in all aspects of distributed computing.)
What Is a Transformation?
Before you build a custom transformation, you need to understand transformations. A transformation is the process of modifying data after its extraction and before it reaches its target storage destination. Note that this definition doesn't address where the data originates, where it ends up, or how the data travels. Instead, this definition focuses on how the data is modified or transformed during its travel between the data stores. This distinction is important, because it directly determines what you can and can't do with your custom transformation.
DTS transformations take place in the Data Pump, which Figure 1 illustrates. The Data Pump is a multithreaded, high-speed, in-process COM server that programmatically exposes the data stream for direct manipulation. The Data Pump hides the complexity of data movement, letting developers focus on manipulating row sets. We talk more about the Data Pump and its role in the transformation process later in this article.
By default, the Data Pump supports only two types of transformations: a Copy Column transformation and an ActiveX Script transformation. In April and May, we developed packages with the DTS Package Designer and programmatically with Visual Basic (VB) to demonstrate ActiveX scripting and the Copy Column transformations. In some situations, though, these two types aren't flexible or robust enough to handle the complexity of your transformation. To address this limitation, custom transformations give you a mechanism to extend these basic transformations: using C++ to create new transformations.
Why Custom Transformations?
To discover where and when custom transformations might fit in your solution, let's examine their maintainability, functionality, and performance. All three attributes are interrelated, but each has a distinct impact on the decision process.
Maintainability. You implement custom transformations only via C or C++. Microsoft's reason for specifying these languages is speed; because transformations are processed one row at a time, the performance overhead in calling an automation interface once for each row is too overwhelming if you implement them with any language other than C or C++. Therefore, Microsoft restricted custom transformation development to the Data Pump's native OLE DB interfaces as defined in dtspump.h. Because many companies don't have C or C++ developers, this restriction often eliminates custom transformations from any proposed solution.
Although you need the right staff to maintain custom transformation code, you gain significant maintenance advantages when you use C++. For example, complicated algorithms are often easier to express in C++ than in script. Additionally, you have a much greater opportunity to reuse C++ code than VBScript.
Functionality. Of the three categories, functionality is the strongest reason for using custom transformations. Although scripting languages have evolved into handy tools, they don't begin to match the functionality that C and C++ offer. Compare VBScript to C++ as an example. VBScript has a limited number of intrinsic functions, can access COM objects only through automation interfaces, and supports only the variant data type. Most notably, VBScript can't call external DLLs (including the Windows APIs).
Performance. Your choice of development tool affects application performance. Your choice of development language also directly affects the performance of your transformation. However, the component interaction within a task, and consequently within a transformation, most directly influences performance. The best way to illustrate this point is to examine the layers of abstraction, which Figure 2 shows, that are involved in implementing a package. All packages perform their work through tasks. Transformation tasks (Data Pump tasks or Data Driven Query tasks) call the Data Pump for processing. The Data Pump calls a transformation to perform the data manipulation. If an ActiveX script performs the transformation, the transformation must pass control to the scripting language.
At the Data Pump level and above (tasks, connections, etc.), each component generally is called only once within a package. However, the Data Pump calls the transformation row by row. That is, all objects or interfaces in the transformation code need to be instantiated each time a row is processed. As you might guess, the performance impact of using an ActiveX Script to process a large number of rows can be substantial. In addition, certain scripting functions, such as string manipulation, are inherently slow. Any transformation that performs many string operations (as when data is copied from source to destination columns) is very slow if you write it in script. This slowdown is further magnified if the transformation moves a large amount of data; in data-warehousing scenarios, transformations commonly move several hundred thousand rows.
Building a Custom Transformation
Consider the following example, which demonstrates how to use a custom transformation. On a recurring basis, you need to pull data from an external data source and use XML to display that data in Internet Explorer (IE) 5.0. For your solution, you decide to use DTS to extract the data and a custom transformation to transform each row into an XML-formatted string that you output to a text file.
For this demonstration, we used the Active Template Library (ATL) to create a transformation. To maximize reusability and save development time, we created an ATL Add-In object module for a Custom Transformation object. (WebSQL subscribers can download the files for this custom transformation from the article index on the SQL Server Magazine Web site, http://www.sqlmag.com.) When you execute this add-in, it creates a default DTS Transformation object and its interfaces. (Later in this article, we'll describe the object and each of the interfaces in detail.)
First, install and register the Custom Transformation ATL Add-In by unzipping the contents of XForm Object.zip into Program Files\Microsoft Visual Studio\Common\MSDev98 \Template\ATL and running the Registry file, CustomXForm.reg. In addition, you need to define the SQL Server Include directory MSSQL7 \DevTools\Include in your development environment. (You might have created this directory when you installed the development tools from the SQL Server CD.) Note that all the code that this add-in creates (beyond what most ATL objects need) has a comment denoting that DA/BL (the authors) added it. This marker gives you a searchable string to identify what we added.
Prev. page  
[1]
2
next page