DOWNLOAD THE CODE:
Download the Code 6110.zip

Establish the Project Environment. To begin, create a new C++ project. Choose File, New... from the menu. Then, switch to the Projects tab, and select the ATL COM AppWizard, add a Project Name (we used Xform), and click OK. The AppWizard then prompts you to confirm the type of object you want to add to the project. Because custom transformations run as in-process COM servers in the Data Pump, choose DLL (the default) as the object type. Click Finish, and then OK to confirm the object creation. The last step in establishing the project environment is adding ,_ATL_NO_UUIDOF (don't forget the comma!) to the Preprocessor definitions in the C/C++ tab of the Project, Settings dialog box. You need to add this definition because of a minor version incompatibility between the SQL Server DTS files we use and Visual C++ 6.0. (For a more complete explanation, refer to the Microsoft Knowledge Base article Q192561 at http://support.microsoft.com.) Now that you've established the project environment, you're ready to add the Custom Transformation objects and interfaces.

Add Objects and Interfaces. To use the Custom Transformation ATL Add-In Object, choose Insert, New ATL Object... from the menu. Then, scroll to the bottom, select the Custom Transformation Object, as Screen 1 shows, and click Next. On the Names tab, fill in the C++ Short Name (we used XMLDoc; everything else defaults). To support error handling, switch to the Attributes tab, enable Support ISupportErrorInfo, and click OK. (Building verbose error-handling code that COM can use to pass error information to the DataPump is beyond the scope of this article. Send us email if you want to see an article on it.) The AppWizard creates a default Custom Transformation object, CXMLDoc, which is inherited from IDTSDataPumpTransform, and declares all the interfaces necessary to implement the transformation. WebSQL subscribers can download WebSQL Listing 1 at http://www.sqlmag.com, which contains the complete header file for the transformation.

Customizing the Transformation
All transformations need to support the IDTSDataPump-Transform interface. Figure 3 shows the properties and methods of this object. Because it inherits this interface, the Data Pump controls the transformations. The Custom Transformation ATL Add-In Object handles all these declarations, letting you concentrate on the specific functionality of the code. In the transformation, this specialization begins by establishing three properties (m_TableName, m_RowName, and m_XSLPath) and the member variable m_WrittenHeaderInfo. The properties are implemented as standard C++ GETS and PUTS. When you implement a package at design time, transformation properties that implement a GET method are exposed for input. In this example, m_WrittenHeaderInfo is private to the transformation; the package designer exposes the others.

Now, you're ready to begin adding custom code. The Initialize method fires only once, when the object is first instantiated. Here, you can establish the object's state. This example creates an .XML file for output. Before you write out the individual data rows, you need to create a header record to define the location of the .XSL file. (An XSL file is the type of style sheet XML uses to define the data layout and data formatting.) Listing 1 contains the code in the .XSL file. In the Initialize method, set the m_WrittenHeaderInfo flag to FALSE, signifying that the header needs to be written.

The AddVariable method fires next in the transformation's execution sequence. This method's purpose is to let application (package) variables pass through to the transformation process. You don't have any variables to pass here, so leave the method empty.

The next method to fire is ValidateSchema, which validates the source and destination columns for consistency with the Execute method's expectations. Based on the output from this method, the Data Pump determines whether to call the Execute method. Because the destination data store is an XML file, the first validation rule checks that the destination data store defines only one column. The second rule validates that the destination column is a string data type. The final rule this method performs confirms that each source column is a string data type. If any rule fails, the return code becomes DB_E_SCHEMAVIOLATION, and the Data Pump terminates the transformation. If the schema is valid, the return value is NOERROR and processing continues.

After the ValidateSchema method completes, the Execute method runs. This method performs the data transformation, firing once for each data row in the data source. As you might guess, this method does the bulk of the processing work. Data validations and data manipulations occur here. In the example, you create the XML strings for the file here. After establishing local variables to store the data, determine whether the header has been written. If not, the custom transformation creates the header and adds it to the destination string. One limitation of the Data Pump is that it handles only one data source and one data destination for each conversion. So, for example, you can't create or modify a corresponding XSL file to manage the way an XML document displays in an XSL file. Also, this limitation implies that the transformation has only one data row as input and one row as output and thus directly affects your custom transformation. As a workaround for this problem, the transformation creates the destination string as a single string value. It concatenates the various tables and data, and separates multiple lines of XML data by inserting carriage returns and line feeds into the destination stream until the entire source row is processed. Then, the transformation sends the string to the destination data store (in this case, the XML file). This processing continues until an error occurs or until all the source data is exhausted.

The remaining two methods act as events. The OnRowComplete method fires after each successful fetch operation. The OnTransformComplete method fires after all rows have been transformed and after the Data Pump terminates. You might use it to free up memory allocations that are introduced throughout the transformation, but we didn't use either of these interfaces in this project.

Another item to note about this XML custom transformation is that at package design time, you need to provide code that adds the final XML tag to the output file. So far, the Execute method has accounted for all the formatting of the XML file. However, because of a design weakness in the combination of Data Pump and custom transformation, a custom transformation can't tell when to write the final XML tag to the destination output file. No event fires after all rows have been transformed but before the transformation loses access to the destination data stream. The Data Pump hides all the complexity of data management (pulling data from the source and sending it to the destination), so the transformation can't know when it is processing the last data row from the source connection. It knows only when an individual transformation is complete (via the OnTransformComplete method). After the Data Pump processes all the source data, it closes the destination data stream before calling OnTransformComplete. As we discuss implementing the package later in the article, we describe one way to work around this limitation. With that, the transformation is complete and ready for compilation.

Using the Transformation
Now that you've built the transformation, put it to work. First, create a new package. Select the Data Transformation Services folder for your server in Enterprise Manager, then right-click and select New Package. Although you registered custom tasks in the Package Designer, you simply register the transformation component with regsrv32 to make it available to any Data Pump or Data Driven Query task you might use in your package. To handle registration, we enabled the ATL Add-In to generate the COM component and SQL Server Registry information for you. For details, see the DLLRegisterServer() and DLLUnregisterServer() methods in Xform.cpp. (This file is part of XForm.zip, which we mentioned earlier.)

Returning to the example, add a data connection to the Pubs database. Screen 2 shows the Pubs connection. Next, define your destination connection. Because the XML output file is nothing more than a specially formatted text file, use the Text File (Destination) data source, which the Package Designer offers. Direct the output data to a file called \Article\Authors.xml, as in Screen 3. Formatting of the output file takes place in the custom transformation. Therefore, in the Text File Properties dialog box, you need to ensure that the fixed field option is enabled and that the other options remain at their default values, as Screen 4 shows.

Now use a Data Pump task to add the transformation. First, select the Pubs connection you added. Next, select the XML File connection, and choose Workflow, Add Transformation.

After you add the Transform Data task, you need to define the data sets. Right-click the task, and select Properties. This action opens the Data Transformation Properties fialog box. This example pulls data directly from the Authors table and places it into the XML file. On the Source tab, choose the Table Name option and select the Authors table. You can then define the destination file. The custom transformation handles all the string concatenation and special XML formatting issues, so the destination file will have only one output column. After you switch to the Destination tab, DTS prompts you to define the columns of the output file. Click Define Columns, then set the fields on the pulldown menu on the resulting dialog box. By default, the list includes all rows from the source connection. For this example, override the default, setting all the fields except one to ignore, as you see in Screen 5. Rename the remaining column to XMLRow, change its type to not quoted, and ensure that it is large enough to include all the data from the data row, plus the XML tags. (We chose a size of 300 characters for the example.) You can see the resulting output file definition in Screen 6. And to avoid cluttering the Execute routine with error-processing code, the example detects only buffer overruns on length.

Now that you've defined the source and destination data sets, define the transformation. You map the columns on the Transformations tab of the Data Transformation Properties dialog box, which Screen 7, page 45, shows. By default, DTS maps each source column to a destination column in column order. Before you add the transformation, you must delete the default column mappings: Select the transformation line between the source and destination tables, then delete it. Next, redefine the mapping by selecting au_lname, au_fname, address, city, and state in the source table and their corresponding columns in the destination table (in our example, the destination column is equivalent to one XML row). After selecting the columns, expand the New Transformation drop-down list box, select the transformation type XMLDoc Class, and click New to create the custom transformation. At this point, the custom transformation prompts you to define its properties, as in Screen 8. These properties, which you defined earlier in the custom transformation, are case sensitive, so be careful.

The final piece of the package accounts for the custom transformation's inability to identify the end of the data stream and the resulting lack of a final XML tag. To fix the problem and write the tag, simply add the ActiveX Script code from Listing 2 to the package. And to ensure that the task executes only after the transformation successfully completes its job, add an On Success precedence constraint to the workflow.

Tying It All Together
You can see the completed package in Screen 9. Running the package generates the XML file Authors.xml. When you combine this file with the Authors.xsl style sheet and use Internet Explorer 5.0 to open the result, you will see the nicely formatted Web page that Screen 10 shows.

Custom transformations provide another powerful way to extend the capabilities of DTS. Even with their limitations, custom transformations offer developers a significant advantage over the standard ActiveX Script transformations in maintainability, functionality, and performance.

End of Article

Prev. page     1 [2]     next page -->



You must log on before posting a comment.

If you don't have a username & password, please register now.

Reader Comments

It is a nice article, but the sample code don't work with SQL Server 2000.

The method CXMLDoc::Execute didn't write in the destination column. To fix this error you must set obStatus of destination binding to DBSTATUS_S_OK, like

DWORD* status = (DWORD *) ( rDTSDestColumnData.pvData + rDBDestBinding.obStatus ); *status = DBSTATUS_S_OK;

before you change the content of obValue.

Jürgen Meister

 
 

ADS BY GOOGLE