• subscribe
April 19, 2005 12:00 AM

Rev Up Performance with Custom Connectors

SQL Server 2005 Integration Services extensibility lets you solve uncommon problems
SQL Server Pro
InstantDoc ID #45801
Downloads
45801.zip

Execute. The third and final step is to provide the output we want by implementing the PrimeOutput method that Listing 4 shows. SSIS has created a buffer for each of the outputs we said we'd provide, and it passes those buffers to the PrimeOutput method as an array. Because this example produces only one output, our array has only one buffer element. The PrimeOutput method fills that buffer with data and closes it.

Because we opened the file handle in the previous step, the component has only to read each line of the file and parse the needed information. An IIS log file has a header section at the top, so we'll skip any lines that start with the # character that identifies that header. The data fields of the log-entry lines are separated by spaces, which makes for simple parsing. If the data doesn't contain any spaces, the parsing won't work, but the test files we used for this example didn't have any problems.

Once the component has parsed the data and is ready to add it to the buffer, the component calls the AddRow method and sets each of the values. Listing 4 includes a precautionary step to truncate the field data at the size we defined for the output column. This step prevents an overflow that would result in an error when the package runs. Because IIS administrators can customize IIS log files to include extra information (and URIs can get pretty long), this is a good safety feature even if you're reasonably confident about your file formats.

Finally, when Listing 4's code reaches the end of the log file and has written all the data to the buffer, the code calls the SetEndOfRowset method on the buffer to tell SSIS that it's finished. The code can close the component and clean up the buffer.

Other considerations. Although this is a simple example, making the implementation production-ready requires a little bit more work. Microsoft says that you can't assume at the PrimeOutput step that your columns are in the same order that you put them in the ProvideComponentProperties step. The buffer manager reserves the right to add placeholders for extra columns, which might get merged later into the data flow; your component wouldn't necessarily know about these extra placeholders. The buffer manager might also rearrange your component's columns to better fit on your memory pages. Microsoft suggests that instead of relying on the column order being what you defined, you should implement the PreExecute method and use the buffer manager's FindColumnByLineageID method to locate your columns and build an array that you can use in PrimeOutput to look up column indexes by name. We condone this practice, but in the interest of space and simplicity, we haven't shown it in the example.

Installing the Component
Installing your component so that the Business Intelligence Development Studio can use it is easy. First, you must copy the assembly you created to the folder where SSIS pipeline components reside. Then, you need to add the assembly to the Global Assembly Cache (GAC). Finally, you can add the component to the Development Studio toolbox, where a user can select it. Every time you change the component you have to copy the file, uninstall the previous version from the GAC, and install the new version of the component to the GAC, so we suggest you create a batch file to do this for you. Listing 5 shows a sample batch file that we use on our development system.

To add the component to the toolbox, open the Tools menu and select Choose Toolbox Items to open the dialog box that Figure 1 shows. Your component should appear on the Data Flow Items tab of the dialog, as Figure 2 shows. Select your component and click OK, and it appears in the toolbox in the Data Flow Sources section.

To upgrade your component, simply close any solution using the component, run the installation batch file again, and reopen the solution. The solution should automatically receive the new version of the component. The only time this process didn't work for us was when we changed the component's class name. This change caused the Business Intelligence Development Studio to recognize the component as a different component (albeit with the same name). We had to delete the old component from the toolbox and add the new one. We also had to delete the component from any projects and add it again.

Building a Test Package
To test the sample component, create a new Data Transformation Project in the Business Intelligence Development Studio. You'll start with a blank Control Flow surface to which you need to add a Data Flow Task. Double-clicking this task will bring you to a blank Data Flow surface. Drag your new source component from the toolbox and drop it on the surface.

Next, you need a File Connection to tell your component where to find the log file you want to import. Right-click the Connections section at the bottom of the Data Flow surface, and select New File Connection. Be sure not to select New Flat File Connection, which is a different connection intended for the Flat File Source component. For the example file connection, the usage type should be set to Existing File; then you can add the IIS log file you want to import. IIS log files typically reside in the C:\WINDOWS\system32\Logfiles path on a system that has IIS installed. Obviously, your Web server would need to have some content and traffic to generate logs. Alternatively, you could easily mock up some sample files based on the file format descriptions above.

Now you need to add your new connection to your component. Right-click the example component and select Edit from the context menu. The editor will appear, and the first tab you see should be called Connection Managers. The connection your component requested in ProvideComponentProperties should be there with a drop-down box that lets you select your newly created File Connection.

Finally you need a destination. When testing source components, we love using the DataReader Destination. This no-fuss destination lets you quickly and easily test your component. Of course, you could also send the data to SQL Server or Excel or anywhere else by selecting a different destination, but other destinations typically need additional configuration. To use the DataReader Destination, simply drag the output from your source and drop it on the destination. Then, edit the destination and select the columns from your output so SSIS doesn't complain about unused columns. We always put a Data Viewer on the connection, as Figure 3 shows, so that we can see that the source is working; simply right-click the connection, select Data Viewers, and add one.

Your package should now be ready to run. Click the green arrow to compile and execute in the debugger, and your Data Viewer should pop up, displaying data from your IIS log file in its grid. For more information about developing custom source components, see the "Creating a Source Component" topic in the SQL Server 2005 Books Online.

Build Your Own
Perhaps you have a unique data extraction or transformation task that would benefit from a custom data connector. As you dig into SQL Server 2005 and try your hand at building new SSIS packages to perform simple or complex data movement, look for a scenario that leaves you scratching your head trying to find the right SSIS source adapter or an area in which you'd like to get a performance gain by consolidating transformation logic into your extraction. Such a situation could be the perfect chance to build a custom source component.

Related Reading
KIRK HASELDEN
"What's New in DTS?" May 2004, InstantDoc ID 42141
"Making Package Magic," October 2004, InstantDoc ID 43805
"Editing a Package With SQL Server 2005 Integration Services Designer," March 2005, InstantDoc ID 45092





ARTICLE TOOLS

Comments
    There are no comments to display. Be the first one!
You must log on before posting a comment.

Are you a new visitor? Register Here