SideBar    Preparing for the Wizard

The Migration Wizard Output
The reason DTS-to-SSIS migrations pose challenges goes beyond the fact that SSIS is a new product. Fundamentally, DTS and SSIS have different architectures. By settling for just a migrated package without then extending or rewriting parts of the package to take advantage of SSIS functionality, you’re settling for less. To be sure, you’d see immediate manageability advantages, but when it comes to performance and scalability, DTS architecture limitations would still be evident.

Most DTS packages use an Extraction, Loading, and Transformation (ELT) model of data processing—a flip from true Extraction, Transformation, and Loading (ETL). In other words, data processing logic in DTS often relies on SQL Server to perform the business transformation logic, and to use SQL Server, the data first needs to be loaded into tables. This type of process can be relational database management system (RDBMS)-intensive. TSQL can provide valuable support in an SSIS-based ETL process, particularly for set-based operations, but when it comes to bulk operations and data transformations, DTS tends relies too heavily on SQL Server logic and is therefore often bound by disk I/O bottlenecks and synchronous processing. For example, consider this typical DTS package: Data is extracted from a flat file source where it’s landed to a staging table; a SQL Server update adds key relationships and repairs missing values; finally, the staged data is joined to a production table and loaded to the destination.

In this typical package, which Figure 3 illustrates, the extraction needs to be complete before performing the update, which then needs to be complete before the loading processes. You might have similar packages or even more complicated packages with many staging and SQL Server precedence requirements. Perhaps you even have hundreds of packages that perform operations that are similar to this example.

If you haven’t yet tested the DTS Migration Wizard, you might be wondering how much faster your DTS packages will perform after you migrate them to SSIS. The answer is that SSIS packages created by the DTS Migration Wizard most likely won’t perform any faster than the original DTS package. If you’ve already performed a few tests, you might still be wondering why your migrated DTS packages don’t perform much better in SSIS.

The answer is in the package architecture. When you run the DTS Migration Wizard, the output essentially generates a Control Flow–centric (or workflow-centric) SSIS package that contains Data Flows with minimal to no transformations. If your DTS packages were workflow packages that coordinated the execution of SQL Server tasks or copy-column data pumps, your migrated packages in SSIS will have the same architecture, relying on the underlying staging tables and RDBMS engine to perform the core of your logic. In both cases, DTS and SSIS are simply providing the coordination of the data copies and SQL Server–based logic. Similarly, if your DTS packages leveraged any transformation in the data pump, those data pumps would migrate as an Execute DTS 2000 Package task and therefore still run under the DTS runtime engine.

The DTS package that Figure 4 shows, when run through the DTS Migration Wizard, generates several Data Flows and a couple of Execute SQL Tasks, matching directly to the DTS Data Pumps and DTS Execute SQL Tasks, respectively. In this simple example, the DTS package runs in almost exactly the same time frame that the migrated SSIS package runs. This behavior is expected: Both packages execute the same steps in the same order and rely on the underlying relational engine to perform the transformation logic in an ELT-type process.

Although performance is an important consideration, it’s only one of the many criteria to look for in an ETL tool. From the start, it will be easy to apply the new SSIS feature set to your migrated DTS packages. Many of these new features don’t necessarily relate to performance but still provide great improvements in the areas of development administration and package control—for example, package configurations to share properties and connections, checkpoints to enable restartability, source control and debugging integration with Microsoft Visual Studio, environment-deployment tools, offline capabilities to ease development, conditional and logical-OR precedence constraints, built-in looping and Windows Management Instrumentation (WMI) integration, and better backend Analysis Services and SQL Server 2005 support.

Rethinking Your Data Processing in SSIS
When it comes to performance and data processing, SSIS can do much more than DTS while providing better scalability. However, when just relying on the output of the DTS Migration Wizard, SSIS’s core data-processing features don’t come into play (because of the nature of a DTS migration). This includes the SSIS Data Flow’s pipeline engine, which provides data transformations, cleansing, and performance beyond the capabilities of DTS’s staging and SQL Server–centric model. A few of the SSIS data flow advantages are:

  • In-memory data association of heterogeneous sources using merge, union, and lookup capabilities
  • Text and data mining to provide insight into data relationships and text-field contents
  • Grouping and sorting for data aggregations and ordering
  • Data routing and duplicating, which allow multiple destinations and filtered data inserts
  • Data cleansing to match data based on similarity
  • Additional adapters that integrate XML, binary, SQL Server Mobile, OLE DB, ADO.NET, and other sources and destinations

When you apply these advanced Data Flow features to your migrated DTS packages, you’ll unlock the muscle of SSIS. For example, suppose we redesign the aforementioned SSIS package to take advantage of the Data Flow in SSIS. Figure 5 shows a single Data Flow to handle the same logic present in the original DTS package and initial package following migration to SSIS.

This redesigned package uses the memory-based Lookup, Merge Join, Aggregate, Sort, and Derived Column transformations. This redesign reduces the synchronous steps involved in the prior architecture, and because of the memory-based architecture of the SSIS pipeline, also reduces the disk I/O. Overall, this redesigned sample package runs about 40 percent faster than the migrated package. Also compelling is that when you track the Physical Disk counters, the redesigned package uses 50 percent less disk I/O, and in fact also reduces the average processor utilization from 35 percent to 25 percent, allowing more headroom for other processes and packages to run on the same server.

So, are data-staging tasks no longer required? In many cases, staging data isn’t necessary, but a few situations merit their use. For example, staging can help fulfill data-validation requirements, provide a point-in-time capture of a volatile source, or help align sources in which the extraction times don’t overlap. SQL Server can also provide valuable functionality with set-based updates or deletes, recursive operations, and ETL auditing and administrative tasks. When deciding on staging data in your migrated packages, ask, “Which staging or SQL Server steps were used because of the DTS based architecture?” and “For these steps, how can I leverage the SSIS Data Flow?” It’s true, however, that the net effect will be a reduction in staging and sequential SQL Server tasks and lead to an overall increase in server scalability and a reduction in processing times.

Prev. page     1 [2] 3     next page



You must log on before posting a comment.

If you don't have a username & password, please register now.

Reader Comments

This InstantDoc

For a helpful primer, see “Step Up to SQL Server 2005,” InstantDoc ID 47740, as well as this article’s sidebar, “Your First Step.”

Points to this article

[September 15, 2005] Make the MTA Go Away By: Paul Robichaux Exchange & Outlook UPDATE: Exchange Server Perspectives InstantDoc #47740 Web Exclusive from Windows IT Pro

Dysonm71

Article Rating 2 out of 5