The Migration Wizard Output
The reason DTS-to-SSIS migrations pose
challenges goes beyond the fact that SSIS is a
new product. Fundamentally, DTS and SSIS
have different architectures. By settling for just
a migrated package without then extending
or rewriting parts of the package to take advantage of SSIS functionality, you’re settling
for less. To be sure, you’d see immediate manageability advantages, but when it comes to
performance and scalability, DTS architecture
limitations would still be evident.
Most DTS packages use an Extraction,
Loading, and Transformation (ELT) model of
data processing—a flip from true Extraction,
Transformation, and Loading (ETL). In other
words, data processing logic in DTS often
relies on SQL Server to perform the business
transformation logic, and to use SQL Server,
the data first needs to be loaded into tables. This
type of process can be relational database management system (RDBMS)-intensive. TSQL
can provide valuable support in an SSIS-based
ETL process, particularly for set-based operations, but when it comes to bulk operations
and data transformations, DTS tends relies too
heavily on SQL Server logic and is therefore
often bound by disk I/O bottlenecks and
synchronous processing. For example, consider
this typical DTS package: Data is extracted
from a flat file source where it’s landed to a
staging table; a SQL Server update adds key
relationships and repairs missing values; finally,
the staged data is joined to a production table
and loaded to the destination.
In this typical package, which Figure
3 illustrates, the extraction needs to be
complete before performing the update,
which then needs to be complete before
the loading processes. You might have similar
packages or even more complicated packages with many staging and SQL Server precedence requirements. Perhaps
you even have hundreds of packages that perform operations that
are similar to this example.
If you haven’t yet tested the DTS
Migration Wizard, you might be
wondering how much faster your
DTS packages will perform after
you migrate them to SSIS. The
answer is that SSIS packages created
by the DTS Migration Wizard most
likely won’t perform any faster than
the original DTS package. If you’ve
already performed a few tests, you
might still be wondering why your
migrated DTS packages don’t perform much better in SSIS.
The answer is in the package
architecture. When you run the DTS Migration Wizard, the output essentially generates
a Control Flow–centric (or workflow-centric) SSIS package that contains Data Flows
with minimal to no transformations. If your
DTS packages were workflow packages
that coordinated the execution of SQL
Server tasks or copy-column data pumps,
your migrated packages in SSIS will have
the same architecture, relying on the underlying staging tables and RDBMS engine to
perform the core of your logic. In both cases, DTS and SSIS are simply providing the coordination of the data copies and SQL
Server–based logic. Similarly, if your DTS packages leveraged any transformation in the data pump, those data pumps would migrate as an Execute DTS 2000 Package task and therefore still run under the DTS runtime engine.
The DTS package that Figure
4 shows, when run through the
DTS Migration Wizard, generates
several Data Flows and a couple
of Execute SQL Tasks, matching
directly to the DTS Data Pumps
and DTS Execute SQL Tasks,
respectively. In this simple example,
the DTS package runs in almost
exactly the same time frame that
the migrated SSIS package runs.
This behavior is expected: Both packages execute the same steps in the same
order and rely on the underlying relational
engine to perform the transformation logic in
an ELT-type process.
Although performance is an important
consideration, it’s only one of the many criteria to look for in an ETL tool. From the
start, it will be easy to apply the new SSIS
feature set to your migrated DTS packages.
Many of these new features don’t necessarily
relate to performance but still provide great
improvements in the areas of development
administration and package control—for
example, package configurations to share
properties and connections, checkpoints
to enable restartability, source control and
debugging integration with Microsoft Visual
Studio, environment-deployment tools,
offline capabilities to ease development,
conditional and logical-OR precedence
constraints, built-in looping and Windows
Management Instrumentation (WMI) integration, and better backend Analysis Services
and SQL Server 2005 support.
Rethinking Your Data Processing in SSIS
When it comes to performance and data
processing, SSIS can do much more than DTS while providing better scalability.
However, when just relying on the output
of the DTS Migration Wizard, SSIS’s core
data-processing features don’t come into
play (because of the nature of a DTS
migration). This includes the SSIS Data
Flow’s pipeline engine, which provides data
transformations, cleansing, and performance
beyond the capabilities of DTS’s staging and
SQL Server–centric model. A few of the
SSIS data flow advantages are:
- In-memory data association of heterogeneous sources using merge, union,
and lookup capabilities
- Text and data mining to provide
insight into data relationships and text-field contents
- Grouping and sorting for data aggregations and ordering
- Data routing and duplicating, which
allow multiple destinations and filtered
data inserts
- Data cleansing to match data based on
similarity
- Additional adapters that integrate XML, binary, SQL Server Mobile,
OLE DB, ADO.NET, and other
sources and destinations
When you apply these advanced Data
Flow features to your migrated DTS packages, you’ll unlock the muscle of SSIS. For
example, suppose we redesign the aforementioned SSIS package to take advantage
of the Data Flow in SSIS. Figure 5 shows a
single Data Flow to handle the same logic
present in the original DTS package and
initial package following migration to SSIS.
This redesigned package uses the
memory-based Lookup, Merge Join, Aggregate, Sort, and Derived Column transformations. This redesign reduces the synchronous
steps involved in the prior architecture, and
because of the memory-based architecture
of the SSIS pipeline, also reduces the disk
I/O. Overall, this redesigned sample package
runs about 40 percent faster than the
migrated package. Also compelling is that
when you track the Physical Disk counters,
the redesigned package uses 50 percent
less disk I/O, and in fact also reduces the
average processor utilization from
35 percent to 25 percent, allowing
more headroom for other processes and packages to run on the
same server.
So, are data-staging tasks no
longer required? In many cases,
staging data isn’t necessary, but
a few situations merit their use.
For example, staging can help fulfill data-validation requirements,
provide a point-in-time capture
of a volatile source, or help align
sources in which the extraction
times don’t overlap. SQL Server
can also provide valuable functionality with set-based updates or
deletes, recursive operations, and
ETL auditing and administrative
tasks. When deciding on staging
data in your migrated packages,
ask, “Which staging or SQL
Server steps were used because
of the DTS based architecture?”
and “For these steps, how can I
leverage the SSIS Data Flow?” It’s
true, however, that the net effect
will be a reduction in staging and sequential SQL Server tasks and lead to an
overall increase in server scalability and a
reduction in processing times.
Prev. page
1
[2]
3
next page