• subscribe
September 29, 2008 12:00 AM

SSIS Logging and Data Auditing

Set up a logging environment using Business Intelligence Development Studio
SQL Server Pro
InstantDoc ID #100005
Downloads
100005.zip

Error Handling
As I mentioned, the SSIS log provider for the 1_ETL_Template.dtsx package is configured to log OnError events. Additionally, the package contains a package-level OnError event handler consisting of an Execute SQL task named SQL OnError, which updates the ExecutionLog table if an error occurs.

Viewing Package Results
From the Solution Explorer in BIDS, right-click the 1_ETL_Template.dtsx package and select Execute Package. After the package runs, you can use SQL Server Reporting Services (SSRS) to view the logged events.

In Solution Explorer, expand the LogAuditReports project and the Reports folder. Double-click the PackageExecutionDetails.rdl report to display Report Designer. Click the Preview tab, select a Package Name and Execution Run value, and then click View Report. The report will look similar to the Package—Execution Details report in Figure 3. This report shows you when the package ran, who initiated the package and from what computer, the total package runtime, and the time it took to complete each task. If an error occurs, information about the error will also be listed in the message column.

Data Flow Performance and Data Auditing
The second package in the sample download, 2_ETL_Template.dtsx, incorporates performance tracking and data auditing (sometimes referred to as data lineage) into a Data Flow task. Open the package in the package designer and click the Data Flow tab. In this example, shown in Figure 4, I select a set of records from a source system (in this case, the AdventureWorks sample database) and then eventually output the records to a Row Count component. Let’s examine the two components in the middle, STAT Source and DER Audit.

STAT Source is a script component that measures the number of rows flowing through a Data Flow task, along with the minimum, maximum, and mean throughput per second. The results are stored in the StatisticLog table. You can include multiple STAT Source components in a Data Flow (be sure to give each component a unique name) to measure throughput at multiple points in the flow. In the LogAuditReports project, I’ve included another report, PackageProgressByExecutionTime.rdl, that provides a summary of package execution results by date and includes data from the StatisticLog table for packages that incorporate at least one STAT Source component.

DER Audit is a derived column component that adds two important pieces of information into the Data Flow—LogID and LastModifiedDate. This information is meant to be written to your destination source(s) so you have a way of mapping back to the package responsible for writing the data. If you use another means of inserting or updating data in a destination system, such as Execute SQL tasks, make sure to add these derived columns.

Next Steps
I’ve included two more packages for you to examine on your own. 3_ETL_MP_Template.dtsx is an example of a parent package that calls a child package. 4_ETL_Package-WithError.dtsx includes an intentional error in the data flow so you can see what an error looks like in the sample reports.

How do you use these packages in your environment? You can make a copy of the LogAuditTemplates project (updating/adding Data Sources as needed) and go from there. Or, you can add one or more of the individual packages to your existing SSIS projects and customize as needed. Note that each package contains a property named ID, which is a GUID that uniquely identifies the package. When you copy a package, this ID doesn’t change. To cleanly separate data in the ETL_LogAuditDB database, you should not only rename the package but also generate a new GUID. To generate a new GUID in BIDS, select the drop-down arrow in the ID column of the Properties pane of the Package Designer and select Generate New ID.

Finally, there are a few components in the ETL_LogAuditDB database I didn’t cover. You can use the ExecutionLog table to track performance of SQL statements run in a destination system. You can use the ProcessLog table to manage partitioned tables. The Project REAL ETL white paper discusses these components in greater detail.

To recap, I’ve reviewed SSIS’s built-in logging capabilities and then shown you how to augment them. I also incorporated performance tracking and data lineage into the package templates for auditing/troubleshooting purposes. These templates, along with the ETL_LogAuditDB database and sample SSRS reports, can serve as a starting point for ETL solutions you deploy in the future.



ARTICLE TOOLS

Comments
    There are no comments to display. Be the first one!
You must log on before posting a comment.

Are you a new visitor? Register Here