• subscribe
September 29, 2008 12:00 AM

SSIS Logging and Data Auditing

Set up a logging environment using Business Intelligence Development Studio
SQL Server Pro
InstantDoc ID #100005
Downloads
100005.zip

Executive Summary: Learn how to implement Microsoft SQL Server Integration Services (SSIS) package logging and data auditing in a production environment. Use Business Intelligence Development Studio (BIDS) to set up a logging environment for managing, monitoring, and troubleshooting packages. This article reviews SSIS’s built-in logging capabilities and shows you how to customize them. You’ll see how to incorporate basic data auditing in the package data flows. Examples include package templates, along with a database and reports (for storing and viewing package execution results), which you can use as a starting point in your deployments.

SQL Server Integration Services (SSIS) provides organizations with powerful data integration capabilities. To effectively monitor SSIS performance, troubleshoot errors, and understand data lineage in a production environment, you need to implement package logging and data auditing. I’ll show you how to use Business Intelligence Development Studio (BIDS) to set up a logging environment for managing, monitoring, and troubleshooting packages. I review SSIS’s built-in logging capabilities and then show you how to customize them. You’ll see how to incorporate basic data auditing in the package data flows. The examples I use include package templates along with a logging database and reports (for storing and viewing package execution results), which you can use as a starting point in your deployments.

The logging database and many of the SSIS tasks I use come directly from Project REAL. Project REAL is a reference architecture and implementation of an enterprise-class business intelligence (BI) solution based on SQL Server 2005. Microsoft and several partners used actual customer data to create this reference implementation. To learn more about Project REAL, see the Learning Path.

To download the sample code used in this article, click the 100005.zip hotlink. The .zip file contains four packages, two of which I discuss in this article, and two for your future use. I use the 1_ETL_Template.dtsx package to show you how to add a log provider to a package and then augment the log provider. I use the 2_ETL_Template.dtsx package to give you some Data Flow task tips.

Configuring Package Logging
Package logging refers to writing information about the execution results of a package, and its tasks. During package execution, SSIS can log many types of events (e.g., status, completion, error information about a task) to a variety of destinations, known as log providers. A log provider can be a text file, the SQL Server Profiler, a SQL Server relational database, a Windows event log, or an XML file. If necessary, you can define a custom log provider (e.g., a proprietary file format).

LEARNING PATH
SQL SERVER MAGAZINE RESOURCES
For information about how to develop with BIDS:
“Building Better BI in SQL Server 2005,” InstantDoc ID 44816
“SQL Server 2008 Business Intelligence Enhancements,” InstantDoc ID 98467

For more information about using SSIS:
“Scripting Control Flow in SQL Server Integration Services,” InstantDoc ID 94584
“Debugging and Logging in SSIS,” InstantDoc ID 98167
“SQL Server Integration Services,” InstantDoc ID 95385

MICROSOFT RESOURCES
New Business Intelligence Development Studio, msdn.microsoft.com/en-us/library/ms170338.aspx
Project REAL—Business Intelligence in Practice, www.microsoft.com/sql/solutions/bi/projectreal.mspx
Project REAL: Business Intelligence ETL Design Practices, www.microsoft.com/technet/prodtechnol/sql/2005/realetldp.mspx

You configure logging at the package level by adding a log provider. You can choose to log only certain tasks within a package. When designing a package in BIDS, click Logging on the SSIS menu to display the Configure SSIS Logs dialog box. Use the Providers and Logs tab to select a provider to use for the container, as shown in Figure 1. When using the SSIS log provider for SQL Server, the Configuration column is where you specify a connection to a SQL Server database. All the log information is written to a table called sysdtslog90, which SSIS automatically creates in the database.

When you’ve added a log provider to your package, click the Details tab and select the events you want to log. Some of the event choices include OnError (which writes a log entry when an error occurs), OnPost Execute (which writes a log entry after a task or package finishes running), and OnVariableValueChanged (which logs an entry when a variable value changes). To prevent the log from becoming unnecessarily large and to avoid degrading performance, log only the events you need.

Opening the Logging Examples
Now let’s put the sample code for this article to use. Download it, if you haven’t already; then, open the LogAuditTemplates.sln file in BIDS from File, Open, Project Solution. This solution has two projects: LogAuditTemplates and LogAuditReports. In Solution Explorer, expand the LogAuditTemplates project and the Data Sources and SSIS Packages folders. Double-click the 1_ETL_Template.dtsx package, which you can see in Figure 2, to open it in the package designer. 1_ETL_Template.dtsx is a basic package that you can use as a template for building other packages.

SSIS Log Provider or SQL Server
The 1_ETL_Template.dtsx package has been configured to use the SSIS log provider for SQL Server and to store the log results (the sysdtslog90 table) in a database called ETL_LogAuditDB. Before you run the package, restore the database to an instance of SQL Server; you’ll find a zipped backup of the database in the file ETL_LogAuditDB.zip, which is included with the sample code. If you decide to restore the database to something other than a default instance on your local machine, you’ll need to update the connection information for the package’s ETL_LogAuditDB.ds and ETL_LogAuditDB OLEDb.ds data sources.

The log provider for the 1_ETL_Template.dtsx package is configured to log the OnError and OnPostExecute events. Limiting logging to these two events keeps the logged information to a minimum, yet allows you to calculate performance data for each task in a package and capture any errors that might occur.

Augmenting the Built-In Provider
As Figure 2 shows, the 1_ETL_Template.dtsx package contains two Control Flow tasks: SQL Audit OnPreExecute and SQL Audit OnPostExecute. These two tasks augment the built-in SSIS log provider by populating a row in a table named ExecutionLog. The ExecutionLog table serves as a header for all the detail rows the SSIS log provider writes to the sysdtslog90 table. The sysdtslog90 and the ExecutionLog tables are linked together by a field named executionid, which is a GUID generated by SSIS each time a package is run.

The ExecutionLog table provides a concise summary of package execution results (e.g., start time, end time, overall status) in a single row, along with other useful information. For example, the table stores package version information, which can be useful if troubleshooting problems resulting from a new version of an existing package.

The SQL Audit OnPreExecute and SQL Audit OnPostExecute Control Flow tasks are both Execute SQL tasks that call stored procedures in the ETL_LogAuditDB database. Most of the data passed to the stored procedures (e.g., PackageName, PackageID) comes from SSIS system variables. The sample uses the following three variables, as you can see in the left-hand pane in Figure 2.

• The LogicalDate variable lets you define a “logical” processing date, which might differ from the actual package execution date. For example, in an extraction, transformation, and loading (ETL) process you might need to load data from a specific time period (e.g., all of the new sales entries from a given fiscal period) or you might need to perform a net data capture based on the last successful execution date of the package. In these scenarios, a date value needs to be stored and then read/updated by the package during execution.

• LogID is an integer that uniquely identifies a package execution and is generated via an identity column in the ExecutionLog table.

• ParentLogID is simply the LogID of a “parent” package. In other words, if the package is invoked from another package, the ParentLogID value will be equal to the LogID of the calling (i.e., the parent) package. To pass the LogID from the parent package, you use a package configuration. To view the package configuration, select Package Configurations from the SSIS menu.

Continue to page 2



ARTICLE TOOLS

Comments
    There are no comments to display. Be the first one!
You must log on before posting a comment.

Are you a new visitor? Register Here