Building and Viewing Profile Output
To build the profiler’s output, simply execute the Data Profiling task. If you designated a file destination, you can view the profiler’s output by using a tool called Data Profile Viewer (DPV) that ships with SQL Server 2008 Community Technology Preview 5 (CTP5). For now, DPV is a standalone executable that analyzes the output of the Data Profiling task. BIDS will probably include support for launching the tool in the final release. DPV doesn’t automatically appear on the Windows Start menu as of CTP5, so I recommend creating a shortcut to the executable. The file is located at C: Program Files\Microsoft SQL Server\100\DTS\Binn and is called DataProfileViewer.exe.

After you launch DPV, you can open a profiler output .xml file by clicking Open, the only command on the menu bar, and navigating to the desired file. This file’s content will have been populated from a prior execution of a Data Profiling task. Figure 3 depicts DPV with output from a Column Value Distribution profile.

As you can see in Figure 3, the DPV utility has two main sections. You use the left pane to choose which profile to review. The executed profiles are arranged by the type of database objects they were applied against. The right side of the DPV window shows you the details of the selected profile. The output of each of the various profiles is unique to that profile, so you’ll see different output in the details pane for each profile you select.

The DataProfile.xsd Schema
The Data Profiling task output adheres to the Data- Profile.xsd schema, which you can see at schemas .microsoft.com/sqlserver/2008/DataDebugger/Data Profile.xsd. The schema has three core sections: data sources, profile input, and profile output. The data sources section contains all the data sources used by the Data Profiling task. The profile input section is where each selected profile is stored, along with the tables, views, and columns it’s applied to. The profile output section contains the results of each profile selected.

A Data Profiling Example
You now have a solid foundation in SQL Server 2008 data profiling. Let’s run through a sample of the feature in a test environment. Suppose that the AdventureWorks Company has charged you with building a new formal data warehouse system to support organizational decision making. As one of your first steps, you’ll need to assess the data quality of the AdventureWorks OLTP database. In this example, you’ll assess the data quality of the Production.Product table. Note that the AdventureWorks sample databases are no longer included as part of the SQL Server setup; the SQL Server sample databases have been relocated to the Microsoft CodePlex portal (www.codeplex.com/ MSFTDBProdSamples).

First, create a new SSIS package and drag an instance of the Data Profiling task onto the Control Flow designer. Next, create the ADO.NET connection manager to a local AdventureWorks OLTP database. Now configure the Data Profiling task. Because you’re performing an ad-hoc analysis, send the output to a local XML file called AW_ProductTableAnalysis.xml on the desktop.

Now you need to select individual profiles to be run against the Production.Product table. Use the quick profile option (you can assume the default settings for these samples). I don’t explicitly walk you though the Quick Profile dialog box, which Figure 4 shows, but it’s quite simple.

The Quick Profile dialog box configures the profiles to use all columns for the selected table or view. To review, click OK in the Quick Profile dialog box, then select Profile Requests in the left pane of the tasks editor. You can browse through the various profiles and review the default settings the Quick Profile process assigns. Click OK to complete the Data Profiling task configuration. You can’t select a Value Inclusion profile with the Quick Profile option; you need to use the Profile Requests section of the editor.

Now execute the package by pressing F5. The Data Profiling task turns green and a file containing the output from the profile operation is created on the desktop. Finally, use DPV to review the output of the Data Profiling task. First, launch the DPV, then click Open and select the new AW_ProductTableAnalysis .xml file. Figure 5 shows the task’s output in the viewer without any individual profile selected. Notice that because you only ran column-level profiles, you must navigate to an individual column before you see any of the profiles you ran in the results pane. You can now see the length, values, NULL ratio, and statistics for the various columns in the Production. Product table!

A Long-Overdue Feature
I don’t know if Microsoft and the Kimball Group (www.kimballgroup.com) have some form of partnership or not; however, the SQL Server BI stack continues to incorporate more and more concepts from Kimball’s formalized data warehouse lifecycle, and data profiling is a good example. Data profiling is a formal part of Kimball’s data warehousing methodology, and beyond Kimball, it’s a fairly common task for a data warehouse developer.

Data profiling was a much overdue feature in the SQL Server BI stack. For a first build, the feature is fairly impressive. Probably the biggest complaint will be that it can currently run only on SQL Server 2000 or later, but I expect this situation to change over time. In addition, I expect to see the ability to build your own custom profiles and plug them into the larger data-profiling framework. SQL Server 2008 has several key BI enhancements, and the SSIS data-profiling capability is definitely one such feature.

End of Article

Prev. page     1 [2]     next page -->



You must log on before posting a comment.

If you don't have a username & password, please register now.

Reader Comments

Is there a way to profile all tables in a given database rather than setting up a connection for each table one by one ?

rbailey121

Article Rating 3 out of 5

Hi rbailey,

You can use a wildcard character '*' for the column definition to for a given profile to scan all applicable data typed columns in a given table. You cannot use such a wildcard character for the table|view selection. What you could do however is dynamically generate the a given profile XML input. The ProfileInputXML is a property of the data profiling task which is a writeable property using the expressions tab of the task's editor.

Best Wishes Derek

dcomingore

Article Rating 5 out of 5

 
 

ADS BY GOOGLE