Building and Viewing Profile Output
To build the profiler’s output, simply execute the Data
Profiling task. If you designated a file destination, you
can view the profiler’s output by using a tool called
Data Profile Viewer (DPV) that ships with SQL Server
2008 Community Technology Preview 5 (CTP5). For
now, DPV is a standalone executable that analyzes the
output of the Data Profiling task. BIDS will probably
include support for launching the tool in the final
release. DPV doesn’t automatically appear on the Windows
Start menu as of CTP5, so I recommend creating
a shortcut to the executable. The file is located at C: Program Files\Microsoft SQL Server\100\DTS\Binn
and is called DataProfileViewer.exe.
After you launch DPV, you can open a profiler
output .xml file by clicking Open, the only command on
the menu bar, and navigating to the desired file. This file’s
content will have been populated from a prior execution
of a Data Profiling task. Figure 3 depicts DPV with
output from a Column Value Distribution profile.
As you can see in Figure 3, the DPV utility has two
main sections. You use the left pane to choose which
profile to review. The executed profiles are arranged by
the type of database objects they were applied against.
The right side of the DPV window shows you the details of the selected profile. The output of each of the various
profiles is unique to that profile, so you’ll see different
output in the details pane for each profile you select.
The DataProfile.xsd Schema
The Data Profiling task output adheres to the Data-
Profile.xsd schema, which you can see at schemas
.microsoft.com/sqlserver/2008/DataDebugger/Data
Profile.xsd. The schema has three core sections: data
sources, profile input, and profile output. The data
sources section contains all the data sources used by the
Data Profiling task. The profile input section is where
each selected profile is stored, along with the tables,
views, and columns it’s applied to. The profile output
section contains the results of each profile selected.
A Data Profiling Example
You now have a solid foundation in SQL Server
2008 data profiling. Let’s run through a sample of
the feature in a test environment. Suppose that the
AdventureWorks Company has charged you with
building a new formal data warehouse system to support
organizational decision making. As one of your
first steps, you’ll need to assess the data quality of the
AdventureWorks OLTP database. In this example,
you’ll assess the data quality of the Production.Product
table. Note that the AdventureWorks sample databases
are no longer included as part of the SQL Server setup;
the SQL Server sample databases have been relocated
to the Microsoft CodePlex portal (www.codeplex.com/
MSFTDBProdSamples).
First, create a new SSIS package and drag an
instance of the Data Profiling task onto the Control
Flow designer. Next, create the ADO.NET connection
manager to a local AdventureWorks OLTP database.
Now configure the Data Profiling task. Because you’re
performing an ad-hoc analysis, send the output to a local XML file called AW_ProductTableAnalysis.xml
on the desktop.
Now you need to select individual profiles to be run
against the Production.Product table. Use the quick
profile option (you can assume the default settings for
these samples). I don’t explicitly walk you though the
Quick Profile dialog box, which Figure 4 shows, but it’s
quite simple.
The Quick Profile dialog box configures the profiles
to use all columns for the selected table or view. To
review, click OK in the Quick Profile dialog box, then
select Profile Requests in the left pane of the tasks
editor. You can browse through the various profiles
and review the default settings the Quick Profile process
assigns. Click OK to complete the Data Profiling
task configuration. You can’t select a Value Inclusion
profile with the Quick Profile option; you need to use
the Profile Requests section of the editor.
Now execute the package by pressing F5. The
Data Profiling task turns green and a file containing
the output from the profile operation is created on the
desktop. Finally, use DPV to review the output of the
Data Profiling task. First, launch the DPV, then click
Open and select the new AW_ProductTableAnalysis
.xml file. Figure 5 shows the task’s output in the
viewer without any individual profile selected. Notice
that because you only ran column-level profiles, you
must navigate to an individual column before you see any of the profiles you ran in the results pane.
You can now see the length, values, NULL ratio, and
statistics for the various columns in the Production.
Product table!
A Long-Overdue Feature
I don’t know if Microsoft and the Kimball Group (www.kimballgroup.com) have some form of partnership or not; however, the SQL Server BI stack continues to
incorporate more and more concepts from Kimball’s
formalized data warehouse lifecycle, and data profiling
is a good example. Data profiling is a formal part of
Kimball’s data warehousing methodology, and beyond
Kimball, it’s a fairly common task for a data warehouse
developer.
Data profiling was a much overdue feature in the
SQL Server BI stack. For a first build, the feature is
fairly impressive. Probably the biggest complaint will be that it can currently run only on SQL Server 2000
or later, but I expect this situation to change over time.
In addition, I expect to see the ability to build your
own custom profiles and plug them into the larger
data-profiling framework. SQL Server 2008 has several
key BI enhancements, and the SSIS data-profiling
capability is definitely one such feature.
End of Article
Prev. page
1
[2]
next page -->