Many analytical tools integrate with Microsoft Excel, and for a simple reason: People
who crunch numbers for a living spend a lot of time with spreadsheets. SQL Server 2005's data-mining algorithms offer powerful capabilities for analyzing data, but
Microsoft didn't provide any out-of-the box tools for delivering data mining to the
desktop until recently. In February 2007, Microsoft delivered the Data Mining Add-ins
for Office 2007, which let you take advantage of SQL Server 2005 predictive analytics
in Excel 2007 and Microsoft Office Visio 2007. Let's walk through the installation and
configuration of the Data Mining Add-ins and look at a comprehensive sample. We'll
focus on one add-in tool in particular: the Data Mining Client for Excel.
Installation and Initial Configuration
Before installing the add-ins, make sure Excel 2007 and the Microsoft .NET Framework 2.0 are installed on your machine. Additionally, run Office Setup from the Control Panel Add/Remove Programs applet to ensure that the .NET Programmability
Support component is installed under Excel. (This component is included with the
default installation of Excel.) Lastly, you'll need access to an instance of SQL Server
2005 Analysis Services, though this instance doesn't have to reside on your machine. I
recommend that you also install the AdventureWorks Analysis Services database sample.
(The Microsoft article "Running Setup to Install AdventureWorks Sample Databases
and Samples," http://msdn2.microsoft.com/en-us/library/ms143804.aspx, explains
how to install the sample.)
Next, download and install the Data
Mining Add-ins. (For download details,
go to http://go.microsoft.com/fwlink/?LinkId=82754.) In the Feature Selection
window, which you'll see after you begin
the installation process on your local
machine, make sure the following features,
at a minimum, are set to be installed on
the local hard drive:
- Data Mining Client for Excel, which
lets you run through the full datamining process within Excel 2007
- Server Configuration Utility, which
lets you set Analysis Services server
properties specific to data mining
and create a database for storing
models.
After the installation is complete,
click Start, All Programs, SQL Server
2005 DM Add-Ins, Server Configuration
Utility.The configuration utility sets the
Analysis Services Data Mining\AllowSessionMiningModels server property
to True, then creates the DMAddinsDB
database to use for creating temporary
and permanent data models. You can also
view the documentation that comes with
the Data Mining Add-ins if you prefer to
configure your server and create a database manually.
A Data-Mining Primer
At a high level, data mining is the process
of finding information (e.g., patterns,
trends) in large volumes of data in an
automated or semi-automated manner.
The classic example of the value of data
mining is its use in cross selling. For
example, when I buy a book online,
the vendor might query its database for
customers who bought the same book
and display other titles those customers
purchased.
SQL Server 2005 ships with nine
data-mining algorithms, each of which is
suited to a different type of problem, such
as forecasting sales, identifying fraudulent
transactions and claims, or segmenting
customers into different profiles. Each
situation involves the use of historical data
to build a model to predict a future state
or to better understand the current state.
Data-mining literature (including SQL
Server Books Online—BOL) typically
groups algorithms into different types (or
tasks). An example of a data-mining type
is Classification, which categorizes historical data around a predictable attribute.
The Microsoft Decision Trees algorithm
is a classification algorithm.
So, data mining helps solve problems,
but how exactly does one mine data?
Using BOL as a reference, let's look at the
data-mining process in terms of six basic
steps:
- Define the business problem.
- Prepare the historical data.
- Explore/validate the historical data.
- Build the data-mining model.
- Explore/validate the model.
- Deploy and update model.
The example I use is similar to Lesson 2:
Building a Targeted Mailing Scenario in the
data-mining tutorial that ships with SQL
Server 2005. (For more information about
the tutorial, see SQL Server 2005 BOL
"Data Mining Tutorial," http://msdn2.microsoft.com/en-us/library/ms167167.aspx.) Assuming you've installed the
AdventureWorks Analysis Services database
sample, you can review the BOL datamining tutorial and the accompanying
data-mining models for additional insight.
A Data-Mining Scenario
A typical data-mining scenario you might
encounter is that of classifying customers so
you can build a targeted mailing list. Let's
walk through this scenario using the BOL
six-step data-mining process as a guide.
Define the business problem. Let's say
you're working in the marketing department of a famous bicycle company,
AdventureWorks. The marketing director
has decided to run a promotion for a new
line of bicycles. You've acquired a long list
of potential customers that includes key
demographic data such as age, gender,
marital status, and occupation. However,
the marketing budget is tight. You need to
minimize the cost of a direct mail campaign by identifying the set of potential
customers most likely to purchase a bike,
as indicated by customer purchase history.
To identify these customers, you'll use the
Data Mining Client for Excel.
Prepare the historical data. In this
example, the task of preparing historical
data has already been done for you. Click
Start, All Programs, Microsoft SQL Server
2005 DM Add-ins, Sample Excel Data,
to open the sample workbook named
DMAddins_SampleData.xlsx. (You might
want to make a backup copy of this workbook, which is located by default in the
C:\Program Files\Microsoft SQL Server
2005 DM Add-Ins folder.) Navigate to
the Source Data worksheet to view the
demographic data about the existing
customers, including a field indicating
whether they've purchased a bike.
Explore and validate the historical data.
To explore the historical data, you'll need to
understand the various data attributes, group
data into buckets to reduce complexity, look
for outliers (i.e., problematic data values that
are far outside the expected range and that
might adversely affect or skew a model), and
potentially change the data. Click the Data
Mining tab at the top of the Excel Ribbon.
Click Explore Data to open the Explore
Data Wizard.
Prev. page  
[1]
2
3
next page