Many analytical tools integrate with Microsoft Excel, and for a simple reason: People who crunch numbers for a living spend a lot of time with spreadsheets. SQL Server 2005's data-mining algorithms offer powerful capabilities for analyzing data, but Microsoft didn't provide any out-of-the box tools for delivering data mining to the desktop until recently. In February 2007, Microsoft delivered the Data Mining Add-ins for Office 2007, which let you take advantage of SQL Server 2005 predictive analytics in Excel 2007 and Microsoft Office Visio 2007. Let's walk through the installation and configuration of the Data Mining Add-ins and look at a comprehensive sample. We'll focus on one add-in tool in particular: the Data Mining Client for Excel.

Installation and Initial Configuration
Before installing the add-ins, make sure Excel 2007 and the Microsoft .NET Framework 2.0 are installed on your machine. Additionally, run Office Setup from the Control Panel Add/Remove Programs applet to ensure that the .NET Programmability Support component is installed under Excel. (This component is included with the default installation of Excel.) Lastly, you'll need access to an instance of SQL Server 2005 Analysis Services, though this instance doesn't have to reside on your machine. I recommend that you also install the AdventureWorks Analysis Services database sample. (The Microsoft article "Running Setup to Install AdventureWorks Sample Databases and Samples," http://msdn2.microsoft.com/en-us/library/ms143804.aspx, explains how to install the sample.)

Next, download and install the Data Mining Add-ins. (For download details, go to http://go.microsoft.com/fwlink/?LinkId=82754.) In the Feature Selection window, which you'll see after you begin the installation process on your local machine, make sure the following features, at a minimum, are set to be installed on the local hard drive:

  • Data Mining Client for Excel, which lets you run through the full datamining process within Excel 2007
  • Server Configuration Utility, which lets you set Analysis Services server properties specific to data mining and create a database for storing models.

After the installation is complete, click Start, All Programs, SQL Server 2005 DM Add-Ins, Server Configuration Utility.The configuration utility sets the Analysis Services Data Mining\AllowSessionMiningModels server property to True, then creates the DMAddinsDB database to use for creating temporary and permanent data models. You can also view the documentation that comes with the Data Mining Add-ins if you prefer to configure your server and create a database manually.

A Data-Mining Primer
At a high level, data mining is the process of finding information (e.g., patterns, trends) in large volumes of data in an automated or semi-automated manner. The classic example of the value of data mining is its use in cross selling. For example, when I buy a book online, the vendor might query its database for customers who bought the same book and display other titles those customers purchased.

SQL Server 2005 ships with nine data-mining algorithms, each of which is suited to a different type of problem, such as forecasting sales, identifying fraudulent transactions and claims, or segmenting customers into different profiles. Each situation involves the use of historical data to build a model to predict a future state or to better understand the current state. Data-mining literature (including SQL Server Books Online—BOL) typically groups algorithms into different types (or tasks). An example of a data-mining type is Classification, which categorizes historical data around a predictable attribute. The Microsoft Decision Trees algorithm is a classification algorithm.

So, data mining helps solve problems, but how exactly does one mine data? Using BOL as a reference, let's look at the data-mining process in terms of six basic steps:

  1. Define the business problem.
  2. Prepare the historical data.
  3. Explore/validate the historical data.
  4. Build the data-mining model.
  5. Explore/validate the model.
  6. Deploy and update model.

The example I use is similar to Lesson 2: Building a Targeted Mailing Scenario in the data-mining tutorial that ships with SQL Server 2005. (For more information about the tutorial, see SQL Server 2005 BOL "Data Mining Tutorial," http://msdn2.microsoft.com/en-us/library/ms167167.aspx.) Assuming you've installed the AdventureWorks Analysis Services database sample, you can review the BOL datamining tutorial and the accompanying data-mining models for additional insight.

A Data-Mining Scenario
A typical data-mining scenario you might encounter is that of classifying customers so you can build a targeted mailing list. Let's walk through this scenario using the BOL six-step data-mining process as a guide.

Define the business problem. Let's say you're working in the marketing department of a famous bicycle company, AdventureWorks. The marketing director has decided to run a promotion for a new line of bicycles. You've acquired a long list of potential customers that includes key demographic data such as age, gender, marital status, and occupation. However, the marketing budget is tight. You need to minimize the cost of a direct mail campaign by identifying the set of potential customers most likely to purchase a bike, as indicated by customer purchase history. To identify these customers, you'll use the Data Mining Client for Excel.

Prepare the historical data. In this example, the task of preparing historical data has already been done for you. Click Start, All Programs, Microsoft SQL Server 2005 DM Add-ins, Sample Excel Data, to open the sample workbook named DMAddins_SampleData.xlsx. (You might want to make a backup copy of this workbook, which is located by default in the C:\Program Files\Microsoft SQL Server 2005 DM Add-Ins folder.) Navigate to the Source Data worksheet to view the demographic data about the existing customers, including a field indicating whether they've purchased a bike.

Explore and validate the historical data. To explore the historical data, you'll need to understand the various data attributes, group data into buckets to reduce complexity, look for outliers (i.e., problematic data values that are far outside the expected range and that might adversely affect or skew a model), and potentially change the data. Click the Data Mining tab at the top of the Excel Ribbon. Click Explore Data to open the Explore Data Wizard.

   Prev. page   [1] 2 3     next page



You must log on before posting a comment.

If you don't have a username & password, please register now.

 
 

ADS BY GOOGLE