Prediction Using Mining Models
You can use the trained mining model to make predictions about new data setsfor example, to predict the probability of potential churn for each new customer. And a new SQL Server 2000 Data Transformation Services (DTS) task called Prediction can help you create a DTS package for prediction.
Using the OLE DB for Data Mining API, prediction statements are SELECT statements that join a data-mining model with a new input table. This special kind of join is a PREDICTION JOIN. The general syntax for prediction is
SELECT [FLATTENED] <SELECT-expressions>
FROM <mining model name> PREDICTION JOIN
<source data query> ON <join condition>
[WHERE <WHERE-expression>]
This statement is very similar to a relational join with two tables, except that one of the input "tables" is an already trained mining model.
Using the Clustering Algorithm with a Nested Table
We've seen how the MDT algorithm works; now let's look at how to use Microsoft Clustering in data mining. To solve Table 2's problem of grouping customers according to the banking products they subscribe to, we used a nested table to build a clustering model. You need a nested table to model this business problem because the products that a customer has purchased are nested under that customer (i.e., one customer case includes a set of products). After starting the Mining Model Wizard, we selected the Customer and Purchases tables in the relational database, then we selected the Microsoft Clustering algorithm. We chose CustomerID as the case key column from the Customer table, then selected all the demographic information as input columns from Customer and all the purchase information as input from the Purchases table. After creating the model, we edited it in the Relational Mining Model Editor. For example, the wizard assumed that Purchases is a lookup table, so it initially made the Customer table the "many" side of the relationship. However, we wanted to use Purchases as a nested table, so we clicked the link between these tables in the editor and reversed the relationship. Figure 8 shows the edited nested model. After we processed this data-mining model, using the INSERT INTO statement that the wizard generated, the trained model looked like the one that Figure 9 shows.
From browsing the model content in the cluster browser, we made the following observations with the help of the description in the node path section:
- Cluster 1 customers have the following characteristics: Most are around 50 years old, their average income is $79,000, they have no extra income source, and many of them have a high education level. They own many certificates of deposit, money-market or savings accounts, and so on.
- Cluster 2 includes customers who are, on average, 40 years old, have an average income of $56,000, and have an average additional income source of about $42,000. Customers in this cluster own a large number of certificates of deposit and savings accounts but a small number of credit cards.
- Cluster 3 includes customers whose average age is 65, who have an average income of $56,347, and who have an average additional income source of $42,645. Customers in this cluster own a large number of credit cards and money-market accounts.
Based on this information, the bank's marketing department can send appropriate mailings for different products to different clusters of customers. The response rate from these mailings will likely be much higher than if the bank used a randomly selected portion of the customer base.
The information in this article is by no means exhaustive; it's simply meant to provide an overview of data-mining concepts. By exploring the techniques that this article presents, you can begin to see how the data-mining features in SQL Server 2000 Analysis Services can help your organization find the patterns and rules to improve its marketing, sales, and customer-support operations and to gain a better understanding of its customers. Stay tuned for an upcoming article, in which we take a detailed look at the training performance of these algorithms.
End of Article
Prev. page
1
2
[3]
next page -->