Building Analysis Services Data-Mining Models
So, how can you use Analysis Services to build data-mining models that give you vital information about your business? Let's walk through some examples to see how you can train and browse data-mining models and make predictions based on the trained models. Table 2 lists four banking scenarios along with the algorithm best suited for each scenario.

To solve the business problems, we use two relational database tables: Customer and Purchases. The Customer table contains demographic information about bank customers, such as each customer's age, income, education level, house value, and loans. The Purchases table records which bank products, such as checking, money market, and savings accounts, each customer subscribes to. As Figure 3 shows, the two tables are linked by CustomerID. In relational terms, the CustomerID column in the Purchases table is a foreign key that references the Customer table.
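To make the relationship concrete, here is a minimal sketch that uses Python's built-in sqlite3 module rather than SQL Server. The column list is a hypothetical, trimmed-down stand-in for the real tables. Only CustomerID and the product names come from the description above. Note that SQLite enforces foreign keys only after PRAGMA foreign_keys = ON.

```python
import sqlite3

# Hypothetical, simplified versions of the Customer and Purchases tables.
# Columns other than CustomerID and Product are illustrative assumptions.
SCHEMA = """
CREATE TABLE Customer (
    CustomerID INTEGER PRIMARY KEY,
    Age        INTEGER,
    Income     REAL,
    Education  TEXT
);
CREATE TABLE Purchases (
    CustomerID INTEGER NOT NULL REFERENCES Customer(CustomerID),
    Product    TEXT NOT NULL  -- e.g. 'Checking', 'Money Market', 'Savings'
);
"""


def build_demo_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks FKs only when enabled
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO Customer VALUES (?, ?, ?, ?)",
        [(1, 34, 48000.0, "Bachelors"), (2, 51, 112000.0, "Masters")],
    )
    conn.executemany(
        "INSERT INTO Purchases VALUES (?, ?)",
        [(1, "Checking"), (1, "Savings"), (2, "Money Market")],
    )
    return conn


def products_per_customer(conn: sqlite3.Connection):
    # Join the two tables on the shared CustomerID key and count products.
    return conn.execute(
        """SELECT c.CustomerID, COUNT(p.Product)
           FROM Customer AS c
           JOIN Purchases AS p ON p.CustomerID = c.CustomerID
           GROUP BY c.CustomerID
           ORDER BY c.CustomerID"""
    ).fetchall()


if __name__ == "__main__":
    conn = build_demo_db()
    print(products_per_customer(conn))  # [(1, 2), (2, 1)]
```

Because of the foreign key, inserting a Purchases row whose CustomerID has no matching Customer row raises sqlite3.IntegrityError.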

Now let's look at the steps for creating, training, and browsing the data-mining model. We start with the first scenario: identifying the customers who are most likely to churn (leave), based on their demographic information. Then, we'll look at how to solve the fourth scenario by using the clustering algorithm.

Creating Data-Mining Models
When you create a data-mining model, you need to define the model's structure and properties. In the OLE DB for Data Mining API, you create a new data-mining model by using the CREATE MINING MODEL statement. In relational databases, the CREATE TABLE statement defines a table's structure and properties, including column names and data types, constraints, and storage options. Similarly, the CREATE MINING MODEL statement defines a model's keys, columns, and the specific algorithm and parameters to use during training.

In Analysis Manager, you create a data-mining model by using the Mining Model Wizard. After you select the data-mining algorithm, define the input table, and specify the input and predictable columns, the Mining Model Wizard automatically generates the CREATE MINING MODEL statement. Because the first problem in Table 2 is a prediction problem, we used the MDT algorithm to create a data-mining model to solve it. We chose CustomerID as the case key column; it's the case table's identity key column and uniquely identifies each case. As Figure 4 shows, we selected all the demographic information as input columns and the Churn_Yes_No column as the predictable attribute. In the wizard's final step, we named this model Model1_MDT_NonNested.

After you click Finish, the wizard generates a CREATE MINING MODEL statement based on OLE DB for Data Mining syntax, then sends it to the Microsoft Data Mining Provider. The generated CREATE MINING MODEL syntax is similar to standard SQL's CREATE TABLE statement. In Figure 5, for example, the keywords LONG, DOUBLE, and TEXT define the columns' data types and are similar to T-SQL's int, float, and varchar. However, the statement has a few extensions that aren't part of T-SQL. For example, the keyword KEY designates a Key content type column (or columns), which uniquely identifies each case (row) in the data-mining model. The keywords CONTINUOUS and DISCRETE are the two possible values for the Attribute content type, marking a column as continuous or discrete. The keyword PREDICT designates the model's predictable column: the target column whose patterns the user wants to discover.
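To see how these keywords fit together, the following Python sketch composes a CREATE MINING MODEL statement from a list of column definitions. It isn't the wizard's code, and the column list is an assumption; Figure 5 shows the statement the wizard actually generated. Microsoft_Decision_Trees is assumed to be the algorithm identifier that the provider expects in the USING clause.

```python
from dataclasses import dataclass

VALID_CONTENT_TYPES = {"KEY", "CONTINUOUS", "DISCRETE"}


@dataclass
class MiningColumn:
    name: str
    data_type: str         # e.g. "LONG", "DOUBLE", "TEXT"
    content_type: str      # "KEY", "CONTINUOUS", or "DISCRETE"
    predict: bool = False  # marks the predictable (target) column


def create_mining_model_sql(model: str, columns: list[MiningColumn], algorithm: str) -> str:
    """Compose a CREATE MINING MODEL statement from column definitions."""
    for col in columns:
        if col.content_type not in VALID_CONTENT_TYPES:
            raise ValueError(f"unsupported content type: {col.content_type}")
    if not any(col.content_type == "KEY" for col in columns):
        raise ValueError("a mining model needs a case key column")
    if not any(col.predict for col in columns):
        raise ValueError("a mining model needs at least one PREDICT column")
    defs = []
    for col in columns:
        parts = [f"[{col.name}]", col.data_type, col.content_type]
        if col.predict:
            parts.append("PREDICT")  # prediction flag follows the content type
        defs.append("    " + " ".join(parts))
    return f"CREATE MINING MODEL [{model}] (\n" + ",\n".join(defs) + f"\n) USING {algorithm}"


if __name__ == "__main__":
    # Hypothetical subset of the churn model's columns.
    cols = [
        MiningColumn("CustomerID", "LONG", "KEY"),
        MiningColumn("Age", "LONG", "CONTINUOUS"),
        MiningColumn("Income", "DOUBLE", "CONTINUOUS"),
        MiningColumn("Education", "TEXT", "DISCRETE"),
        MiningColumn("Churn_Yes_No", "TEXT", "DISCRETE", predict=True),
    ]
    print(create_mining_model_sql("Model1_MDT_NonNested", cols, "Microsoft_Decision_Trees"))
```

The validation checks mirror the two things every model in this article needs: a case key (CustomerID) and a predictable column (Churn_Yes_No).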

Processing (Training) Data-Mining Models
After you create the data-mining model, the next step (the last step in the wizard) is training it. The CREATE MINING MODEL statement creates only the model's structure; training populates the model's contents from a set of training data. Training is usually the most time-consuming step in the data-mining process, because the algorithm might have to iterate over the training data set several times to find the hidden patterns.

The OLE DB for Data Mining API uses the INSERT statement as the training command. Its syntax mirrors the standard SQL INSERT that populates a table with data, but the effect differs: although you're feeding massive quantities of data into the data-mining model, the model doesn't store any of the data; instead, it stores the patterns that it finds within the data. After the model is trained, a client application can browse its content and run queries against new data sets.
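The difference between storing data and storing patterns is easy to see in a toy model. In the hypothetical Python class below, insert() plays the role of the training INSERT: it reads each (income, churned) row, updates churn counts per income band, and keeps nothing else. The band edges are arbitrary illustrative values, and a real decision-tree model stores far richer patterns than these counts.

```python
import bisect
from collections import defaultdict


class ChurnByBandModel:
    """Toy 'mining model': training reads rows but keeps only aggregate counts."""

    def __init__(self, band_edges):
        # Sorted income cut points; hypothetical, caller-chosen values.
        self.band_edges = sorted(band_edges)
        # band index -> [number churned, number of cases seen]
        self.counts = defaultdict(lambda: [0, 0])

    def _band(self, income):
        # Number of edges <= income, i.e. which band the income falls into.
        return bisect.bisect_right(self.band_edges, income)

    def insert(self, rows):
        """Rough analogue of INSERT INTO <model>: learn from (income, churned) rows."""
        for income, churned in rows:
            stats = self.counts[self._band(income)]
            stats[0] += int(churned)
            stats[1] += 1
        # The rows themselves are not retained; only self.counts changed.

    def churn_probability(self, income):
        """Share of training cases in this income band that churned (None if unseen)."""
        churned, total = self.counts.get(self._band(income), (0, 0))
        return churned / total if total else None


if __name__ == "__main__":
    model = ChurnByBandModel([50_000, 100_000])
    model.insert([(30_000, True), (40_000, True), (45_000, False),
                  (80_000, False), (120_000, False)])
    print(model.churn_probability(35_000))  # 0.6666666666666666
```

After training, the five input rows could be thrown away; the model answers questions from its counts alone, which is the sense in which a mining model holds patterns rather than data.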

Figure 6 shows the training process statements for the customer-churning example. From the wizard's View Trace Line window, you can see that table SONISLAP.NEWDMDB.CUSTOMER contains the training data. The columns CustomerID, Income, OtherIncome, Loan, and so on are used for training. (A complete discussion of the training process is beyond the scope of this article; we'll concentrate on training in a future article.)

Browsing Mining-Model Contents
After training the model, you can browse the model's content through Analysis Manager's tree browser. This browser displays the content graphically, letting you navigate through different portions of the model. Studying the contents can give analysts important insight into the data and help them understand the patterns and rules within it. Later, the analyst can apply these rules to new data sets to make predictions.

The tree in Figure 7 represents the customer-churning pattern that the MDT algorithm found in the training data set. The algorithm found that income is the most significant attribute for predicting whether a customer will churn: people with lower incomes have a higher churn probability. The algorithm splits customers into four branches by income, then chooses age as the next most significant predictor. At the third level of the tree, the algorithm selects education level for customers with incomes below $49,923.75 and house value for those with incomes between $100,040.25 and $124,517.25. Based on this information, the bank can predict each customer's probability of churning.
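How does a decision tree decide that income matters most? A common criterion is information gain: the reduction in the entropy of the predictable column after splitting the cases on an attribute. The Python sketch below uses it as a stand-in, assuming the attributes are already binned into discrete bands. The MDT algorithm's own scoring method, and the way it finds continuous cut points such as $49,923.75, may differ. The four sample cases are made up.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attr, target):
    """Entropy of `target` minus the weighted entropy left after splitting on `attr`."""
    groups = {}
    for row in rows:
        groups.setdefault(row[attr], []).append(row[target])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy([row[target] for row in rows]) - remainder


def best_split(rows, attrs, target):
    """Attribute with the highest information gain, i.e. the tree's next split."""
    return max(attrs, key=lambda a: information_gain(rows, a, target))


if __name__ == "__main__":
    # Made-up, pre-binned sample cases.
    rows = [
        {"income": "low", "age": "young", "churn": "yes"},
        {"income": "low", "age": "old", "churn": "yes"},
        {"income": "high", "age": "young", "churn": "no"},
        {"income": "high", "age": "old", "churn": "no"},
    ]
    print(best_split(rows, ["income", "age"], "churn"))  # income
```

A tree builder applies best_split recursively: after splitting on income, it repeats the calculation within each income branch, which is how different branches can end up splitting on different attributes (education level in one, house value in another).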
