Test Results for Non-Nested Cases
In most situations, data-mining analysts gather source data from an online transaction processing (OLTP) system. After you cleanse and transform the data, you can store the training data in a case table. For example, in Scenario 1 of Table 1, an analyst might want to predict customer churn risk based solely on customer demographic information. Figure 1 shows the Customer table, which is the data-source table we used for non-nested cases. This case table contains the customer demographic information. Using the MDT algorithm, we conducted four experiments to measure the model-training performance of a non-nested data-mining model.
Number of input attributes. In this experiment, the sample size of training cases was 1 million, each input attribute had 25 different states, and we selected 1 predictable attribute. We studied MDT training performance by varying the number of input attributes from 10 to 200. Graph 1 shows that training time scales better than linearly as the number of input attributes increases from 10 to 100, but relative performance begins to degrade above 100 input attributes. Training with 50 input attributes took 31 minutes 30 seconds; with 100, it took less than 41 minutes. When we increased the number of input attributes to 200, however, training time rose to almost 130 minutes, more than three times the 100-attribute figure. Even so, the training performance for 200 input attributes is still reasonable, considering the number of cases.
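To make "better than linearly" concrete, the short sketch below compares how much the input grows with how much the training time grows, using only the times reported above. It treats "less than 41 minutes" and "almost 130 minutes" as 41 and 130, so the ratios are approximate; the helper name growth() is illustrative, not part of any product.

```python
# Training times reported in the text, in minutes, keyed by number of input
# attributes. The 100- and 200-attribute values are approximations
# ("less than 41 minutes", "almost 130 minutes").
reported = {50: 31.5, 100: 41.0, 200: 130.0}


def growth(times, lo, hi):
    """Return (input growth factor, training-time growth factor) from lo to hi."""
    return hi / lo, times[hi] / times[lo]


for lo, hi in [(50, 100), (100, 200)]:
    size_x, time_x = growth(reported, lo, hi)
    # Scaling is better than linear when time grows more slowly than the input.
    trend = "better than linear" if time_x < size_x else "worse than linear"
    print(f"{lo}->{hi} attributes: x{size_x:.1f} inputs, x{time_x:.2f} time ({trend})")
```

Doubling from 50 to 100 attributes costs about 1.3 times the training time, while doubling again to 200 costs more than 3 times. Because the 100-attribute time is an upper bound, the true 50-to-100 ratio is, if anything, lower still.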
Case sample size. In the second experiment, we varied the sample size from 10,000 to 10 million cases. We kept the other factors fixed, using a case table with 20 input attributes, 25 states per input attribute, and 1 predictable attribute. As Graph 2, page 55, shows, training time scaled better than linearly from 10,000 to 5 million cases, but above that number, relative performance began to degrade. For 1 million cases, training the model took 11 minutes 20 seconds; for 5 million, it took less than 35 minutes; but for 10 million, it took more than 100 minutes, roughly three times as long as for 5 million cases.
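The same comparison can be applied to the sample-size results. The sketch below assumes that "less than 35 minutes" and "more than 100 minutes" can be taken as 35 and 100. Since the first is an upper bound and the second a lower bound, the computed 5-to-10-million ratio understates the slowdown. The function names are illustrative.

```python
def minutes(m, s=0):
    """Convert minutes and seconds to fractional minutes."""
    return m + s / 60


# Training times from the text, keyed by number of cases.
# 5M is an upper bound ("less than 35"); 10M is a lower bound ("more than 100").
times = {
    1_000_000: minutes(11, 20),
    5_000_000: minutes(35),
    10_000_000: minutes(100),
}


def scaling(times, small, large):
    """Return (case-count ratio, training-time ratio) between two sample sizes."""
    return large / small, times[large] / times[small]


for small, large in [(1_000_000, 5_000_000), (5_000_000, 10_000_000)]:
    size_ratio, time_ratio = scaling(times, small, large)
    verdict = "better than linear" if time_ratio < size_ratio else "worse than linear"
    print(f"{small:,} -> {large:,} cases: x{size_ratio:.0f} cases, x{time_ratio:.2f} time ({verdict})")
```

Going from 1 million to 5 million cases multiplies the data by 5 but the time by only about 3.1. Going from 5 million to 10 million doubles the data and nearly triples the time.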
Number of states per input attribute. For this experiment, we fixed the sample size at 1 million cases and the number of input attributes at 20, with 1 predictable attribute. Then, we varied the number of states per input attribute from 2 to 50. Graph 3 shows that training time again scaled better than linearly while the tested factor stayed below a certain value, in this case 10. But here, something peculiar happened: when the number of states increased above 10, the training time began to drop rapidly.
What caused the sudden drop-off? We believe that above a certain number of input attribute states, the MDT algorithm has trouble finding useful splits for tree growth, which results in shallower trees and shorter training times. The shorter training time might look good for performance, but it can also hurt the prediction accuracy of the resulting tree, because the nodes (classes) in the tree might be broader than expected. When the nodes of a decision tree are too broad, its ability to predict cases accurately decreases. Consequently, we recommend that you limit the number of states per input attribute to at most 20.
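The article does not say how to enforce the 20-state limit. One common preprocessing approach, shown here as a hypothetical sketch rather than an Analysis Services feature, is to keep the most frequent states and fold the rare ones into a single catch-all state. The function cap_states() and the "Other" label are assumptions chosen for illustration.

```python
from collections import Counter


def cap_states(values, max_states=20, other="Other"):
    """Hypothetical helper: limit a discrete attribute to at most max_states states.

    Keeps the (max_states - 1) most frequent states and maps every remaining
    state to `other`, which becomes the final state. This frequency-based
    grouping is an illustrative choice, not part of the MDT algorithm.
    """
    if max_states < 2:
        raise ValueError("max_states must be at least 2")
    counts = Counter(values)
    if len(counts) <= max_states:
        return list(values)  # already within the limit; leave unchanged
    keep = {state for state, _ in counts.most_common(max_states - 1)}
    return [v if v in keep else other for v in values]


# Example: an attribute with 50 states, where state s{i} occurs i + 1 times.
column = [f"s{i}" for i in range(50) for _ in range(i + 1)]
capped = cap_states(column)
print(len(set(column)), "->", len(set(capped)), "states")
```

Grouping by frequency keeps the states that carry most of the cases. When the states have a natural order (for example, income bands), merging adjacent ranges is usually a better choice than a generic "Other" bucket.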
Number of predictable attributes. Here, we varied the number of predictable attributes while holding the other factors at 1 million cases, 40 input attributes, and 25 attribute states. Examples of predictable attributes are whether a customer is going to leave, or what the credit risk would be for a particular customer if the bank issued that customer a new loan. Graph 4 shows the resulting training performance as we increased the number of predictable attributes from 1 to 32. Because in this scenario the algorithm needs only one data loop for a prediction, training time grows linearly while the number of predictable attributes is less than four (the number of CPUs in our test server). However, when the number of predictable attributes increases to 16, training time grows by a factor of almost 10, to 5 hours 24 minutes. The algorithm builds a separate tree for each predictable attribute, and these trees are built in parallel. Training time rises as the number of trees grows, possibly because the overhead of switching between trees during training also grows. We recommend that you use data cleansing and normalization to reduce the number of predictable attributes. For nested tables, data cleansing and normalization ensure that the predictable attributes are free of irrelevant and dirty data and minimize redundant values in the domains of the predictable attributes.
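The one-tree-per-predictable-attribute structure described above can be sketched as a pool of workers sized to the CPU count, with one training job per target. This is only an illustration of the structure and not MDT's actual implementation. train_placeholder() is a hypothetical stand-in that returns the majority state of the target rather than growing a real tree, and the attribute names are invented.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def train_placeholder(cases, target):
    """Hypothetical stand-in for tree induction: returns the majority state of target."""
    counts = Counter(case[target] for case in cases)
    return counts.most_common(1)[0][0]


def train_per_target(cases, targets, workers=4):
    """Build one model per predictable attribute, running at most `workers` at once.

    With more targets than workers, the extra jobs wait in the pool's queue.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {t: pool.submit(train_placeholder, cases, t) for t in targets}
        return {t: f.result() for t, f in futures.items()}


# Hypothetical cases with two predictable attributes.
cases = [
    {"churn": "no", "credit_risk": "low"},
    {"churn": "no", "credit_risk": "high"},
    {"churn": "yes", "credit_risk": "low"},
]
print(train_per_target(cases, ["churn", "credit_risk"]))
```

Threads keep the sketch simple. For CPU-bound training in Python, ProcessPoolExecutor would be needed to get real parallelism. The sketch does not model the tree-switching overhead that the text suggests as the cause of the slowdown.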
Test Results for Nested Cases
The nested table is a new concept introduced in OLE DB for Data Mining, and using nested tables makes data-mining models more powerful. For example, in Scenario 3 of Table 1, we built a model with a nested table to predict a list of banking products that customers might be interested in, based on their demographic information and the list of banking products they had purchased before. Without a nested table, this data-mining problem is difficult to analyze unless you denormalize the table. For all nested-case experiments, we built models based on two tables: the main case table (Customer), containing the customer demographic information, and the nested table (Purchases), containing the list of unique banking products that the customers purchased. Figure 2 shows the data-source tables we used for our three nested-case experiments.
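The difference between a nested case and a denormalized table can be seen in a small data-structure sketch. Here each Customer case carries its Purchases as a nested set, and denormalize() flattens the structure into one row per customer-product pair. The field names (age, income) and product names are hypothetical, because the columns in Figure 2 are not reproduced here.

```python
from dataclasses import dataclass, field


@dataclass
class CustomerCase:
    """One case: demographic columns plus a nested table of purchased products.

    Field names are hypothetical placeholders for the Customer table's columns.
    """
    customer_id: int
    age: int
    income: str
    purchases: set = field(default_factory=set)  # nested table (Purchases)


def denormalize(cases):
    """Flatten nested cases into one (id, age, income, product) row per purchase."""
    rows = []
    for c in cases:
        for product in sorted(c.purchases):
            rows.append((c.customer_id, c.age, c.income, product))
    return rows


cases = [
    CustomerCase(1, 34, "high", {"Savings", "Mortgage"}),
    CustomerCase(2, 51, "medium"),  # no purchases yet
]
for row in denormalize(cases):
    print(row)
```

Denormalizing repeats the demographic columns on every product row, and a customer with no purchases disappears from the output entirely. Nested cases avoid both problems by keeping exactly one case per customer.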