Time Series
The Time Series algorithm introduces the concept of past, present, and future into the prediction business. This algorithm not only selects the best predictors for a prediction target but also identifies the most likely time periods during which you can expect to notice the effect of each predicting factor. For example, having built a model involving monthly primary economic indices, you might learn that the expected Yen-to-USD currency conversion rate today depends most strongly on the mortgage rate of 2 months ago and is related to the industrial production index of 7 months ago and per capita income of 6 to 7 months ago.
Figure 5 shows a data-mining control called Node Legend that gives a graphical view of these dependencies. The long left-side blue bar next to Mort30 Yr (-2) indicates a negative correlation between Yen to USD and the mortgage rate 2 months ago, meaning that over time, as one value goes up, the other value goes down. The purple curve (for Yen to USD) and the yellow curve (for the mortgage rate) in Figure 6 offer a nice graphical representation of this opposing movement of rates. Smaller blue bars in Figure 5 indicate that the exchange rate is to some extent self-sustaining; indeed, they highlight the fact that the rate today correlates well with the Yen-to-USD rate a month ago (coefficient 0.656) and somewhat with the rate 2 months ago (coefficient -0.117). So, when refinancing to a lower rate, you might consider cashing out and investing in Yen-backed securities, but first, you need to look at the prediction variances (and of course keep mum about the entire scheme).
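To make the idea of a lagged dependency concrete, here is a minimal Python sketch that correlates a target series with a predictor series shifted back by a given number of periods. It is plain Pearson correlation, not the Microsoft Time Series algorithm itself, and the function names and the two short series are made up for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def lagged_correlation(target, predictor, lag):
    """Correlate target[t] with predictor[t - lag]."""
    if lag == 0:
        return pearson(target, predictor)
    return pearson(target[lag:], predictor[:-lag])


def best_lag(target, predictor, max_lag):
    """Return the lag (0..max_lag) with the strongest absolute correlation."""
    return max(
        range(max_lag + 1),
        key=lambda lag: abs(lagged_correlation(target, predictor, lag)),
    )


# Hypothetical monthly values, not real economic data.
mortgage = [6.0, 6.2, 5.9, 6.4, 6.1, 6.6, 6.3, 6.8, 6.5, 7.0]
# Constructed so the target moves opposite to the mortgage value 2 months earlier.
yen = [100.0, 100.0] + [200.0 - 10.0 * m for m in mortgage[:-2]]

print(best_lag(yen, mortgage, max_lag=4))
print(lagged_correlation(yen, mortgage, 2))
```

On this constructed data the strongest relationship appears at a lag of 2 and is negative, which mirrors the kind of dependency the Mort30 Yr (-2) bar describes.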
Clustering and Sequence Clustering
A new feature of Microsoft Clustering algorithms is their ability to find a good cluster count for your model based on the properties of the training data. The number of clusters should be manageably small, but a cluster model should have a reasonably high predictive power. You can ask either clustering algorithm to pick a suitable cluster count based on a balance between these two objectives.
Microsoft Sequence Clustering is a new algorithm that you can think of as order-sensitive clustering. Often, the order of items in a data record doesn't matter (think of a shopping basket), but sometimes it's crucial (think of flights on an itinerary or letters in a DNA code). When data contains ordered sequences of items, the overall frequencies of these items don't matter as much as what each sequence starts and ends with, as well as all the transitions in between.
Our favorite example that shows the benefits of Sequence Clustering is the analysis of Web click-stream data. Figure 7 shows an example of a browsing graph of a certain group of visitors to a Web site. An arrow into a Web page node is labeled with the probability of a viewer transitioning to that Web page from the arrow's starting point. In the example cluster, news and home are the viewer's most likely starting pages (note the incoming arrow with a probability of 0.40 into the news node and the probability 0.37 arrow into the home node). There's a 62 percent probability that a visitor browsing news will still be browsing news at the next click (note the 0.62 probability arrow from the news node into itself), but visitors starting at home are likely to jump to local, sport, or weather. A transition graph such as the one in Figure 7 is the main component of each sequence cluster. In addition, a sequence cluster can contain everything an ordinary cluster would.
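A transition graph of this kind can be estimated by counting which page follows which. The Python sketch below does this for a single group of sessions; it does not perform the clustering step, and the `<start>` marker, the function name, and the sample sessions are all assumptions made for illustration.

```python
from collections import Counter, defaultdict

START = "<start>"  # hypothetical marker for the beginning of a session


def transition_probabilities(sessions):
    """Estimate P(next page | current page) from ordered page sequences."""
    counts = defaultdict(Counter)
    for pages in sessions:
        previous = START
        for page in pages:
            counts[previous][page] += 1
            previous = page
    # Normalize each row of counts so its probabilities sum to 1.
    return {
        source: {page: n / sum(targets.values()) for page, n in targets.items()}
        for source, targets in counts.items()
    }


# Made-up click-stream sessions, one list of page categories per visitor.
sessions = [
    ["news", "news", "sport"],
    ["news", "news", "news"],
    ["home", "weather"],
    ["home", "local", "news"],
]

probs = transition_probabilities(sessions)
print(probs[START])   # probabilities of each starting page
print(probs["news"])  # where a visitor on news goes next
```

Each entry of `probs` plays the role of the outgoing arrows from one node: the values under `START` are the starting-page probabilities, and `probs["news"]["news"]` is the self-loop on the news node.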
Naive Bayes Models and Neural Networks
These algorithms build two kinds of predictive models. The Microsoft Naïve Bayes (NB) algorithm is the quickest, although somewhat limited, method of sorting out relationships between data columns. It's based on the simplifying hypothesis that, when you evaluate column A as a predictor for target columns B1, B2, and so on, you can disregard dependencies between those target columns. Thus, in order to build an NB model, you only need to learn dependencies in each (predictor, target) pair. To do so, the Naïve Bayes algorithm computes a set of conditional probabilities, such as this one, drawn from census data:
Probability( Marital = "Single" |
Military = "On Active Duty" ) = 0.921
This formula shows that the probability of a person being single, given that the person is on active duty, is quite different from the overall, population-wide probability of being single (which is approximately 0.4), so you can conclude that military status is a good predictor of marital status.
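A conditional probability like the one above is just a ratio of counts. The following Python sketch computes it from a list of records and compares it with the overall probability; the helper names and the ten sample rows are invented for illustration and do not come from the census data.

```python
def conditional_probability(rows, target, target_value, given, given_value):
    """Estimate P(target = target_value | given = given_value) by counting."""
    matching = [row for row in rows if row[given] == given_value]
    if not matching:
        raise ValueError("no rows satisfy the condition")
    hits = sum(1 for row in matching if row[target] == target_value)
    return hits / len(matching)


def marginal_probability(rows, target, target_value):
    """Estimate the overall P(target = target_value)."""
    return sum(1 for row in rows if row[target] == target_value) / len(rows)


# Made-up records for illustration only.
rows = (
    [{"Military": "On Active Duty", "Marital": "Single"}] * 3
    + [{"Military": "On Active Duty", "Marital": "Married"}] * 1
    + [{"Military": "Civilian", "Marital": "Single"}] * 2
    + [{"Military": "Civilian", "Marital": "Married"}] * 4
)

p_conditional = conditional_probability(
    rows, "Marital", "Single", "Military", "On Active Duty"
)
p_overall = marginal_probability(rows, "Marital", "Single")
print(p_conditional, p_overall)
```

The further the conditional value sits from the overall value, the more the predictor column tells you about the target column.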
The Neural Networks (NN) methodology is probably the oldest kind of prediction modeling and possibly the hardest to describe in a few words. Imagine that the values in the data columns you want to predict are outputs of a "black box" and the values in the potential predictor data columns are inputs to the same black box. Inside the box are several layers of virtual "neurons" that are connected to each other as well as to input and output wires.
The NN algorithm is designed to figure out what's inside the box, given the inputs and the corresponding outputs that are already recorded in your data tables. Once the algorithm has learned the internal structure from the data, you can predict the output values (i.e., values in target columns) when you have the input values.
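As a rough picture of what sits inside the box, the Python sketch below pushes input values through one hidden layer of sigmoid neurons to a single output. It shows only the prediction step with hand-picked weights; the training step that finds the weights is not shown, and the layer sizes, weights, and function names are assumptions rather than details of the Microsoft implementation.

```python
from math import exp


def sigmoid(x):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + exp(-x))


def layer(inputs, weights, biases):
    """Compute one layer: each neuron takes a weighted sum, then a sigmoid."""
    return [
        sigmoid(sum(w * x for w, x in zip(neuron_weights, inputs)) + bias)
        for neuron_weights, bias in zip(weights, biases)
    ]


def predict(inputs, hidden_w, hidden_b, output_w, output_b):
    """Feed inputs through the hidden layer, then through the output layer."""
    hidden = layer(inputs, hidden_w, hidden_b)
    return layer(hidden, output_w, output_b)


# Hand-picked weights for illustration; a trained network would learn these.
hidden_w = [[0.5, -0.4], [-0.3, 0.8]]  # 2 hidden neurons, 2 inputs each
hidden_b = [0.1, -0.2]
output_w = [[1.2, -0.7]]               # 1 output neuron, 2 hidden inputs
output_b = [0.05]

print(predict([1.0, 2.0], hidden_w, hidden_b, output_w, output_b))
```

Training amounts to adjusting the weight and bias values until the outputs match the recorded target values as closely as possible.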