Association Rules
The Association Rules algorithm is geared toward analyzing transactional data, also known as market-basket data. Its main use is high-performance prediction in cross-sell data-mining applications. The algorithm operates in terms of itemsets. It takes in raw transaction records, such as the one that Figure 8 shows, and builds a data structure that tracks the counts of items (e.g., products) in the dataset. It then creates groups of items (the itemsets) and gathers statistical counts for them. For Figure 8's tiny sample record, the statistics would look like Figure 9.
One of the most important parameters of a model is a threshold for excluding unpopular items and itemsets. This parameter is called the minimum support. In the preceding example, if you set the minimum support to 2, the only itemsets retained will be <Bread>, <Milk>, and <Bread, Milk>.
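To make the effect of minimum support concrete, here is a minimal Python sketch that counts itemsets and filters them by a threshold. The transactions are hypothetical and are not the contents of Figure 8, the function names are invented for illustration, and the brute-force enumeration is only a stand-in for whatever counting structure the server actually builds.

```python
from collections import Counter
from itertools import combinations

# Hypothetical market-basket data (NOT the record shown in Figure 8).
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Milk", "Eggs"},
    {"Bread", "Butter"},
]


def count_itemsets(baskets, max_size=2):
    """Count how many baskets contain each itemset of up to max_size items."""
    counts = Counter()
    for basket in baskets:
        items = sorted(basket)  # sort so each itemset has one canonical tuple
        for size in range(1, max_size + 1):
            for itemset in combinations(items, size):
                counts[itemset] += 1
    return counts


def frequent_itemsets(baskets, min_support, max_size=2):
    """Keep only the itemsets whose count reaches the minimum support."""
    counts = count_itemsets(baskets, max_size)
    return {itemset: n for itemset, n in counts.items() if n >= min_support}


frequent = frequent_itemsets(transactions, min_support=2)
for itemset, n in sorted(frequent.items()):
    print(itemset, n)
```

With this sample data and a minimum support of 2, only the Bread, Milk, and Bread-plus-Milk itemsets survive; Eggs and Butter each appear once and are dropped.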
The result of the algorithm is the collection of itemsets and rules derived from the data. Each rule carries a lift score and a support value that is greater than or equal to the minimum support. The lift score measures how well the rule predicts the target item. Once the algorithm finds the interesting rules, you can easily use them to get product recommendations for your cross-sell Web sites or direct-mail materials.
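As a rough illustration of how a rule might be scored, the following Python sketch computes the textbook definition of lift: the rule's confidence divided by the baseline frequency of the target item. The article does not give the exact formula the server uses, so treat this definition, the function name, and the sample counts as assumptions.

```python
def rule_scores(count_antecedent, count_consequent, count_both, total):
    """Score the rule antecedent -> consequent from raw transaction counts.

    Uses the textbook definitions (an assumption, not necessarily the
    server's own scoring):
      confidence = P(consequent | antecedent)
      lift       = confidence / P(consequent)
    """
    if min(count_antecedent, count_consequent, total) <= 0:
        raise ValueError("counts and total must be positive")
    confidence = count_both / count_antecedent
    baseline = count_consequent / total
    return {
        "support": count_both,
        "confidence": confidence,
        "lift": confidence / baseline,
    }


# Hypothetical counts: 4 transactions, Bread in 3, Milk in 2, both in 2.
scores = rule_scores(count_antecedent=3, count_consequent=2, count_both=2, total=4)
print(scores)
```

Under this definition, a lift above 1 means the target item shows up more often alongside the antecedent than it does overall, and a lift near 1 means the rule adds little predictive value.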
Third-Party Algorithms (Plug-Ins)
The seven Microsoft algorithms pack a lot of power, but they might not give you the kind of knowledge or prediction patterns you need. If this is the case, you can develop a custom algorithm and host it on the Analysis Server. To fit into the data-mining framework, your algorithm needs to implement five main COM interfaces:
- The algorithm-factory interface creates and disposes of algorithm instances.
- The metadata interface provides access to the algorithm's parameters.
- The algorithm interface learns the mining models and makes predictions based on them.
- The persistence interface saves and loads the mining models.
- The navigation interface provides access to the contents of these models.
Some of these interfaces are elaborate and take getting used to, but implementation templates are available in the Tutorials and Samples part of the SQL Server 2005 documentation. After you implement and register your algorithm as a COM object, hooking it up to the Analysis Server is as easy as adding a few lines to the server configuration.
When the algorithm is ready and hooked up, its functionality immediately becomes available through the tools in the Business Intelligence Development Studio and SQL Server Management Studio. Analysis Server treats the new algorithm as its own and takes care of all object access and query support.
Dig In
Analysis Services 2005 represents a complete redesign of Microsoft's BI platform. Embracing .NET, XML for Analysis, and ADOMD.NET, it offers an array of powerful new algorithms, full-featured designers, and viewers. Even bigger news is how open and transparent the platform has become. With Analysis Services 2005's new client APIs, plug-in algorithm capabilities, server object model, managed user-defined functions (UDFs), and complete Microsoft Visual Studio integration, there's virtually no limit to what a motivated BI developer can do.