DOWNLOAD THE CODE:
Download the Code 16175.zip

The Data Mining and Exploration group employed three broad strategies in developing SQL Server 2000 Analysis Services: self-service, integration of OLAP and data mining, and Universal Data Access (UDA). The group implemented the strategy of self-service primarily through task wizards that take you through the various steps involved in developing and using data-mining models. To integrate the OLAP and data-mining metaphors, the group decided to have the query-delivery mechanism deliver query results, whether OLAP-based or data mining-based, to the user through the same interface.

The OLE DB for Data Mining standard encapsulates the idea of a universal data-access mechanism that allows the sharing of data and data-mining results through heterogeneous environments with multiple applications. Microsoft developed OLE DB for tabular data sources but extended it for OLAP through OLE DB for OLAP in SQL Server 7.0 and for data mining through OLE DB for Data Mining in SQL Server 2000. With OLE DB for Data Mining, you can mine OLAP or relational data sources because the same OLE DB infrastructure supports both OLAP and data mining. Thus, the OLE DB for OLAP and OLE DB for Data Mining approaches reflect heterogeneous data access, a shared mining and multidimensional query storage medium, and a common interface for OLAP queries and data-mining queries.

The addition of data-mining capability in SQL Server 2000 represents a major new functional extension to SQL Server's capabilities. You can develop a data-mining model from either relational (standard tables) or dimensional (cube structures) sources. Microsoft extended the OLE DB data-access standard to provide generalized data-mining access mechanisms that include support for relational and dimensional sources as well as for third-party vendors' data-mining functionality. Analysis Manager contains several new wizards that facilitate interaction with data-mining models. (The OLE DB for Data Mining specification allows programmatic interaction.) And because SQL Server stores data-mining models as Decision Support Objects (DSO) in the same way that it stores dimensional cubes, you can use the same interface to service OLAP-style queries and data-mining queries.

OLE DB for Data Mining
OLE DB for Data Mining supports most popular data-mining algorithms. Using OLE DB for Data Mining, data-mining applications can tap into any tabular data source through an OLE DB provider, and you can perform data-mining analysis directly against a relational database. To bridge the gap between traditional data-mining techniques and modern relational database management systems (RDBMSs), OLE DB for Data Mining defines new concepts and features, including:

Data-mining model. The data-mining model is like a relational table, except that it contains special columns that you can use to derive the patterns and relationships that characterize the kinds of discoveries that data mining reveals, such as which offers drive sales or the characteristics of people who respond to a targeted marketing offer. You can also use these columns to make predictions; the data-mining model serves as the core functionality that both creates a prediction model and generates predictions. Unlike a standard relational table, which stores raw data, the data-mining model stores the patterns discovered by your data-mining algorithm. To create data-mining models, you use a CREATE statement that is very similar to the SQL CREATE TABLE statement. You populate a data-mining model by using the INSERT INTO statement, just as you would populate a table. The client application issues a SELECT statement to make predictions through the data-mining model. A prediction is like a query in that it shows the important fields in a given outcome, such as sales or probability of response. After the mining engine defines the important fields and stores them in the data-mining model, the model can use the same pattern to classify new data in which the outcome is unknown. The process of identifying the important fields that form a prediction's pattern is called training. The trained pattern, or structure, is what you save in the data-mining model.

OLE DB for Data Mining is an extension of OLE DB that lets data-mining client applications use data-mining services from a broad variety of providers. OLE DB for Data Mining treats data-mining models as a special type of table. When you insert the data into the table, a data-mining algorithm processes the data and the data-mining model query processor saves the resulting data-mining model instead of the data itself. You can then browse the saved data-mining model, refine it, or use it to make predictions.

OLE DB for Data Mining schema rowsets. These special-purpose schema rowsets let consumer applications find crucial information, such as available mining services, mining models, mining columns, and model contents. SQL Server 2000 Analysis Services' Analysis Manager and third-party data-mining providers populate schema rowsets during the model-creation stage, during which the data is examined for patterns. This process, called learning or training, refers to the examination of data to discern new patterns or, alternatively, the fact that the data-mining model is trained to recognize patterns in the new data source.

Prediction join operation. To facilitate deployment, this operation, which is similar to the join operation in SQL syntax, is mapped to a join query between a data-mining model (which contains the trained pattern from the original data) and the designated new input data. This mapping lets you easily generate a prediction result tailored to the business requirements of the analysis.

Predictive Model Markup Language (PMML). The OLE DB for Data Mining specification incorporates the PMML standards of the Data Mining Group (DMG), a data-mining consortium (http://www.oasis-open.org/cover/pmml.html). This specification gives developers an open interface to more effectively integrate data-mining tools and capabilities into line-of-business and e-commerce applications.

How the Data-Mining Process Looks
Data to be mined is a collection of tables. In an example I discuss later, you have a data object that contains a customer table that relates to a promotions table—both of which relate to a conference attendance table. This is a typical data-mining analysis scenario in which you use customer response to previous promotions to train a data-mining model to determine the characteristics of customers who are most likely to respond to new promotions. Through data mining, you first use the training process to identify historical patterns of behavior, then use these patterns to predict future behavior. Data mining accomplishes this prediction through a new data-mining operator, the prediction join, which you can implement through Data Transformation Services (DTS). DTS provides a simple query tool that lets you build a prediction package, which contains the trained data-mining model and points to an untrained data source that you want predicted outcome from. For example, if you had trained a data source to look for a pattern that predicts likely customer response to a conference invitation, you could use DTS to apply this predicted pattern to a new data source to see how many customers in the new data will likely respond. DTS's ready-made mechanism of deploying data-mining patterns provides a valuable synergy among data mining, BI, and data warehousing in the Microsoft environment.

The collection of data that makes up a single entity (such as a customer) is a case. The set of all associated cases (customers, promotions, conferences) is the case set. OLE DB for Data Mining uses nested tables—tables stored within other tables—as defined by the Data Shaping Service, which is part of Microsoft Data Access Components (MDAC). For example, you can store product purchases within the customer case. The OLE DB for Data Mining specification uses the SHAPE statement to perform this nesting.

A significant feature of SQL Server 2000's data-mining functionality is ease of deployment. With DTS, you can easily apply the results of previously trained models against new data sources. The strategy is to make data-mining products similar to classic data-processing products so that you can manipulate, examine, extract data from, and deploy data-mining models in the same way as you would any table in a typical database. This approach recognizes that data mining, as organizations usually practice it, requires the data-mining analyst to work outside the standard relational database. When you mine outside the database, you create a new database, which leads to redundancy, leaves room for error, takes time, and defeats the purpose of the database. So, a major objective of SQL Server 2000 is to embed the data-mining capability directly in the database so that a mining model is as much a database object as a data table is. If this approach is widely adopted in the industry, it will eliminate significant duplication of effort in the creation of data warehouses that are built especially for data-mining projects. This approach will also eliminate the time needed to produce specialized data-mining tables and the potential threats to data quality and data integrity that the creation of a separate data-mining database implies. Finally, in-place data mining (as direct data-mining access to the native database is called) will eliminate the time lag that the creation of a specialized data table inevitably entails. As the demand for data-mining products and enhancements increases, this time factor may prove to be the element that finally leads to the universal adoption of data-mining functionality as an intrinsic component of a core database management system (DBMS).

Prev. page     1 [2] 3     next page



You must log on before posting a comment.

If you don't have a username & password, please register now.

Reader Comments

I am just thinking about this topic. I found this article helpful.

cheng

 
 

ADS BY GOOGLE