Analysis Services can make your data work for you
Data mining is the process of using automated methods to uncover meaning—in the form of trends, patterns, and relationships—from accumulated electronic data.
One of the most common data-mining tasks is targeting the most likely new customer prospects. Another common task is market segmentation. A market segment is a group of customers who share certain characteristics such as length of time as a customer, average expenditures, typical purchases, and so on. Discovering the similarities that characterize and distinguish segments is a computationally intensive operation that's ideally suited to data mining.
When you decide you want to use data mining, you need look no further than SQL Server. The decision tree and clustering data-mining algorithms that Microsoft provides in SQL Server 2000 Analysis Services are well suited to tackling the tasks of acquiring new customers and segmenting your markets. (For more information about using Microsoft data-mining algorithms, see the series of articles by Jim Yang, ZhaoHui Tang, Sanjay Soni, et al, "Model Performance," December 2001, InstantDoc ID 22957, and "Put Data Mining to Work," November 2001, InstantDoc ID 22498.) Let's walk through an example of how you can use the analysis facilities in Analysis Services to perform these tasks, then examine the useful information you can glean from the analysis.
Problem Scenario
The example scenario involves the imaginary enterprise Conference Corp, which provides industry-leading exposure to new IT trends and technologies through conferences, workshops, and seminars. The exclusive, "invitation-only" nature of the events requires the development of high-quality, targeted promotional materials such as personalized offers and conference brochures. Targeting is especially important because the quality materials are expensive to produce and mail. The company strives for high response and attendance rates, continually analyzing the effectiveness of its promotional campaigns.
The company has a database that's organized around a customer entity, the Customers table. This table and two others—Promotions and Conferences—define the Customer object. Figure 1 shows the tables and their constituent fields.
Customers receive many promotions for many events, and after they receive a promotion, they either ignore it or register and attend the promoted event. Say your job is to look at promotional "hits and misses" and determine what characteristics predispose customers to attend the promoted event. Once you know these characteristics, you can better target subsequent event promotions. Focused targeting lowers promotional costs and lets you better serve your customers by giving them information that's appropriate to their interests. This personalization is central to building customer loyalty over time.
To determine the characteristics of customers and prospects who are most likely to respond to your promotional offer, you have to first assemble an analysis data set that contains responses and non-responses to one or more offers. Further, you have to assemble a data set that contains enough distinguishing information to help you evaluate the likelihood of a response based on certain characteristics.
Your observation of the success or failure of previous Conference Corp marketing campaigns suggests that the probability of response is a function of the customer's job type, size of firm, and the firm's annual sales. You also suspect that response rates vary according to customer-relationship characteristics such as how long the person has been a customer, attendance at previous events, and so on. Also, Conference Corp business managers have observed that response seems to be related to the offer type, discount, and coupon policies as well as the number of promotions previously sent to the targeted prospect.
Sorting through all these potential predictors to find the unique combination of attributes that best describes the customer who's most likely to respond is difficult without some sort of automated pattern-search algorithm. As I demonstrate later, data-mining decision trees are particularly well suited to carrying out this kind of automated pattern search.
Unit of Analysis
In any data-mining task, one of the first decisions you need to make is which unit of analysis to use. Just as you can measure and analyze distance in many ways—from inches to miles, for example—you can measure and analyze customer behavior in many ways. A review of the Customer object's structure shows at least four potential units of analysis:
- The customer's company — 1955 cases (records) in the data set
- The individual customer — 3984 cases in the data set
- The response — 9934 responses in the data set: 8075 for the e-commerce conference (55 percent), 1467 for the Java conference (10 percent), and 392 for the Windows CE conference (3 percent), for a total response rate of about 68 percent
- The promotion — 14,589 cases in the data set
Using the customer as the unit of analysis might be tempting because you're studying customer behavior. However, for this example, the unit of analysis has to be the promotion because you want to look at every promotion and examine whether it produced a response from any customer. This answer will give you a response rate from which you can derive a profile of high-responding customers. So, promotion is the unit of analysis, and response is the outcome that you want to examine and explain. In this case, to explain the outcome, you need to construct a predictive model that uses the characteristics reported on the analysis record to discriminate between responding and non-responding customers.