| Executive Summary: Sometimes, a data warehouse requires many-to-many (M:N) relationships. Market-basket analysis is a technique used to determine how items in the data warehouse are related. Retail businesses frequently use market-basket analysis to evaluate customer buying behavior. |
Just like any well-designed database, a data warehouse
contains relationships. From everything
that you’ve read in this series over the past year,
you might be tempted to think that all relationships
in a data warehouse are one-to-many (1:M), from the
dimension to the fact table. However, that’s not true.
Occasionally, a data warehouse requires a many-tomany
(M:N) relationship.
You can successfully argue that the star schema is
inherently a M:N relationship. So why are we having
this discussion? Competition is forcing businesses to
become more innovative, and this innovation can bring
with it more products and services in combinations that
might not have been considered before. This innovation
also maps to increasingly complicated business models,
and subsequently, to more complex data warehouse
designs. A business intelligence (BI) analyst who knows
how to evaluate customer buying behavior can extract
information from the data warehouse to determine
which combinations of products and services will
entice a customer to make a purchase.
One technique the BI analysts use is called marketbasket
analysis, which is the process of evaluating collections
of items in the data warehouse to determine
whether they’re somehow related. Market-basket
analysis is frequently (but not exclusively) used in the
retail sector. Information gleaned from market-basket
analysis helps retailers understand buyers’ needs.
Based on this analysis, retailers can better appeal to
buyers by doing things such as reorganizing a store
layout. For example, the idea of co-locating beer and
baby diapers—so that when Dad runs to the market
to pick up baby diapers, he also buys beer—was a
finding that came out of early market-basket analysis
efforts. Understanding the buyers’ mind-set can enable
retailers to develop cross-promotional marketing programs,
capture new buyers, trim inventories down to
the products that sell best, and design intelligent sales
events. As a data warehouse designer and DBA, you
need to know how to model your database to support
market-basket analysis so that your BI analysts can
extract the information your business needs to beat
the competition.
A good example for this discussion is a build-to-order
checking account—a bank account in which
customers select the features they want from a menu,
essentially customizing the account and deciding how
much they will pay per month in bank fees. Some features
are free and some are not. The free features might
include a check card with reward points (choose your
favorite rewards program), online banking, bill-pay
service, email alerts, no minimum balance, no direct
deposit required, and identity theft insurance. Premium
services might include no fees for using another bank’s
ATM, rebates on ATM fees if you do use another
bank’s services, interest-earning accounts, double
rewards points, money back or extra rewards points on
check-card purchases, anniversary cash bonuses, and
overdraft fee forgiveness. Customers choose some or
all of the free features and one or two of the premium
features; these features are bundled into a customized
checking account. Then customers can choose as many
additional premium features to add to the custom
account as they want for a fee of say, $2 per feature
per month.
The OLTP Database Schema
If you were designing the OLTP database schema for a
build-to-order checking account, it would look similar
to Figure 1 (page 38). This schema stores each account
configuration, with general account information stored
in the Account table and the features for each account
stored in the AccountFeatures fact table. What’s not
included in this figure are tables to capture the many
types of transactions associated with an account, such
as check-card purchases, purchases made with paper
checks, bill-pay records, and ATM withdrawals.
Let’s assume that your datasource looks similar to Figure 1, with Account and AccountDetail in a
1:M relationship. Now your company’s CFO and vice
president of marketing want to know which customers’
accounts are using which features most often and in
which combinations, if any. They want to measure the
efficacy of the free features versus the premium features
and determine which features sell the best, so they can
better direct the bank’s marketing efforts. They also
want to better understand their customers’ buying
behavior. In other words, the CFO and vice president
of marketing want to do market-basket analysis. Why
does a customer pay for a purchase one time with a
check, but pays for his or her next purchase with a
check card? What are the circumstances surrounding
these varying types of buying behavior? If the cost to
process a paper check is a set fee per check, but the
cost to process a check card transaction is a percentage
of the total transaction amount, it might behoove the
bank to figure out a way to encourage its customers—
especially the big spenders—to pay with checks.
The Star Schema
To create the star schema that will help determine
the answers to these questions,
you’ll need a Transaction
fact table, a dimension
for the Account, and an
AccountFeatures fact table.
In addition, you’ll need categorical
dimensions for the
features, account category
(e.g., free, premium), and
the rewards program (e.g.,
airline, hotel, rental car).
Complexity in the schema
arises from the fact that to
answer the CFO’s questions,
a transaction needs to link to
one or more features in the
AccountFeatures fact table,
and each of the Account-
Feature records needs to link
back to its parent account.
This complexity mandates
a M:N relationship between
the Account dimension and
the Transaction fact table, as
shown in Figure 2. In addition,
there’s a conventional
1:M relationship between
the Account dimension and
the Transaction fact table.
This schema is multifunctional
because you can
use it for different types
of analyses. The Account,
Location, Customer, and Time dimensions are all
conformed, meaning they can participate in multiple
star schemas. Taken as a unit, the Account, Customer,
Location, and Time dimensions and the Transaction
fact table comprise a standard star schema model,
which enables BI analysts to create cubes from these
tables and extract information without regard to
which account feature was used to generate the transaction.
The M:N relationship between the Account
dimension and the Transaction fact table, which is
manifested in AccountFeatures, is the structure that
will enable market-basket analysis without having
to create a separate star schema. By linking each
transaction with the account feature that was used
to generate the transaction, BI analysts can begin to
find answers to the questions asked about customer
buying behavior.
By not creating two separate star schemas (i.e.,
a Transaction star schema and an AccountFeatures
star schema), you’re saving disk storage. If you were
to combine the Account and AccountFeatures tables
into a single dimension, that dimension would swell by
a factor of 15—more if additional features are added to the build-to-order checking account program. In
previous articles, such as “Data Warehousing: Junk
Dimensions,” May 2008, InstantDoc ID 98356, I
have discussed the desirability of keeping dimensions
tight and concise; adding features to the Account
dimension would create a large, unwieldy dimension
that wouldn’t function well when being cubed. Denormalizing
the Transaction fact table by incorporating
account feature attributes wouldn’t be the best idea,
either. The fact table would contain the most records
of all the tables in the schema; increasing its size and
adding more keys would diminish performance and
cause locking and blocking conflicts when the conventional
BI analyst and the market-basket BI analyst
run their programs simultaneously.
Beyond Retail
Market-basket analysis is an important operation that
has uses that go beyond the retail sector. For example,
you can discover relationships in a medical database,
such as patients who had procedure X subsequently
developing condition Y within n days/weeks/months.
In a law enforcement or social profiling database, you
might discover that people who engage in behavior
A by age n have a d percent chance of engaging in
behavior B for the rest of their life.
Market-Basket Hero
Market-basket/affinity analysis is predictive: It
attempts to forecast future behavior based on past
actions. Business is getting more and more competitive;
in response to this pressure, they are becoming more innovative, offering product and service combinations
that are derived from complex analysis of
customers’ buying behaviors. You’ll need to be able to
support market-basket efforts with your data warehouse
design; understanding how to do so will make
you a hero.
End of Article