In “Implementing Supertypes and Subtypes” (November 2006, InstantDoc ID 93241), I discussed using supertypes and subtypes for data modeling and how
to implement the design objects. Now you’re probably waiting for me to give you a real reason for adopting the supertype-subtype design concept. I will.
Whether you realize it or not, your company is failing to capitalize on business
opportunities and is losing revenue because it doesn’t have a single source of accurate
data, or data truth. And you’re well aware of the time you and your colleagues spend
trying to reconcile the discrepancies in reports that you’re producing. Master Data
Management (MDM) is a method you can use to target incomplete, inaccurate, and
fragmented data that’s stored in various data stores in your enterprise.
What Is Master
Data?
MDM is a hot topic
in today’s market. I first
googled “Master Data”
and found 3.2 million
entries, then “Master
Data Management”
and found 975,000
entries. There are seemingly hundreds of software companies selling
MDM software. Some
companies advertise that
by using their products
you’ll be able to connect
to your legacy systems
and fully integrate your
core data across the entire
enterprise with a mere
click of your mouse.
I have news for you: Reconciling disparate data sources isn’t easy. When data
warehousing was first introduced, DBAs experienced a revelation: To create a true
enterprise data warehouse (EDW), DBAs had to develop common definitions of core
data that was used by many of their companies’ applications. By trying to develop these definitions, DBAs found more definitions
for customer than they ever expected and
struggled with common data definitions
and sources.
Many EDW projects fell by the wayside because DBAs couldn’t come to a
consensus on a unified view of corporate
data. Now, many corporations are revisiting these concepts to try and reconcile
their disparate data sources and are using
new buzzwords: master data and MDM.
DBAs now realize that every piece of
data in their company doesn’t need to be
categorized, cleansed, and made uniform
across the enterprise. Instead, there can
be, and probably should be, departmentally based variations in data to support
the different functions performed within
each department. What we do know is
that there is core data—people, places, and
things—that needs to be tracked across
the enterprise. This is master data.
You can think of master data as meta-data and real data combined. Master data
consists of consensus-driven data and data
definitions applied consistently across an enterprise. It’s consensus-driven because master
data requires more than the IT or data
warehouse groups to devise the metadata
layer and to determine which set of customer data or inventory data is the real
or base data. To gain consensus across the
enterprise, you have to involve all the data
stakeholders—business decision makers
and IT staff—whose job depends on the
validity of the data.
Managing Master Data
Managing master data has given rise to
the MDM movement. Because MDM is
consensus-driven and enterprise-wide, it
shouldn’t surprise you that the governing
body in an organization that determines
what constitutes master data and how to manage it needs to be staffed with cross-functional individuals from the technology
and business sides of the enterprise. “Cross-functional” means that these individuals
have experience in many facets of business
and technology. No one person is expected
to know it all, but the more experience each member has, the better suited that person is
to sit on the data governance board.
Figure 1, shows a typical enterprise data scenario at a very high level. The
enterprise has legacy applications, existing
applications, and new applications, and each
application feeds from and writes to its own
set of data. This is the genesis of informational
silos. All the data stores feed into the Historical-Analytical System, which becomes
a dumping ground for data. The analysts
use the Historical-Analytical System to do
two things: determine which data is most
valuable, and use the data in this system
to reconcile the discrepancies between the
informational silos.
Figure 2 is a logical representation of
what this environment would look like if it
were under MDM control. In the center is
the Master Data Repository. There are many
ways to implement MDM control. One
scheme is to maintain the data in the Master
Data Repository and serve up the data to
requesting applications on an as-needed basis and store the nonmaster data (for
example, transaction details) locally. Another
implementation scheme is to synchronize
the master data content between the Master
Data Repository and the applications. In
either scheme, applications write master
data to the Historical-Analytical System as
the data is modified, whereas
applications write transactional data from the local
data stores to the Historical-Analytical System according
to specific Historical-Analytical System requirements
(not shown on Figure 2 to
retain clarity). Master data
is fairly static; it behaves like
the slowly changing dimensions defined in Ralph Kimball et al.,The Data Warehouse
Lifecycle Toolkit (Wiley, 1998).
Transactional systems pinpoint business activity—sales
activities, production-line
operations, real-time activities—and usually generate
volumes of data.
Using Supertype-Subtype Entities to Map Your
Master Data Scheme
In Figure 3, I’ve laid
the supertype-subtype
entities on top of the
structure in Figure 2 and
removed most of the
activity arrows (I did that
only for clarity). Conceptually, the supertype
entity belongs with the
Master Data Repository,
whereas each of the sub-type entities is part of one
application. To illustrate,
let’s look at a banking
operation and imagine
how it would operate
in the scenario Figure 3 illustrates.
In a conventional
organization similar to
the one Figure 1 depicts,
each department manages its own data internally, creating the
ubiquitous informational silos. Let’s say for
the sake of argument that there are separate
applications for checking accounts, investment accounting, home mortgage loans,
and commercial loans. At the end of the day,
each department sends the data it collected
to the Historical-Analytical System and
analysts attempt to use this data to compile a
complete financial picture of each customer.
This data consolidation process isn’t always
accurate because each application assigns a
different identifier for the same customer.
Additionally, the applications might not
fully synchronize with one another, so the
Historical-Analytical System might contain
redundant data. Bank managers know they
need to understand a customer’s financial
activity, but with this fragmented picture of
the customer they’re really limited in what
they can hope to achieve.
Prev. page  
[1]
2
next page