In “Implementing Supertypes and Subtypes” (November 2006, InstantDoc ID 93241), I discussed using supertypes and subtypes for data modeling and how to implement the design objects. Now you’re probably waiting for me to give you a real reason for adopting the supertype-subtype design concept. I will.

Whether you realize it or not, your company is failing to capitalize on business opportunities and is losing revenue because it doesn’t have a single source of accurate data, or data truth. And you’re well aware of the time you and your colleagues spend trying to reconcile the discrepancies in reports that you’re producing. Master Data Management (MDM) is a method you can use to target incomplete, inaccurate, and fragmented data that’s stored in various data stores in your enterprise.

What Is Master Data?
MDM is a hot topic in today’s market. I first googled “Master Data” and found 3.2 million entries, then “Master Data Management” and found 975,000 entries. There are seemingly hundreds of software companies selling MDM software. Some companies advertise that by using their products you’ll be able to connect to your legacy systems and fully integrate your core data across the entire enterprise with a mere click of your mouse.

I have news for you: Reconciling disparate data sources isn’t easy. When data warehousing was first introduced, DBAs experienced a revelation: To create a true enterprise data warehouse (EDW), DBAs had to develop common definitions of core data that was used by many of their companies’ applications. By trying to develop these definitions, DBAs found more definitions for customer than they ever expected and struggled with common data definitions and sources.

Many EDW projects fell by the wayside because DBAs couldn’t come to a consensus on a unified view of corporate data. Now, many corporations are revisiting these concepts to try and reconcile their disparate data sources and are using new buzzwords: master data and MDM. DBAs now realize that every piece of data in their company doesn’t need to be categorized, cleansed, and made uniform across the enterprise. Instead, there can be, and probably should be, departmentally based variations in data to support the different functions performed within each department. What we do know is that there is core data—people, places, and things—that needs to be tracked across the enterprise. This is master data.

You can think of master data as meta-data and real data combined. Master data consists of consensus-driven data and data definitions applied consistently across an enterprise. It’s consensus-driven because master data requires more than the IT or data warehouse groups to devise the metadata layer and to determine which set of customer data or inventory data is the real or base data. To gain consensus across the enterprise, you have to involve all the data stakeholders—business decision makers and IT staff—whose job depends on the validity of the data.

Managing Master Data
Managing master data has given rise to the MDM movement. Because MDM is consensus-driven and enterprise-wide, it shouldn’t surprise you that the governing body in an organization that determines what constitutes master data and how to manage it needs to be staffed with cross-functional individuals from the technology and business sides of the enterprise. “Cross-functional” means that these individuals have experience in many facets of business and technology. No one person is expected to know it all, but the more experience each member has, the better suited that person is to sit on the data governance board.

Figure 1, shows a typical enterprise data scenario at a very high level. The enterprise has legacy applications, existing applications, and new applications, and each application feeds from and writes to its own set of data. This is the genesis of informational silos. All the data stores feed into the Historical-Analytical System, which becomes a dumping ground for data. The analysts use the Historical-Analytical System to do two things: determine which data is most valuable, and use the data in this system to reconcile the discrepancies between the informational silos.

Figure 2 is a logical representation of what this environment would look like if it were under MDM control. In the center is the Master Data Repository. There are many ways to implement MDM control. One scheme is to maintain the data in the Master Data Repository and serve up the data to requesting applications on an as-needed basis and store the nonmaster data (for example, transaction details) locally. Another implementation scheme is to synchronize the master data content between the Master Data Repository and the applications. In either scheme, applications write master data to the Historical-Analytical System as the data is modified, whereas applications write transactional data from the local data stores to the Historical-Analytical System according to specific Historical-Analytical System requirements (not shown on Figure 2 to retain clarity). Master data is fairly static; it behaves like the slowly changing dimensions defined in Ralph Kimball et al.,The Data Warehouse Lifecycle Toolkit (Wiley, 1998). Transactional systems pinpoint business activity—sales activities, production-line operations, real-time activities—and usually generate volumes of data.

Using Supertype-Subtype Entities to Map Your Master Data Scheme
In Figure 3, I’ve laid the supertype-subtype entities on top of the structure in Figure 2 and removed most of the activity arrows (I did that only for clarity). Conceptually, the supertype entity belongs with the Master Data Repository, whereas each of the sub-type entities is part of one application. To illustrate, let’s look at a banking operation and imagine how it would operate in the scenario Figure 3 illustrates.

In a conventional organization similar to the one Figure 1 depicts, each department manages its own data internally, creating the ubiquitous informational silos. Let’s say for the sake of argument that there are separate applications for checking accounts, investment accounting, home mortgage loans, and commercial loans. At the end of the day, each department sends the data it collected to the Historical-Analytical System and analysts attempt to use this data to compile a complete financial picture of each customer. This data consolidation process isn’t always accurate because each application assigns a different identifier for the same customer. Additionally, the applications might not fully synchronize with one another, so the Historical-Analytical System might contain redundant data. Bank managers know they need to understand a customer’s financial activity, but with this fragmented picture of the customer they’re really limited in what they can hope to achieve.

   Prev. page   [1] 2     next page



You must log on before posting a comment.

If you don't have a username & password, please register now.

Reader Comments

Nicely Done Michelle! The tech columnists I read again and again and learn from are the ones that continually extend concepts from previous articles and tie them together into themes based on new discussions. Keep it up!

tgantos

Article Rating 5 out of 5