In "Data-Modeling Contention," September 2006, InstantDoc ID 92670, I discussed using supertypes and subtypes when modeling data. Since that article's publication,-I've received email from readers asking for more information about when and how to use supertypes and subtypes both for data modeling and implementing in a database. I'll try to answer these questions, but first let's discuss why you'd use supertypes and subtypes.
When to Use Supertypes and Subtypes
You can use supertypes and subtypes when you're creating the conceptual data model. Supertyping and subtyping are techniques that data modelers use to better understand how data is structured and to understand the metadata. Supertypes and subtypes are structures that are used in the requirements-gathering stage of database development.
One might argue that subtyping is simply establishing a set of categories for a generalized entity, so why don't we just use a simple lookup table instead of the supertype-subtype structure? Sometimes we do. For example, when a category is simple and lends itself to a one-or two-word description, such as Microsoft Outlook's Task Status—Not Started, In Progress, Completed, Waiting on someone else, or Deferred— using a lookup or reference table and creating a foreign key reference to the main entity is the right thing to do. In that case, Task Status to Task would be a one-to-many (1:M) relationship, as the conceptual data model in Figure 1 shows.
However, some categorizations require more than one word or phrase to describe their properties.The simple 1:M reference won't work when an entity can fall under two or more categories at the same time. Keywords that define a product are prime examples of this condition. For example, a jacket can be categorized as menswear, outerwear, and winter wear. The keywords are brief enough to justify not using a subtype structure; however, you'll have to create a many-to-many (M:N) relationship between Keyword and Product, as the conceptual data model in Figure 2 shows.
Once the categorization begins to take on a life of its own, with many attributes that describe how the category looks and behaves, it's time to invoke the supertype-subtype structure. I'm going to start this discussion assuming that you're acquainted with the basics of supertypes and subtypes. If you need a review, read "Supertypes and Subtypes," May 1999, InstantDoc ID 5226, and "The Case of the Overlapping Subtype," November 2003, InstantDoc ID 40280.
Data modelers use two kinds of supertype-subtype structures: mutually exclusive and overlapping. In a mutually exclusive structure, an entity falls into no more than one subtype category. In an overlapping structure, an entity can be classified as zero, one, two, or more subtypes. The exclusive supertype-subtype is analogous to a 1:M relationship; the overlapping supertype-subtype is more like an M:N relationship. Figure 3 shows a generic model of a supertype-subtype structure.
Implementing Exclusive Supertype-Subtype Structures
Within the exclusive supertype-subtype structure, you can implement the data model in three ways: supertype table only, subtype tables only,and supertype table plus subtype tables. Let's look at an example of each type of implementation and discuss why I've implemented them as I have. Supertype table only: North Pole Clothiers. The North Pole has one garment manufacturer, appropriately named North Pole Clothiers.The company makes men's clothing, women's clothing, clothing for elves, and the occasional unisex, unispecies garment.This scenario is an example of a supertype-subtype structure in which the characteristics of each subtype are the same (i.e., size, color, inseam). Equally important, the processes that go into creating a garment for men, women, or elves—or even a unisex, unispecies garment—are also all the same. If the subtype entities don't have attributes of their own, then it's appropriate to implement just the supertype table. In this case, for example, a supertype table called Garment could account for all the characteristics in all the garment categories.
Subtype tables only: North Country Outfitters. North Country Outfitters arranges excursions into the outback and offers canoe and kayak rentals by the day in addition to selling camping, hiking, and outdoor activity products.The Goods and Services subtypes are so different that you'll want to implement subtype tables only. Although Goods and Services have a few attributes in common (e.g., Name, CurrentAvailability, DateAddedToInventory), most of the characteristics of Goods (products for sale or rent) are very different from those of Services. So a supertype table called Inventory would be redundant or unnecessary.
Both subtypes are related to entities such as Sale and Sale Return, but they also have relationships independent of each other. For instance, Services—such as a guided trip— can be reserved for a future time (associated with the reservation system) and can be associated with contract employees. Goods and Services are even related to each other, because certain services (e.g., canoe trips) require that the company provide goods (e.g., canoes, paddles, life jackets). In this subtype-only implementation, each subtype table will inherit the attributes of the supertype table, with the exception of the supertype surrogate primary key, if one exists.
Supertype table and subtype tables: MyDreamHome Construction Company. MyDreamHome is a residential construction company that builds custom homes, develops entire tracts of low-to medium-priced housing, and remodels older homes. These three types of projects have attributes and relationships in common, so you'd create a supertype table, Building Project, containing an extensive attribute set that's relevant to all three subtypes—Custom Home, Tract Development, and Remodel. However, because the subtype activities are quite different, each subtype is exclusive of the others and will have attributes that don't pertain to the other subtypes. There won't be any relationships between subtypes, but some of the subtypes might have the same or similar relationships to outside entities— for example, both Custom Home and Remodel could be related to outside architectural firms for custom plans, whereas Tract Development will use blueprints created specifically for the development and won't be related to outside entities.
In a situation such as this—when you have a full set of attributes in common in the supertype and an equally full set of attributes as well as varying relationships for each subtype—it's appropriate to implement both a supertype table and a subtype table. For queries that need to return data from both the supertype table and the subtype tables in such an implementation, realize that you'll be joining the two tables.
The supertype table in this configuration should contain a flag field called a subtype discriminator—a number or short code that indicates which subtype a row in the supertype is associated with. Single letters work best; for example, you could use C to represent Custom Home,T for Tract Home, and R for Remodel. Think of the subtype discriminator as a pointer from the subtype table to the supertype. Its purpose is to let you avoid unnecessary two-table joins.If you need data from only the supertype table, but just for a certain subtype, such as Custom Home, you can create a query that returns only data for which the subtype discriminator equals C.