• subscribe
October 21, 2003 12:00 AM

The Case of the Overlapping Subtype

A reader brings a complex question to the experts
SQL Server Pro
InstantDoc ID #40280

Margaret's concern about the complexity of the queries she'll write is valid. For example, every time she wants to find AARP members, she has to write queries such as

SELECT * FROM Person
 WHERE PersonID IN (SELECT PersonID FROM 
 PersonType WHERE PType = 'AARP')

or

SELECT *
FROM Person p JOIN PersonType t ON 
p.PersonID = t.PersonID WHERE t.PType = 'AARP'

The queries will become even more cumbersome when Margaret needs to retrieve a list of people who belong to two or more groups. For example, to retrieve the names of people who are both AARP members and volunteers, she'll have to write a query like the following:

SELECT * FROM Person
 WHERE PersonID IN (SELECT PersonID 
 FROM PersonType WHERE PersonType = 'AARP')
 AND PersonID IN (SELECT PersonID FROM 
 PersonType WHERE PersonType = 'Volunteer')

Such complicated queries can seriously impede performance.

In Figure 1's Person table, you can see an attribute named PersonSubType. This is the subtype discriminator, which you add to the supertype table for a disjoint condition (or non-overlapping condition). The purpose of the subtype discriminator is to help you avoid having to write subqueries or joins just to find information such as the names of all AARP members. But when subtypes overlap, this simple discriminator structure doesn't work. You have to modify the structure in one of two ways.

Option 1. In the Person supertype table, the subtype discriminator could be a column named GroupSubType, which is a varchar(5) data type and nullable. A single character—V, L, A, B, or D (for Volunteer, Literacy, AARP, Board member, and aDvisor, respectively)—represents each subtype group. Now, the GroupSubType column might be null, or it might contain the letter V (for a volunteer), VB (for a volunteer and board member), or VLABD (if someone is a groupie). A query that returns the names of people who are AARP members would end up looking like this:

SELECT * FROM Person WHERE GroupSubType LIKE '%A%'

Option 2. The second way to implement the overlapping-subtype discriminator is to create not one but five subtype discriminators, one for each group. You can define each subtype discriminator column as a bit data type (1 or 0) and nullable. Now, if a person is a volunteer only, you turn on just the SubTypeVolunteer column. For the groupie, all five subtype columns are turned on. A variation of this technique would be to make all five subtype columns a char(1) data type and nullable and use the single-letter codes instead of the bit data type. If you wanted to create indexes on these subtype columns, you'd have to create them as char, not bit, data types. With this technique, the query that returns the AARP members would be

/* If using the bit data type for the 
subtype discriminator columns: */

SELECT * FROM Person 
WHERE SubTypeAARP = 1

or

/* If using the char(1) data type for 
the subtype discriminator columns: */

SELECT * FROM Person 
WHERE SubTypeAARP = 'A'

The greatest benefit of either technique is that you don't have to join tables or write subqueries to get the data you want. But each technique has a drawback. Option 1's drawback is that each query contains a wildcard character, such as the percent sign (%), at the beginning of a search string. Wildcards cause SQL Server to perform a table scan instead of using any index that exists on the column GroupSubType. The table scan could slow Margaret's query performance to an unacceptable degree. Option 2's drawback is that if you need to add more subtype groups, you have to alter the Person table's structure to accommodate them. During the alteration, the table will be unavailable to users. The number of rows in your Person table, the amount of query activity against the Person table, and the amount of time you set aside for database maintenance will influence your solution choice.

Neither solution is elegant; each has its drawbacks. However, although these solutions are less than ideal, they solve Margaret's problem and let her query the Person table without having to reference the PersonType table in the same query. Margaret chose Option 2 because her organization doesn't use the database 24 x 7, and if she has to expand the table occasionally by adding a new subtype discriminator field, she can do so easily. Like the other savvy database designers and DBAs who regularly read this magazine, Margaret is discovering the finer points of how to handle supertypes and subtypes.



ARTICLE TOOLS

Comments
  • Emily
    6 years ago
    Apr 27, 2006

    The help given by this article and the articles it referenced are worth the price of a year's subscription.

You must log on before posting a comment.

Are you a new visitor? Register Here