| Executive Summary: Microsoft’s SQL Server has limited support for temporal data. Learn how to work around this limitation. |
Temporal data can be difficult to handle in SQL Server queries, because SQL Server implements only a subset of the features that are defined by ANSI SQL to support temporal data. Even in SQL Server 2008, temporal data types represent only a moment in time—no support exists for an INTERVAL data type that represent a time quantity. In addition, no native support is available for operations on temporal data, such as checking whether two time periods overlap, or merging intersecting time periods. You must create your own custom solutions for such operations. For example, you can represent a period of time either as two points in time (start time and end time) or as a start point in time and an integer representing the duration in a predetermined unit (e.g., a second).
The Challenge
A friend and colleague, Davide Mauri, recently presented me with a temporal querying problem. The challenge involves changes that take place in attribute values of entities (e.g., customers). Suppose that you keep track of the validity periods of customers’ attribute values, such as phone numbers and addresses. You have a table called Phones where you keep track of the validity periods for phone numbers, a table called Addresses with validity periods of addresses, and so on. The challenge is to produce all periods with distinct combination of attribute values (phone number, address, etc.).
Run the code in Listing 1 to create the Phones and Addresses tables and populate them with sample data. Notice that I didn’t include an attribute in the tables to hold a customer ID; I wanted to simplify the problem and focus on the temporal querying logic.
For each phone number and address, the pair of attributes startdt and enddt represent the value’s validity period. The attribute value’s validity period is greater than or equal to startdt and less than enddt (startdt = period < enddt). A NULL in the enddt value stands for no end date or still applicable. As I mentioned earlier, the challenge is to produce all periods of time with distinct phone and address values. Table 1, page 22, shows the desired output. Before you read Davide’s solution and my solution, try to solve the problem on your own.
Solution 1
Davide’s complete solution appears in Listing 2, page 22. As you can see, the solution is expressed as a single statement that defines multiple common table expressions (CTEs). For simplicity, I’ll explain Davide’s solution one CTE at a time. Note that the code uses language elements that were introduced in SQL Server 2005 (CTEs and the PIVOT operator); therefore you can’t run the code if you’re using a previous version of SQL Server. However, you can easily adapt the solution to earlier versions of SQL Server by converting the use of CTEs to derived tables, and use pivoting techniques that were supported in SQL Server 2000.
The code in Davide’s solution first defines a CTE called Timestamps that represents all timestamps where the validity period of an attribute value either starts or ends. The code defining the Timestamps CTE has four queries, each returning either a start time or an end time of an attribute value, and all result sets are unified with UNION ALL set operations. Table 2 shows the output produced by the query that defines the Timestamps CTE. Notice in Table 2 that NULL end times were substituted by the value 99991231, and that duplicate timestamp values may appear.
The second step in the solution is to define the CTE TSDR (short for timestamps with dense ranks). The query defining this CTE calculates dense rank values to position the timestamps chronologically, and it uses the DISTINCT clause to get rid of duplicate timestamps. The output of the query defining the TSDR CTE is shown in Table 3. Notice in the output that after the removal of duplicate timestamps, only 5 out of 8 entries remain.
The third step in the solution is to define the CTE Intervals. The query defining this CTE joins two instances of TSDR (aliased as Cur and Nxt) to produce all possible periods of time from the timestamps in TSDR. The JOIN condition used for this purpose is Nxt.pos = Cur.pos + 1, meaning that each point in time that is considered as the start of a period will be matched with the nearest future point as the end of the period. The output of the query defining the Intervals CTE is shown in Table 4.
The fourth step in the solution is to define the CTE called UnifiedValues. This CTE is independent of the previously defined CTEs; the code defining this CTE queries the Phones and Addresses tables directly, and it uses a UNION ALL set operation to unify the validity periods of phones and addresses. The code returns a string with the attribute type (‘phone’ | ‘address’) in the attr column, and the attribute value in the val column. The code uses the COALESCE function to replace a NULL in the enddt attribute to the value 99991231. The output of the query defining the UnifiedValues CTE is shown in Table 5.
The fifth step in the solution is to define the CTE ValidValues. The query defining this CTE joins the Intervals CTE with the UnifiedValues CTE to match possible periods (from Intervals) and validity periods (from UnifiedValues) that overlap. The join predicate ensures that the two periods overlap. Two periods (e.g., p1 and p2) overlap if p1.end > p2.start and p1.start < p2.end. The query defining the ValidValues CTE returns for each pair of overlapping periods the start time and end time of the period from intervals, the attribute type (‘phone’ | ‘address’), and the attribute value. The output of the query defining the ValidValues CTE is shown in Table 6.
The code defining the CTE ValidValues returns the correct result, only with a separate row for each distinct period and attribute. The final step in the solution is achieved by the outer query. This query pivots the attributes such that you get a row for each distinct period, a column for each attribute, and the attribute value in the intersection of period and attribute.
Solution 2
Davide and I took similar approaches in the first part of our solutions, but we took different approaches in the second part. My complete solution appears in Listing 3.
Continue on Page 2