• subscribe
September 29, 2008 12:00 AM

Query Temporal Data

Work around SQL Server’s incomplete support
SQL Server Pro
InstantDoc ID #99874
Downloads
99874.zip

Executive Summary: Microsoft’s SQL Server has limited support for temporal data. Learn how to work around this limitation.

Temporal data can be difficult to handle in SQL Server queries, because SQL Server implements only a subset of the features that are defined by ANSI SQL to support temporal data. Even in SQL Server 2008, temporal data types represent only a moment in time—no support exists for an INTERVAL data type that represent a time quantity. In addition, no native support is available for operations on temporal data, such as checking whether two time periods overlap, or merging intersecting time periods. You must create your own custom solutions for such operations. For example, you can represent a period of time either as two points in time (start time and end time) or as a start point in time and an integer representing the duration in a predetermined unit (e.g., a second).

The Challenge
A friend and colleague, Davide Mauri, recently presented me with a temporal querying problem. The challenge involves changes that take place in attribute values of entities (e.g., customers). Suppose that you keep track of the validity periods of customers’ attribute values, such as phone numbers and addresses. You have a table called Phones where you keep track of the validity periods for phone numbers, a table called Addresses with validity periods of addresses, and so on. The challenge is to produce all periods with distinct combination of attribute values (phone number, address, etc.).

Run the code in Listing 1 to create the Phones and Addresses tables and populate them with sample data. Notice that I didn’t include an attribute in the tables to hold a customer ID; I wanted to simplify the problem and focus on the temporal querying logic.

For each phone number and address, the pair of attributes startdt and enddt represent the value’s validity period. The attribute value’s validity period is greater than or equal to startdt and less than enddt (startdt = period < enddt). A NULL in the enddt value stands for no end date or still applicable. As I mentioned earlier, the challenge is to produce all periods of time with distinct phone and address values. Table 1, page 22, shows the desired output. Before you read Davide’s solution and my solution, try to solve the problem on your own.

Solution 1
Davide’s complete solution appears in Listing 2, page 22. As you can see, the solution is expressed as a single statement that defines multiple common table expressions (CTEs). For simplicity, I’ll explain Davide’s solution one CTE at a time. Note that the code uses language elements that were introduced in SQL Server 2005 (CTEs and the PIVOT operator); therefore you can’t run the code if you’re using a previous version of SQL Server. However, you can easily adapt the solution to earlier versions of SQL Server by converting the use of CTEs to derived tables, and use pivoting techniques that were supported in SQL Server 2000.

The code in Davide’s solution first defines a CTE called Timestamps that represents all timestamps where the validity period of an attribute value either starts or ends. The code defining the Timestamps CTE has four queries, each returning either a start time or an end time of an attribute value, and all result sets are unified with UNION ALL set operations. Table 2 shows the output produced by the query that defines the Timestamps CTE. Notice in Table 2 that NULL end times were substituted by the value 99991231, and that duplicate timestamp values may appear.

The second step in the solution is to define the CTE TSDR (short for timestamps with dense ranks). The query defining this CTE calculates dense rank values to position the timestamps chronologically, and it uses the DISTINCT clause to get rid of duplicate timestamps. The output of the query defining the TSDR CTE is shown in Table 3. Notice in the output that after the removal of duplicate timestamps, only 5 out of 8 entries remain.

The third step in the solution is to define the CTE Intervals. The query defining this CTE joins two instances of TSDR (aliased as Cur and Nxt) to produce all possible periods of time from the timestamps in TSDR. The JOIN condition used for this purpose is Nxt.pos = Cur.pos + 1, meaning that each point in time that is considered as the start of a period will be matched with the nearest future point as the end of the period. The output of the query defining the Intervals CTE is shown in Table 4.

The fourth step in the solution is to define the CTE called UnifiedValues. This CTE is independent of the previously defined CTEs; the code defining this CTE queries the Phones and Addresses tables directly, and it uses a UNION ALL set operation to unify the validity periods of phones and addresses. The code returns a string with the attribute type (‘phone’ | ‘address’) in the attr column, and the attribute value in the val column. The code uses the COALESCE function to replace a NULL in the enddt attribute to the value 99991231. The output of the query defining the UnifiedValues CTE is shown in Table 5.

The fifth step in the solution is to define the CTE ValidValues. The query defining this CTE joins the Intervals CTE with the UnifiedValues CTE to match possible periods (from Intervals) and validity periods (from UnifiedValues) that overlap. The join predicate ensures that the two periods overlap. Two periods (e.g., p1 and p2) overlap if p1.end > p2.start and p1.start < p2.end. The query defining the ValidValues CTE returns for each pair of overlapping periods the start time and end time of the period from intervals, the attribute type (‘phone’ | ‘address’), and the attribute value. The output of the query defining the ValidValues CTE is shown in Table 6.

The code defining the CTE ValidValues returns the correct result, only with a separate row for each distinct period and attribute. The final step in the solution is achieved by the outer query. This query pivots the attributes such that you get a row for each distinct period, a column for each attribute, and the attribute value in the intersection of period and attribute.

Solution 2
Davide and I took similar approaches in the first part of our solutions, but we took different approaches in the second part. My complete solution appears in Listing 3.

Continue on Page 2



ARTICLE TOOLS

Comments
  • Vince
    4 years ago
    Oct 17, 2008

    Terrific! I was able to put this to use immediately. However, I needed to track temporal data by object (location and status in my case) so I added a piece identifier to solution 3 like so:

    with Timestamps as
    (
    select PieceId, StartDate as DateVal from PieceLocation
    union all select PieceId, coalesce(EndDate, '12/29/2029') from PieceLocation
    union all select PieceId, StartDate from PieceStatus
    union select PieceId, coalesce(EndDate, '12/29/2029') from PieceStatus
    ),
    TimestampsRN as
    (
    select PieceId, DateVal, row_number() over (order by PieceId, DateVal) as Position
    from Timestamps
    ),
    Intervals as
    (
    select cur.PieceId, StartDate = cur.DateVal, EndDate = nxt.DateVal
    from TimestampsRN cur
    inner join TimestampsRN nxt on cur.PieceId = nxt.PieceId
    and nxt.Position = cur.Position + 1
    )
    select i.StartDate, i.EndDate, i.PieceId, s.StatusId, l.LocationId
    from Intervals i
    left join PieceStatus s on i.PieceId = s.PieceId
    and coalesce(s.EndDate, '12/31/2029') > i.StartDate
    and s.StartDate < i.EndDate
    left join PieceLocation l on i.PieceId = l.PieceId
    and coalesce(l.EndDate, '12/31/2029') > i.StartDate
    and l.StartDate < i.EndDate
    go

    -- I hope the formatting carries over into the comment. Here goes...

  • Christian
    4 years ago
    Oct 08, 2008

    -- Slightly different creation of the date ranges:
    WITH StartDates AS ( -- Collect all "period delimiters"
    SELECT StartDt FROM dbo.Phones
    UNION SELECT StartDt FROM dbo.Addresses
    ),Ranges AS ( -- build the ranges from the collected delimiters
    SELECT StartDt, (SELECT ISNULL(MIN(StartDt),99999999) EndDt FROM StartDates WHERE StartDT > StartDates1.StartDt) EndDt
    FROM StartDates AS StartDates1
    )
    -- Join the rows to the ranges
    SELECT P.Phone, A.Address, R.StartDt, R.EndDt
    FROM Ranges R
    LEFT JOIN Phones P ON NOT P.StartDt >= R.EndDt AND NOT ISNULL(P.EndDt,99999999) <= R.StartDt
    LEFT JOIN Addresses A ON NOT A.StartDt >= R.EndDt AND NOT ISNULL(A.EndDt,99999999) <= R.StartDt

    Cheers,
    Chris

You must log on before posting a comment.

Are you a new visitor? Register Here
  • SP1?
    I know there is a SP1 for SQL 2008 R2 available....and there is a "feature pack" as well... ...
  • SQL database mirroring
    I have SQL Server 2008 R2 Enterprise 64bit on Windows 2008 R2 Enterprise 64bit.  Each SQL Server has...
  • Dell Compellent Disk Drive
    Does anybody has experience with Dell Compellent Disk Drive? Basically, this system manages all disk...
  • Sql server performance tuning
    I need to find a tool that help me to optimize sql server,queries,improve the performance and solve ...