• subscribe
January 24, 2002 12:00 AM

Grouping Time Intervals

SQL Server Pro
InstantDoc ID #23511
Downloads
23511.zip

Try this set-based trick to solve a common problem

Some T-SQL problems seem to require cursor-based solutions but in fact have quite a few set-based solutions, as I've shown in recent columns. This month, let's look at a more complex example that programmers usually try to solve with cursors. You might have even had to handle this problem in one variation or another because it represents a common need: grouping time intervals. I'd like to thank Isaac Blank for sharing a beautiful solution, which I present in this article along with some other solutions.

To understand what grouping time intervals means, let's jump into a puzzle. After you've read the puzzle details and before you read the solutions, try to solve the puzzle by using the techniques I discussed in my previous two columns, "Matching Transactions," December 2001, InstantDoc ID 22956, and "Sequential to Set-Based," November 2001, InstantDoc ID 22431. Doing so will give you an appreciation of the difficulties of this type of problem.

Here's the puzzle. A table called Sessions stores starting and ending times of sessions that users open against a certain application—in this case, SQL Server. For each session, the Sessions table stores the username, session start time, and session end time. Note that those three attributes aren't necessarily unique; a user can open or close more than one session at the same time. To identify a session uniquely, I added to the table a surrogate key that gets its values from the identity property. To create and populate the Sessions table, run the script that Web Listing 1 (available online, see "More on the Web," page 16, for download instructions) shows.

Each row keeps track of one session. All individual sessions that together make up a consecutive time period with no breaks belong to the same session group. Your task is to write a query that returns the username, start time, and end time of each session group. You might notice a big warning sign at the beginning of the road that leads to the puzzle solution: "Danger! The road ahead is dizzying!" To keep the facts straight in your mind, consult Figure 1, which graphically shows each session as a green bar and each session group as a red arrow.

The First Solution
The first step in solving the problem is to retrieve the starting times of each session group. A certain row has the starting time of a session group if the session started after the maximum closing time among all the sessions that the same user opened earlier. This definition translates to a subquery in the WHERE clause. Listing 1, page 16, shows the T-SQL query, which uses the DISTINCT clause because several sessions can have the same start time and I don't want to return duplicates. Table 1, page 16, shows the output of Listing 1's query.

Note that three rows are missing from the output. The missing rows contain the starting times of each user's first session, which also happen to be the starting times of their session groups. Remember that the query in Listing 1 looks for rows that have a start time greater than the end time of a session opened earlier by the same user. A user's first session by definition has no earlier session; therefore, the subquery returns NULL. The logical expression NULL < starttime evaluates to unknown, and SQL Server doesn't return that row.

Now let's look at a simple trick you can use to make sure that the rows representing a user's first session are also returned. Listing 2 shows the query that returns the starting times of all the session groups. The trick is to use the ISNULL() function. If the subquery returns NULL, the query replaces the NULL with starttime -1, which is less than starttime. Cross-check the starting times in Listing 2's output, which Table 2 shows, with those in Figure 1.

The second and final step in solving this puzzle is to attach the appropriate end times to each row in the previous output, as the query in Web Listing 2 shows. Of all the user's session-group end times, the one you need to attach to a certain start time is the earliest one that ended at or after that start time. You can achieve this goal by using a subquery in the SELECT list.



ARTICLE TOOLS

Comments
  • Aaron Ellis
    10 years ago
    Feb 03, 2002

    SELECT a.username,
    a.starttime,
    MIN( c.endtime) AS 'endtime'
    FROM Sessions a
    LEFT JOIN Sessions b
    ON a.username = b.username
    AND a.starttime > b.starttime
    AND a.starttime <= b.endtime,
    ( SELECT a.username,
    a.endtime
    FROM Sessions a
    LEFT JOIN Sessions b
    ON a.username = b.username
    AND a.endtime >= b.starttime
    AND a.endtime < b.endtime
    WHERE b.endtime IS NULL
    ) c
    WHERE b.starttime IS NULL
    AND a.username = c.username
    AND a.starttime <= c.endtime
    GROUP BY a.username,
    a.starttime
    ORDER BY a.username,
    a.starttime

You must log on before posting a comment.

Are you a new visitor? Register Here
  • SP1?
    I know there is a SP1 for SQL 2008 R2 available....and there is a "feature pack" as well... ...
  • SQL database mirroring
    I have SQL Server 2008 R2 Enterprise 64bit on Windows 2008 R2 Enterprise 64bit.  Each SQL Server has...
  • Dell Compellent Disk Drive
    Does anybody has experience with Dell Compellent Disk Drive? Basically, this system manages all disk...
  • Sql server performance tuning
    I need to find a tool that help me to optimize sql server,queries,improve the performance and solve ...