• subscribe
April 29, 2008 12:00 AM

Temporal Histograms

Show the distribution of events over time
SQL Server Pro
InstantDoc ID #98360
Downloads
98360.zip

Executive Summary:

Dealing with temporal data can be tricky. Create a temporal histogram to show the distribution of events over time by first relaxing some of the requirements to solve a specific case, then enhancing the solution to make it more generic.

Recently I received a request from a customer to come up with a solution that produces temporal histograms—histograms showing the distribution of events over time periods. The problem was an interesting challenge and seemed like a generic need, so I decided to cover it in my column. First, I’ll explain the problem in terms of inputs and desired output. Next, I’ll provide a solution that handles only a specific case of the problem. Then, I’ll show you how to enhance the solution to make it more generic.

The Challenge
Suppose that you have a table called Events in your database, containing information about events in time. These events can be appointments, sessions, or anything that has start and end points in time. The Events table has three columns: event_id is the primary key, event_start is the start point in time of the event, and event_end is the end point. Run the code in Web Listing 1 to create the Events table in the tempdb database and populate it with sample data. You need to write a table function that accepts the following inputs:

 @from_dt—start point of a datetime range
  @to_dt—end point of a datetime range
  @date_part—a datetime part from the enumeration:
  ‘minute’, ‘hour’, ‘day’, ‘week’, ‘month’,
  ‘quarter’, ‘year’
  @num_parts—the number of datetime parts to be
  covered in each step

The table function should produce a histogram showing the number of active events during each fixed interval of time within the requested datetime range. The intervals of time, or steps, are based on the input @date_part and @num_parts.

The problem is best explained through an example. Given the inputs @from_dt = ‘20080501 00:00’, @ to_dt = ‘20080511 00:00’, @date_part = ‘day’, @ num_parts = 1, you’re supposed to produce the output in Web Table 1. Each row in the output represents a different fixed interval of time (from_dt - to_dt) within the requested datetime range. Because the requested date part is ‘day’, and the number of parts is 1, each step in the histogram represents one day. The fourth column in the output (num_events) holds the count of events from the Events table that were active during the current interval.

Regarding the histogram step boundary points, one of the requirements from my customer was to produce round points in time (round in respect to the input @ datepart), except for the extreme boundary points that must be the ones provided by the user. For example, given the inputs @from_dt = ‘20080501 12:30’, @to_dt = ‘20080510 10:00’, @date_part = ‘day’, @num_parts = 1, the step boundary points should be those in Web Table 2. Notice that the first step’s low boundary point is 2008-05-01 12:30:00.000 and the last step’s high boundary point is 2008-05-10 10:00:00.000, whereas all the other step boundary points represent whole days (in terms of day units).

Producing the Histogram Steps
You can start by creating a table function (call it fn_HistSteps) that returns the histogram steps table based on the previously mentioned input parameters. The function will return a row for each step with the step boundary points. Once defined, you can join the function with the Events table to match steps and events, group the result of the join by step, and count the number of events in each step.

So that you don’t have to deal with too many aspects of the problem at once, you can first relax some of the requirements. For example, take the @date_part and @num_parts inputs out of the equation, and solve the task for a specific interval—say, one day. After you manage to solve the problem for a specific interval, you can add the logic required to handle the requested @ date_part and @num_parts inputs.

In my solution I used an auxiliary table of numbers that you create and populate by running the code in Web Listing 2. This code creates a table called Nums with a single column called n, and populates the table with integers in the range 1 through 1,000,000.

To create the first version of the fn_HistSteps function, run the code in Web Listing 3. The function is an inline table-valued function based on a single query with multiple common table expressions (CTEs). The first CTE defined by the function’s code is called C0 and it has two columns: floor_from_dt and diff. The former is a floor of the input @from_dt value in terms of day units; that is, midnight of the input @from_dt value. The latter is the number of days in the range @from_dt - @to_dt.

The second CTE is called C1; it’s in charge of producing steps with round boundary points. This task is achieved by joining Nums and Steps, and returning all n values that are smaller than or equal to diff (number of days in the input range). The starting point of each step (from_dt) is calculated by adding n-1 days to floor_from_dt, and the ending point of each step (to_ dt) is calculated by adding n days to floor_from_dt.

The third CTE is called C2; it’s in charge of adjusting the extreme boundary points (start point of first step and end point of last step) if they need adjustment. Remember that the previous CTE (C1) produced round boundary points, although the requirement was that the extreme boundary points would be those provided by the user as the input datetime range boundary points. Note that as a result of adjusting the extreme boundary points, C2 might end up with rows representing irrational steps where to_dt isn’t greater than from_dt. Those rows will be eliminated by the outer query. The outer query simply returns all rows from C2 representing the histogram steps, excluding the irrational steps produced by a previous CTE.

Continued on page 2



ARTICLE TOOLS

Comments
  • Marcos
    3 years ago
    Mar 25, 2009

    Good article, as always.
    Keep writting Itzik!

  • meganbearly
    4 years ago
    May 12, 2008

    Hi rwillemain,
    Thanks so much for your feedback. In response to your question, Itzik Ben-Gan said, “The code implementing the fn_HistSteps function defines CTEs (using the WITH clause) which are new in Microsoft SQL Server 2005. In order for the code to work in Microsoft SQL Server 2000, you can use derived tables instead of CTEs. You can find the SQL Server 2000 compatible code for Listing 3 and Listing 4 by going to www.sqlmag.com, entering InstantDoc ID 99100, and clicking the Download the Code button.

    Regards,
    BG”

    Please let me know if you have further questions.

    Megan Bearly
    Associate Editor, SQL Server Magazine
    mbearly@sqlmag.com

  • STEVEN
    4 years ago
    May 12, 2008

    rwillemain - It doesn't work on SQL2K because SQL2K does not support Common Table Expressions.

    I like the solution, because I have an immediate need for something similar. My situation is different in that there is only one "datetime" column (to_dt, from_dt), and I just haven't had the chance to look at implementing this to work with one column.

  • RICHARD
    4 years ago
    May 04, 2008

    Well, I can't figure out why the funtion creations will not work as coded in the web listings> I am using SQL_2000, Here is the error return:

    "Server: Msg 156, Level 15, State 1, Procedure fn_HistSteps, Line 9
    Incorrect syntax near the keyword 'WITH'.
    Server: Msg 170, Level 15, State 1, Procedure fn_HistSteps, Line 30
    Line 30: Incorrect syntax near ','.
    Server: Msg 170, Level 15, State 1, Procedure fn_HistSteps, Line 54
    Line 54: Incorrect syntax near ','.
    "
    Any suggestions ? Thank you

  • RICHARD
    4 years ago
    May 04, 2008

    Excellent ! Perfect demonstrations ! This article is why I subcribe to this mag.
    Thank you.

You must log on before posting a comment.

Are you a new visitor? Register Here
  • SP1?
    I know there is a SP1 for SQL 2008 R2 available....and there is a "feature pack" as well... ...
  • SQL database mirroring
    I have SQL Server 2008 R2 Enterprise 64bit on Windows 2008 R2 Enterprise 64bit.  Each SQL Server has...
  • Dell Compellent Disk Drive
    Does anybody has experience with Dell Compellent Disk Drive? Basically, this system manages all disk...
  • Sql server performance tuning
    I need to find a tool that help me to optimize sql server,queries,improve the performance and solve ...