SideBar    The Logical Puzzle
DOWNLOAD THE CODE:
Download the Code 98360.zip

Executive Summary:

Dealing with temporal data can be tricky. Create a temporal histogram to show the distribution of events over time by first relaxing some of the requirements to solve a specific case, then enhancing the solution to make it more generic.

Recently I received a request from a customer to come up with a solution that produces temporal histograms—histograms showing the distribution of events over time periods. The problem was an interesting challenge and seemed like a generic need, so I decided to cover it in my column. First, I’ll explain the problem in terms of inputs and desired output. Next, I’ll provide a solution that handles only a specific case of the problem. Then, I’ll show you how to enhance the solution to make it more generic.

The Challenge
Suppose that you have a table called Events in your database, containing information about events in time. These events can be appointments, sessions, or anything that has start and end points in time. The Events table has three columns: event_id is the primary key, event_start is the start point in time of the event, and event_end is the end point. Run the code in Web Listing 1 to create the Events table in the tempdb database and populate it with sample data. You need to write a table function that accepts the following inputs:

 @from_dt—start point of a datetime range
  @to_dt—end point of a datetime range
  @date_part—a datetime part from the enumeration:
  ‘minute’, ‘hour’, ‘day’, ‘week’, ‘month’,
  ‘quarter’, ‘year’
  @num_parts—the number of datetime parts to be
  covered in each step

The table function should produce a histogram showing the number of active events during each fixed interval of time within the requested datetime range. The intervals of time, or steps, are based on the input @date_part and @num_parts.

The problem is best explained through an example. Given the inputs @from_dt = ‘20080501 00:00’, @ to_dt = ‘20080511 00:00’, @date_part = ‘day’, @ num_parts = 1, you’re supposed to produce the output in Web Table 1. Each row in the output represents a different fixed interval of time (from_dt - to_dt) within the requested datetime range. Because the requested date part is ‘day’, and the number of parts is 1, each step in the histogram represents one day. The fourth column in the output (num_events) holds the count of events from the Events table that were active during the current interval.

Regarding the histogram step boundary points, one of the requirements from my customer was to produce round points in time (round in respect to the input @ datepart), except for the extreme boundary points that must be the ones provided by the user. For example, given the inputs @from_dt = ‘20080501 12:30’, @to_dt = ‘20080510 10:00’, @date_part = ‘day’, @num_parts = 1, the step boundary points should be those in Web Table 2. Notice that the first step’s low boundary point is 2008-05-01 12:30:00.000 and the last step’s high boundary point is 2008-05-10 10:00:00.000, whereas all the other step boundary points represent whole days (in terms of day units).

Producing the Histogram Steps
You can start by creating a table function (call it fn_HistSteps) that returns the histogram steps table based on the previously mentioned input parameters. The function will return a row for each step with the step boundary points. Once defined, you can join the function with the Events table to match steps and events, group the result of the join by step, and count the number of events in each step.

So that you don’t have to deal with too many aspects of the problem at once, you can first relax some of the requirements. For example, take the @date_part and @num_parts inputs out of the equation, and solve the task for a specific interval—say, one day. After you manage to solve the problem for a specific interval, you can add the logic required to handle the requested @ date_part and @num_parts inputs.

In my solution I used an auxiliary table of numbers that you create and populate by running the code in Web Listing 2. This code creates a table called Nums with a single column called n, and populates the table with integers in the range 1 through 1,000,000.

To create the first version of the fn_HistSteps function, run the code in Web Listing 3. The function is an inline table-valued function based on a single query with multiple common table expressions (CTEs). The first CTE defined by the function’s code is called C0 and it has two columns: floor_from_dt and diff. The former is a floor of the input @from_dt value in terms of day units; that is, midnight of the input @from_dt value. The latter is the number of days in the range @from_dt - @to_dt.

The second CTE is called C1; it’s in charge of producing steps with round boundary points. This task is achieved by joining Nums and Steps, and returning all n values that are smaller than or equal to diff (number of days in the input range). The starting point of each step (from_dt) is calculated by adding n-1 days to floor_from_dt, and the ending point of each step (to_ dt) is calculated by adding n days to floor_from_dt.

The third CTE is called C2; it’s in charge of adjusting the extreme boundary points (start point of first step and end point of last step) if they need adjustment. Remember that the previous CTE (C1) produced round boundary points, although the requirement was that the extreme boundary points would be those provided by the user as the input datetime range boundary points. Note that as a result of adjusting the extreme boundary points, C2 might end up with rows representing irrational steps where to_dt isn’t greater than from_dt. Those rows will be eliminated by the outer query. The outer query simply returns all rows from C2 representing the histogram steps, excluding the irrational steps produced by a previous CTE.

Continued on page 2

   Prev. page   [1] 2     next page



You must log on before posting a comment.

If you don't have a username & password, please register now.

Reader Comments

Quite useful, although I'd love to see a generic solution that would work on SQL8 as well. I realize that SQL10 is here, nevertheless SQL9, but I'd like to point out that many, many companies still maintain SQL8. It's similar to the XP/Vista problem. SQL8 works, is paid for, is stable, et cetera. Thus far, the biggest incentive to upgrading to SQL10 (for most of the DBAs I talk to) is the resource governor (which allows us to dual-purpose production servers as reporting servers, throttling the BI queries). It would be instructional to tie this in with SRS for a complete soup-to-nuts "project-oriented" article series. Otherwise a fun article!

dmeenan

Article Rating 4 out of 5

Excellent ! Perfect demonstrations ! This article is why I subcribe to this mag. Thank you.

rwillemain

Article Rating 5 out of 5

Well, I can't figure out why the funtion creations will not work as coded in the web listings> I am using SQL_2000, Here is the error return:

"Server: Msg 156, Level 15, State 1, Procedure fn_HistSteps, Line 9 Incorrect syntax near the keyword 'WITH'. Server: Msg 170, Level 15, State 1, Procedure fn_HistSteps, Line 30 Line 30: Incorrect syntax near ','. Server: Msg 170, Level 15, State 1, Procedure fn_HistSteps, Line 54 Line 54: Incorrect syntax near ','. " Any suggestions ? Thank you

rwillemain

Article Rating 3 out of 5

rwillemain - It doesn't work on SQL2K because SQL2K does not support Common Table Expressions.

I like the solution, because I have an immediate need for something similar. My situation is different in that there is only one "datetime" column (to_dt, from_dt), and I just haven't had the chance to look at implementing this to work with one column.

BurKaZoiD

Article Rating 5 out of 5

Hi rwillemain, Thanks so much for your feedback. In response to your question, Itzik Ben-Gan said, “The code implementing the fn_HistSteps function defines CTEs (using the WITH clause) which are new in Microsoft SQL Server 2005. In order for the code to work in Microsoft SQL Server 2000, you can use derived tables instead of CTEs. You can find the SQL Server 2000 compatible code for Listing 3 and Listing 4 by going to www.sqlmag.com, entering InstantDoc ID 99100, and clicking the Download the Code button.

Regards, BG”

Please let me know if you have further questions.

Megan Bearly Associate Editor, SQL Server Magazine mbearly@sqlmag.com

meganbearly

Article Rating 5 out of 5

 

  Related Articles

SQL Server 2000 Code for Temporal Histograms Generating Histograms

  Related Whitepapers

Buyer’s Guide to Log Management: Comparing On-Premise and On-Demand Solutions Get Started with Oracle on Windows DVD StoreVault SnapManagers for Microsoft Exchange and SQL Server

  Related Events

Power Up! With Virtualization Online Conference Microsoft TechEd Developers Microsoft Belgium Developer & IT Pro Days 2006

  Related eBooks

Making SQL Server Perform Backup and Recovery Survival Guide HA Solutions for Windows, SQL, and Exchange Servers

  Related Essential Guides

The Essential Guide to Business Intelligence Reporting: Choosing the Right Tool for the Right Job Virtualization of SQL Server 2008 The Essential Guide to Reporting Services Tips & Tricks

  Related Resources

Buy One Get One Order SQL Server Magazine and get Windows IT Pro Magazine FREE!! Instant Gratification - Only $5.95!! Instant online access to thousands of SQL Server Magazine articles! Get It All - Order Windows It Pro VIP Today! Online access to 26,000+ articles. A $500+ value for only $279!!   Email Newsletters

  vLabs Links

SQL Server 2000 SQL Server 2005 Upgrade