• subscribe
January 18, 2005 12:00 AM

Nonexistent Ranges

With logic, even querying nonexistent data is no problem
SQL Server Pro
InstantDoc ID #44817
Downloads
44817.zip

T-SQL querying problems often require you to apply complex logic in your solutions. In my January 2005 column, "Grouping Ranges" (InstantDoc ID 44570), I presented a scenario in which you had to use T-SQL to return ranges of consecutive existing sales dates for each store during January 2005; the solutions required you to apply logical deduction. To solve this month's problem, you'll also need to apply logical reasoning. This month's problem is the inverse of last month's: You need to identify nonexistent ranges of dates for each store during January 2005. That is, you need to produce a report that shows when stores were inactive during a particular month. Figure 1 shows the desired results. Two of the three solutions that I devised apply to all versions of SQL Server; the other applies only to SQL Server 2005. Before reviewing my solutions, try solving the puzzle yourself. I'll use the same tables I used last month in my examples. Run Listing 1's code to create the Sales and Stores tables and populate them with sample data. As a reminder, the Stores table contains a row for each store, with the store ID and name. The Sales table contains a row for each sales day that the store was open, including the store ID, sales date, and daily revenue.

Solution 1: Filtering Start Points and Matching End Points
The first solution involves two steps. First, you identify the starting points of gaps in the sales dates. Then, you match an endpoint of a gap to each starting point.

You can identify a sales date within the Sales table as a starting point of a gap by checking whether the following date exists in the table. Let's take store 1's data as an example. The store was open on January 1. To figure out whether this date marks a starting point of a gap, you need to check whether the store was also open on January 2. It was, so you continue to the next date. The store was open on January 3 but wasn't open on January 4, so there's the gap starting point. The starting point of the gap is January 4, 2005, which is the first missing date following the last activity date, January 3, 2005.

You can already see that you need to devise a query against the Stores table that uses a NOT EXISTS filter. However, because you're relying on existing sales dates to identify the starting points of the gaps, you need to be aware of an exception. If the store wasn't open on the first of the month, and the last day of the previous month doesn't exist in the table, such a query against Sales, using just the NOT EXISTS filter, won't identify the first of the month as the starting point of a gap. To take care of this problem, you can create a derived table, such as the one at callout A in Listing 2, that adds to January 2005's data an extra row that contains the date December 31, 2004, for each store. Run Listing 2's code to return the desired results, which Figure 2 shows—starting points of the gaps—accounting for gaps that start at the beginning of the month.

Notice that in the derived table S1, which the code at callout A creates, I didn't include existing sales rows for January 31. These rows are of no interest because we're not looking for gaps in February that start after January 31. Also, since I couldn't rely on having such a row in the Sales table, I had to query the Stores table to explicitly create a row for each store that contained the date December 31, 2004. The NOT EXISTS filter in the outer query (at callout B) returns only those sales rows for which the store was closed on the next date (no row exists where S1.dt = S2.dt - 1). The SELECT list of the outer query adds 1 day to the existing date to return the starting point of the gap.

The second step of this solution is to match a gap's endpoint to each starting point that you already found. Logically, you can match an endpoint to a starting point by identifying the next existing sales date in the table for the given store and starting point, minus 1 day. This logical deduction is based on the fact that the next existing date is the first date following the gap.

You can easily find the corresponding endpoint of a gap by using a subquery in the SELECT list, as I did at callout A in Listing 3. The subquery returns the MIN date minus 1 day from another instance of the table (call it S3), where the store ID is the same as the one in S1 (the outer table) and the date is greater than the one in S1. If the subquery returns NULL, the gap ends at the end of the month since there are no more dates for this month. For that case, I used a COALESCE() function that returns January 31, 2005. You can run the code in Listing 3, which contains the complete solution query, to get the desired results.

Solution 2: Calculating the Difference Between Adjacent Dates
The second solution to the problem is based on calculating the difference in day units between neighboring sales dates. This solution also involves two steps. First, you return pairs of neighboring sales dates, then you filter only pairs of dates that are more than 1 day apart.

If you think about it, neighboring sales dates that are more than 1 day apart mark the boundaries of a gap. You add 1 day to the earlier date, subtract 1 day from the later date, and you get the starting and ending dates of the gap itself. Of the three solutions that I present in this article, I like this one best because the logical deduction here is so simple and intuitive.

Run the code that Listing 4 shows to return the adjacent sales dates that Figure 3 shows. SQL Server issues the query against the derived table S1—the same derived table I used in the first solution. At callout A, the SELECT list invokes the subquery that returns the next existing sales date for each store. If no such sales date exists in the Sales table on or before February 1, 2005, a COALESCE() function substitutes "February 1, 2005" for the NULL.

The second step is simple: Use the query from step 1 to create the derived table D, as I did at callout A in Listing 5. Add a WHERE clause in the outer query against D that filters the pairs of dates between which the difference is more than 1 day. In the outer query's SELECT list, add 1 day to the current date, and subtract 1 day from the next date to return the gap itself. You can run Listing 5's code, which contains the complete solution query, to get the desired results.



ARTICLE TOOLS

Comments
  • David
    7 years ago
    Jan 21, 2005

    Generating sequential dates is something that should be in my toolbox. I made a function out of it as follows:
    CREATE Function dbo.udtBuildSeqDateTable(@StartDate as Datetime, @EndDate as DateTime)
    Returns @DateSeq Table (SeqDate DateTime)
    AS Begin
    Declare @Rcnt int
    --Insert the start date into temp table
    Insert Into @DateSeq(SeqDate) Values(@StartDate)
    Set @Rcnt=@@RowCount
    --error checking
    If @EndDate<@StartDate Return
    --double the number of rows with each insert
    WHILE @Rcnt < @EndDate-@StartDate BEGIN
    INSERT INTO @DateSeq(SeqDate) Select SeqDate+@Rcnt From @DateSeq order by SeqDate
    Select @Rcnt=@RCnt+@@RowCount
    END
    --Get rid of any excess rows
    Delete From @DateSeq Where SeqDate>@EndDate
    Return
    End
    --------------
    As an example of how to use the function, I needed to know which dates were missing from my Batch table:
    SELECT Seq.SeqDate AS MissingDates
    FROM PBME.dbo.udtBuildSeqDateTable('20030501', '20031001') AS Seq LEFT OUTER JOIN
    dbo.Batch ON Seq.SeqDate = dbo.Batch.BatchDate
    WHERE (dbo.Batch.BatchDate IS NULL)
    -----
    You could then use the algorithm from the previous article to show ranges.

    Dave C.

  • David
    7 years ago
    Jan 21, 2005

    thought provoking aricle

You must log on before posting a comment.

Are you a new visitor? Register Here