SideBar    Logical Puzzle
DOWNLOAD THE CODE:
Download the Code 44570.zip

SQL Server 2005 Solution
T-SQL enhancements in SQL Server 2005 will let you come up with a much shorter, faster solution. Specifically, I'm talking about the ROW_NUMBER() function that Microsoft introduced in SQL Server 2005. For details about ROW_NUMBER(), see my April 2004 T-SQL 2005 Web column, "Calculating Row Numbers in SQL Server 2005," InstantDoc ID 42302.

Imagine row numbers assigned to each of a store's sale rows in date order. The most recent sale row gets 1, the next gets 2, and so on. The ability to calculate such row numbers lets you define the grp value as follows: current sale date minus the number of days that corresponds to the date's row number. Yes, that's all there is to it.

Now let's figure out why this simple calculation gives you what you're after. By definition, a "consecutive date" means a date that's greater by one than the previous date. The same applies to row numbers because they're consecutive by definition. So, if you subtract the row numbers from a group of consecutive dates, you're bound to get the same result date for each.

The real beauty of this solution is in what happens with the next group of consecutive dates. There's a gap of dates between one group and another, but there's no gap in the row numbers. This disparity ensures that the second group of dates will have a different result date value than the first. In fact, the difference between the grp value the calculation generated for one group and the grp value it generated for the next group will be the number of days in the gap minus one.

If you find this logic complicated, try following an example. Take the dates for store 1—20050101, 20050102, 20050103, 20050123, 20050124, 20050125, and 20050126—and their respective row numbers—1, 2, 3, 4, 5, 6, and 7. From each date, subtract the number of days that corresponds to each row number, as follows:

20050101 − 1 = 20041231
20050102 − 2 = 20041231
20050103 − 3 = 20041231
20050123 − 4 = 20050119
20050124 − 5 = 20050119
20050125 − 6 = 20050119
20050126 − 7 = 20050119

There's a gap of 20 days between the first group and the second, so the grp value generated for the second group is 19 greater than that for the first. In short, you got what you were after: a grp value that's the same for all rows of one group and different for all rows of another group. Here's the T-SQL query that calculates the grp values:

SELECT storeid, dt,
  dt - ROW_NUMBER()
    OVER(PARTITION BY storeid 
    ORDER BY dt) AS grp
FROM dbo.Sales
WHERE dt BETWEEN '20050101' AND 
  '20050131';

Figure 3 shows the results.

Again, the rest is simple. As Listing 4 shows, you use a derived table or a common table expression (CTE) and group the rows by the store ID and grp value. Since this solution applies to SQL Server 2005 only, I used a CTE to demonstrate the solution.

Work It Out for Yourself
Solving T-SQL querying problems sometimes requires you to apply complex logic. This puzzle showed how to find groups of existing (activity) ranges; next time, I'll examine how to group non-existing (inactivity) ranges. Meanwhile, try to write a solution that returns ranges of dates when each store was closed during January 2005, producing the result that Figure 4 shows.

I again emphasize that you can improve your T-SQL problem-solving skills by practicing pure logical puzzles such as the one in the "Logical Puzzle" sidebar. I'll provide the solution to this logic puzzle next month. If you have any new, interesting T-SQL or pure logic puzzles, I'd be happy to hear from you.

End of Article

Prev. page     1 [2]     next page -->



You must log on before posting a comment.

If you don't have a username & password, please register now.

Reader Comments

No Work No good No Good No Work

Anonymous User

Article Rating 1 out of 5

How would the acccuracy of datetime data type affect the solution in a 'real world' environment? Here, the dates are conveniently 'zero' for the time portion...

michael.smith

Article Rating 5 out of 5

Converting from a full DateTime to just a Date is easy enough: SELECT DATEADD(dd, 0, DATEDIFF(dd, 0, GETDATE())) Replace the GETDATE() call with whatever DateTime value you want to convert. You end up with the same date but with 0:00 for the time.

Anonymous User

Article Rating 5 out of 5

Hey! I really like this logic puzzle idea. Not only did it make me think about how to solve the problem, but then getting to read your solution provides another way of looking at it. Cool! I got the answer in a slightly different way. Here's my query:

SELECT GroupedSales.StoreID, MIN(GroupedSales.dt) AS StartDate, GroupedSales.EndDate FROM ( SELECT Sales.StoreID, Sales.dt, MIN(EndDates.dt) AS EndDate FROM Sales INNER JOIN ( SELECT a.StoreID, a.dt FROM Sales a LEFT JOIN Sales b ON a.StoreID=b.StoreID AND b.dt = DATEADD(dd, 1, a.dt) WHERE b.dt IS NULL ) EndDates ON Sales.StoreID=EndDates.StoreID AND Sales.dt <= EndDates.dt WHERE Sales.dt BETWEEN '1/1/05' AND '1/31/05' GROUP BY Sales.StoreID, Sales.dt ) GroupedSales GROUP BY StoreID, EndDate ORDER BY StoreID, StartDate

Basic idea was first to get the last day of each series of consecutive days, which I did by LEFT JOINING sales to itself but matching the next day. Any row that didn't include a next day result is the last day of that particular series. The next step was an inner join to that table, matching the StartDate as before or equal to the EndDate, and using the MIN of the resulting EndDates. At this point the results of the query are the same as listing 2, and the answer is retrieved by the outermost SELECT statement.

Anonymous User

Article Rating 5 out of 5

 
 

ADS BY GOOGLE