SQL Server 2005 Solution
T-SQL enhancements in SQL Server 2005 will let you come up with a much shorter, faster solution. Specifically, I'm talking about the ROW_NUMBER() function that Microsoft introduced in SQL Server 2005. For details about ROW_NUMBER(), see my April 2004 T-SQL 2005 Web column, "Calculating Row Numbers in SQL Server 2005," InstantDoc ID 42302.
Imagine row numbers assigned to each of a store's sale rows in date order. The most recent sale row gets 1, the next gets 2, and so on. The ability to calculate such row numbers lets you define the grp value as follows: current sale date minus the number of days that corresponds to the date's row number. Yes, that's all there is to it.
Now let's figure out why this simple calculation gives you what you're after. By definition, a "consecutive date" means a date that's greater by one than the previous date. The same applies to row numbers because they're consecutive by definition. So, if you subtract the row numbers from a group of consecutive dates, you're bound to get the same result date for each.
The real beauty of this solution is in what happens with the next group of consecutive dates. There's a gap of dates between one group and another, but there's no gap in the row numbers. This disparity ensures that the second group of dates will have a different result date value than the first. In fact, the difference between the grp value the calculation generated for one group and the grp value it generated for the next group will be the number of days in the gap minus one.
If you find this logic complicated, try following an example. Take the dates for store 120050101, 20050102, 20050103, 20050123, 20050124, 20050125, and 20050126and their respective row numbers1, 2, 3, 4, 5, 6, and 7. From each date, subtract the number of days that corresponds to each row number, as follows:
20050101 − 1 = 20041231
20050102 − 2 = 20041231
20050103 − 3 = 20041231
20050123 − 4 = 20050119
20050124 − 5 = 20050119
20050125 − 6 = 20050119
20050126 − 7 = 20050119
There's a gap of 20 days between the first group and the second, so the grp value generated for the second group is 19 greater than that for the first. In short, you got what you were after: a grp value that's the same for all rows of one group and different for all rows of another group. Here's the T-SQL query that calculates the grp values:
SELECT storeid, dt,
dt - ROW_NUMBER()
OVER(PARTITION BY storeid
ORDER BY dt) AS grp
FROM dbo.Sales
WHERE dt BETWEEN '20050101' AND
'20050131';
Figure 3 shows the results.
Again, the rest is simple. As Listing 4 shows, you use a derived table or a common table expression (CTE) and group the rows by the store ID and grp value. Since this solution applies to SQL Server 2005 only, I used a CTE to demonstrate the solution.
Work It Out for Yourself
Solving T-SQL querying problems sometimes requires you to apply complex logic. This puzzle showed how to find groups of existing (activity) ranges; next time, I'll examine how to group non-existing (inactivity) ranges. Meanwhile, try to write a solution that returns ranges of dates when each store was closed during January 2005, producing the result that Figure 4 shows.
I again emphasize that you can improve your T-SQL problem-solving skills by practicing pure logical puzzles such as the one in the "Logical Puzzle" sidebar. I'll provide the solution to this logic puzzle next month. If you have any new, interesting T-SQL or pure logic puzzles, I'd be happy to hear from you.
End of Article
Prev. page
1
[2]
next page -->