"Crime is common. Logic is rare. Therefore it is upon the logic rather than upon the crime that you should dwell."
Sir Arthur Conan Doyle, 1859-1930
The Adventures of Sherlock Holmes, "The Copper Beeches"
T-SQL querying problems often require you to apply complex logical thinking. Even though some people may be born with stronger logical-deduction abilities than others, logic is an area in which you can significantly improve by constant practice. So, the more you practice tough T-SQL puzzles, the more skilled you become. Since T-SQL querying problems require highly logical thought processes, you can improve your T-SQL skills by practicing purely logical puzzles, too. Logic puzzles are fun, intriguing, and challenging; they appeal to people of all ages. Besides improving your own T-SQL querying abilities, you can challenge your children, grandchildren, nieces, or nephews with logic puzzles. You never know; you might be raising a T-SQL master for the next generation. This article's T-SQL puzzle requires you to apply complex logic to a problem involving groups of date ranges. Then, the "Logical Puzzle" sidebar, page 27, provides a pure logical puzzle for you to practice.
Grouping Ranges Puzzle
Imagine you're the DBA for a chain of small stores selling firewood and potatoes to visitors at various national parks around the United States. In bad weather, when few people visit parks, a storeowner might decide not to open the store. At the end of every day that a store is open, the storeowner enters the day's total revenue into the FireTatoes Sales application, which uses SQL Server 2005 as its database. The application records the sales data in a local Sales table and SQL Server replicates it for analysis purposes to a central SQL Server database at the chain's headquarters at Angel's Landing in Zion National Park, Utah. Run the code in Listing 1 to create the central database's Stores and Sales tables and populate them with sample data.
At the beginning of every month, your manager needs a report that shows the activity periods of each store during the previous month. You need to write the T-SQL query that generates the desired results, grouping or "collapsing" consecutive sale days. For example, Figure 1 shows the desired results for January 2005. To make sure you understand the requirements, compare Listing 1's base data for store 1 (Zion) to the result in Figure 1. There are two groups of consecutive sale days for store 1 in January 2005: 1, 2, 3 and 23, 24, 25, 26. So, the result groups are 20050101 through 20050103 and 20050123 through 20050126. Try to solve the puzzle yourself before looking at my solutions; I walk through a SQL Server 2000−compatible solution first, then demonstrate how to solve the problem in SQL Server 2005.
SQL Server 2000−Compatible Solution
I'll first provide a SQL Server 2000−compatible solution that you can apply if you need to devise a solution to a similar problem involving SQL Server 2000 databases. The toughest part of this problem is figuring out what to group by. You need to calculate some value (call it grp) that you can use in the GROUP BY clause. Here's where you need to apply logic. First, define the logical problem: Calculate a value for each sales row that will be the same for all of a store's consecutive sale days. This means that you need to calculate a different value for each group of rows. Second, define the solutionthat is, logically express the grp value. Once you define the solution logically, translating it to T-SQL should be straightforward. So, here's my suggestion for defining the grp value: For the same store as the current row's store, return the earliest sales date that's greater than or equal to the current row's sales date, after which the store was closed.
I think about it this way: you want to return a particular date for all consecutive rows in one group, and it's actually the latest date within the group. You return another date for all consecutive rows of a different group. If the answer isn't clear yet, bear with me until you look at the result of the T-SQL query that calculates the grp value, then go over the logic again.
Next, the translation to T-SQL: You want to return all rows from the table (call it the outer table) and use a subquery to fetch the desired date from another instance of the table (call it the inner table). "Earliest date" translates to MIN(dt); "for the same store" translates to a correlation by store ID; "greater than or equal to" translates to inner table's dt >= outer table's dt. And "after which the store was closed" translates to a NOT EXISTS predicate that looks for a sales date for the same store that's greater than the current sales date by 1 day. In other words, the subquery returns the latest sales date for the group.
Listing 2 shows the T-SQL query that calculates the desired grp value (the grouping factor). Figure 2 shows the result of Listing 2's query. Notice that you indeed get the same grp value for all rows of the same group and a different grp value for all rows of a different group. If you're confused by the logic I've described so far, look at the result in Figure 2 and try to figure out what you see there, then read my explanation again.
The rest of the solution is simple, as Listing 3 shows. Query the derived table D, which Listing 2 created. Group the rows by the store ID and grp, and return the store ID and lowest and highest dt values, which represent the range boundaries. This query gives you Figure 1's results.
Prev. page  
[1]
2
next page