SideBar    Logical Puzzle
DOWNLOAD THE CODE:
Download the Code 44570.zip

"Crime is common. Logic is rare. Therefore it is upon the logic rather than upon the crime that you should dwell."
Sir Arthur Conan Doyle, 1859-1930
The Adventures of Sherlock Holmes, "The Copper Beeches"

T-SQL querying problems often require you to apply complex logical thinking. Even though some people may be born with stronger logical-deduction abilities than others, logic is an area in which you can significantly improve by constant practice. So, the more you practice tough T-SQL puzzles, the more skilled you become. Since T-SQL querying problems require highly logical thought processes, you can improve your T-SQL skills by practicing purely logical puzzles, too. Logic puzzles are fun, intriguing, and challenging; they appeal to people of all ages. Besides improving your own T-SQL querying abilities, you can challenge your children, grandchildren, nieces, or nephews with logic puzzles. You never know; you might be raising a T-SQL master for the next generation. This article's T-SQL puzzle requires you to apply complex logic to a problem involving groups of date ranges. Then, the "Logical Puzzle" sidebar, page 27, provides a pure logical puzzle for you to practice.

Grouping Ranges Puzzle
Imagine you're the DBA for a chain of small stores selling firewood and potatoes to visitors at various national parks around the United States. In bad weather, when few people visit parks, a storeowner might decide not to open the store. At the end of every day that a store is open, the storeowner enters the day's total revenue into the FireTatoes Sales application, which uses SQL Server 2005 as its database. The application records the sales data in a local Sales table and SQL Server replicates it for analysis purposes to a central SQL Server database at the chain's headquarters at Angel's Landing in Zion National Park, Utah. Run the code in Listing 1 to create the central database's Stores and Sales tables and populate them with sample data.

At the beginning of every month, your manager needs a report that shows the activity periods of each store during the previous month. You need to write the T-SQL query that generates the desired results, grouping or "collapsing" consecutive sale days. For example, Figure 1 shows the desired results for January 2005. To make sure you understand the requirements, compare Listing 1's base data for store 1 (Zion) to the result in Figure 1. There are two groups of consecutive sale days for store 1 in January 2005: 1, 2, 3 and 23, 24, 25, 26. So, the result groups are 20050101 through 20050103 and 20050123 through 20050126. Try to solve the puzzle yourself before looking at my solutions; I walk through a SQL Server 2000−compatible solution first, then demonstrate how to solve the problem in SQL Server 2005.

SQL Server 2000−Compatible Solution
I'll first provide a SQL Server 2000−compatible solution that you can apply if you need to devise a solution to a similar problem involving SQL Server 2000 databases. The toughest part of this problem is figuring out what to group by. You need to calculate some value (call it grp) that you can use in the GROUP BY clause. Here's where you need to apply logic. First, define the logical problem: Calculate a value for each sales row that will be the same for all of a store's consecutive sale days. This means that you need to calculate a different value for each group of rows. Second, define the solution—that is, logically express the grp value. Once you define the solution logically, translating it to T-SQL should be straightforward. So, here's my suggestion for defining the grp value: For the same store as the current row's store, return the earliest sales date that's greater than or equal to the current row's sales date, after which the store was closed.

I think about it this way: you want to return a particular date for all consecutive rows in one group, and it's actually the latest date within the group. You return another date for all consecutive rows of a different group. If the answer isn't clear yet, bear with me until you look at the result of the T-SQL query that calculates the grp value, then go over the logic again.

Next, the translation to T-SQL: You want to return all rows from the table (call it the outer table) and use a subquery to fetch the desired date from another instance of the table (call it the inner table). "Earliest date" translates to MIN(dt); "for the same store" translates to a correlation by store ID; "greater than or equal to" translates to inner table's dt >= outer table's dt. And "after which the store was closed" translates to a NOT EXISTS predicate that looks for a sales date for the same store that's greater than the current sales date by 1 day. In other words, the subquery returns the latest sales date for the group.

Listing 2 shows the T-SQL query that calculates the desired grp value (the grouping factor). Figure 2 shows the result of Listing 2's query. Notice that you indeed get the same grp value for all rows of the same group and a different grp value for all rows of a different group. If you're confused by the logic I've described so far, look at the result in Figure 2 and try to figure out what you see there, then read my explanation again.

The rest of the solution is simple, as Listing 3 shows. Query the derived table D, which Listing 2 created. Group the rows by the store ID and grp, and return the store ID and lowest and highest dt values, which represent the range boundaries. This query gives you Figure 1's results.

   Prev. page   [1] 2     next page



You must log on before posting a comment.

If you don't have a username & password, please register now.

Reader Comments

No Work No good No Good No Work

Anonymous User

Article Rating 1 out of 5

How would the acccuracy of datetime data type affect the solution in a 'real world' environment? Here, the dates are conveniently 'zero' for the time portion...

michael.smith

Article Rating 5 out of 5

Converting from a full DateTime to just a Date is easy enough: SELECT DATEADD(dd, 0, DATEDIFF(dd, 0, GETDATE())) Replace the GETDATE() call with whatever DateTime value you want to convert. You end up with the same date but with 0:00 for the time.

Anonymous User

Article Rating 5 out of 5

Hey! I really like this logic puzzle idea. Not only did it make me think about how to solve the problem, but then getting to read your solution provides another way of looking at it. Cool! I got the answer in a slightly different way. Here's my query:

SELECT GroupedSales.StoreID, MIN(GroupedSales.dt) AS StartDate, GroupedSales.EndDate FROM ( SELECT Sales.StoreID, Sales.dt, MIN(EndDates.dt) AS EndDate FROM Sales INNER JOIN ( SELECT a.StoreID, a.dt FROM Sales a LEFT JOIN Sales b ON a.StoreID=b.StoreID AND b.dt = DATEADD(dd, 1, a.dt) WHERE b.dt IS NULL ) EndDates ON Sales.StoreID=EndDates.StoreID AND Sales.dt <= EndDates.dt WHERE Sales.dt BETWEEN '1/1/05' AND '1/31/05' GROUP BY Sales.StoreID, Sales.dt ) GroupedSales GROUP BY StoreID, EndDate ORDER BY StoreID, StartDate

Basic idea was first to get the last day of each series of consecutive days, which I did by LEFT JOINING sales to itself but matching the next day. Any row that didn't include a next day result is the last day of that particular series. The next step was an inner join to that table, matching the StartDate as before or equal to the EndDate, and using the MIN of the resulting EndDates. At this point the results of the query are the same as listing 2, and the answer is retrieved by the outermost SELECT statement.

Anonymous User

Article Rating 5 out of 5