Solution 2: Using Joins and GROUP BY
Although the first solution works, I usually try to solve each problem in more than one way to see if an alternative solution would work better. The difference in complexity and performance of the various solutions can sometimes be striking. The query in Listing 7 uses the UDF I created earlier to provide an alternative solution to the problem.
This query uses a 3-way join between three instances of the Samples table called S, E, and M. S represents starts of groups, E represents ends of groups, and M (which stands for middle) represents samples that fall between S and E. To be more accurate, S represents the samples right before the starts of groups. The JOIN condition simply verifies that the end's timestamp is greater than the start's timestamp and that the middle's timestamp falls between the start's timestamp and the end's timestamp. The filter in the WHERE clause verifies that the samples before the starts of the groups are less than 70 and that the ends of the groups are also less than 70.
The main trick in this query happens in the HAVING clause. The code groups the rows by the start and end timestamps, and the HAVING clause leaves only those groups in which the minimum NOx value is greater than or equal to 70. If you're not sure how the query produces the groups, think about the requirements againthe start of a group is a sample that has a value that's greater than or equal to 70 while the previous sample has a value that's less than 70. All following samples belong to the same group until and including the first sample whose value falls below 70.
Note that you can find the start of a group only by locating a previous value that's less than 70. So if the first sample that appears in the table (the one with the oldest timestamp) is greater than or equal to 70, it won't be counted as a start of a group because no samples come before it. If you want to treat such a sample as a beginning of a group, you can use a simple trick: Insert a sentinel sample with a timestamp that's older than all existing timestampsfor example, the base date January 1, 1900:
INSERT INTO Samples VALUES('1900-01-01 00:00:00.000', 0)
This second solution is more elegant than the firstit uses one small, simple query. However, it performs much worse. Obviously, you need more than 16 rows in the table to run realistic performance tests. You can use the code in Listing 8 to clear and repopulate the Samples table with 65,536 sample rows that contain random values between 0 and 100 in the NOx column. Now try the queries we produced in the first two solutions. On my laptop, the first solution's query took about 7 minutes; solution 2's query kept running until I decided to cancel it after more than 3 hours. Neither solution is optimal, to say the least. At this point, you might decide to give up and use cursorsor you can keep trying other avenues.
Solution 3: Using Subqueries and TOP
Listing 9's query uses subqueries and TOP to provide the desired results. The query defining the derived table S4 returns the start and end times of each group. The filter in the WHERE clause checks whether the sample's value is greater than or equal to 70 and that the previous sample's value is less than 70. The previous value is retrieved by a subquery that uses TOP 1 to get the value from the most recent sample whose timestamp is earlier than the current sample's timestamp. Similarly, the SELECT list inside the derived table S4 contains another subquery that uses TOP 1 to return the group's end time, which is the timestamp of the oldest sample whose value is less than 70 and whose timestamp is greater than the current sample's timestamp. Then, the outer query simply has to return the start and end times of each group, the elapsed time (which it calculates by using the DATEDIFF() function), and the maximum NOx value (which it finds by using a simple subquery).
Solution 3 performs considerably better than the previous two solutions. On my laptop, the query took about 1 second of CPU time and less than 2 seconds elapsed time. Using the dbo.fn_fmtelapsed() function to format the elapsed time added about 4 seconds. From my experience, the TOP option is a good way to improve query performance, and it's certainly true in this solution. But sometimes tuning is simply a matter of writing several different queries and choosing the one that performs best, even if you don't know why it works best. Now we have a practical solution, but it uses the T-SQLspecific feature TOP. If you're looking for an ANSI-compliant solution, keep reading.
Solution 4: Using Subqueries and GROUP BY
Dieter Nöth, an SQL wizard who frequently comes up with neat solutions to my puzzles, provided the ANSI-compliant query that Listing 10 shows. The query in the derived table T returns all rows that have a value greater than or equal to 70, meaning that each returned row is potentially a group start time. The SELECT list uses the timestamp as the potential start time and a subquery to return the current group's end time. The subquery retrieves the minimum timestamp of all the samples that were taken after the current sample and have a value less than 70. So, the derived table T contains group end times and potential group start times. The outer query groups the rows in T by end time and returns the minimum start time for each end time. The minimum start time is the real start time of a group of all rows with the same end time. The outer query also calculates the elapsed time and maximum NOx value. Amazingly, this solution performs slightly better than solution 3, taking about 1 second of CPU time and 2 seconds elapsed time.
Almost every T-SQL problem you face has many solutions. Always be aware of that fact and try to attack the problem from several different directions. And be sure to check your code's performance with sufficient amounts of sample data. Your next solution might work some real SQL magic.