When you’re faced with a T-SQL querying problem, finding a set-based solution (as opposed to a cursor-based solution) is recommended for a number of reasons. First, set-based solutions are in accord with the relational model. Also, they typically require less code and less maintenance than cursor-based solutions. In addition, set-based solutions tend to perform better.

For many years, I couldn’t find a good-performing set-based solution for a particular querying problem that involves maximum concurrent sessions. I had an adequate set-based solution in terms of its logic, but it didn’t perform well. The only good-performing solution I had was cursor-based. I recently used this problem as an exercise in a class I was teaching, and one of my students—Darryl Page from the UK—had an idea that ultimately led to a good-performing set-based solution. In this article, I cover my original set-based solution, the cursor-based solution, and the performance and scaling ramifications of both solutions. Next month I’ll present the improved set-based solution based on Darryl’s insight, as well as a set-based solution that relies on standard language elements that aren’t yet implemented in SQL Server (as of SQL Server 2008) but hopefully will be supported in a future version.

The Problem
The task at hand involves a table called Sessions that keeps track of user sessions against an application. Suppose that every month you need to produce a report for the previous month’s activity, including the maximum number of concurrent sessions for each application. That is, for each application, you must calculate the maximum number of sessions that were active at the same time during the period of interest. You might need such information, for example, for billing purposes for per-user license types. Use the code in Listing 1 to create the Sessions table in the tempdb database (for test purposes) and populate it with sample data.

For simplicity, assume that the Sessions table contains only sessions during the period of interest, so you don’t need to filter a specific date range. Note that you should explicitly define the term concurrent for your customers. For example, if a session ends at the exact point in time that another session starts, are the sessions considered concurrent? For the purposes of our task, let’s assume that two such sessions are not considered concurrent. In this case, Table 1 shows the desired output for your solution, given the sample data in Listing 1.

Table 1: Desired Output from Solution
app mx
app1 4
app2 3

The indexing strategy that would help speed up both a set-based and a cursor-based solution is quite straightforward. Create one index on the key-list (app, starttime, endtime) and another on (app, endtime, starttime). Note that you must populate the Sessions table with a larger volume of sample data if you want to run performance tests. Use the code in Listing 2 to create a helper table function called GetNums that returns a sequence of integers of a requested size.

Use the code in Listing 3 to populate the Sessions table with a requested number of rows by querying the GetNums helper function. The code in Listing 3 populates the Sessions table with 10,000 rows, but you can adjust the number according to your needs by assigning your desired value to the variable @numrows. The sessions are associated with 10 different applications using randomization (1 + ABS(CHECKSUM(NEWID())) % 10), so about one tenth of the rows are associated with each application. I didn’t make the number of partitions (applications) variable, because adding partitions has a linear effect on all solutions (linear complexity), as I will show. A more interesting topic for discussion is how changing the partition size affects the different solutions’ performance. Changing the value of @numrows changes the partition size, because the number of partitions remains constant (10).

   Prev. page   [1] 2 3 4 5 6 7 8     next page
 
 

ADS BY GOOGLE