DOWNLOAD THE CODE:
Download the Code 22956.zip

More ways to avoid cursors

Some problems in SQL Server seem to require a cursor-based solution, but you can often find alternatives. Set-based solutions usually perform better, are more readable, and are easier to maintain than cursor-based solutions. Furthermore, set-based solutions are more in accord with the relational model, which is organized in terms of sets and not in terms of files. In "Sequential to Set-Based," November 2001, I showed how you can provide several different T-SQL set-based solutions for one such problem. I recently ran across another situation that posed a similar problem. A student in one of my classes claimed that the programmers in his company couldn't find a set-based solution for this problem, only a cursor-based solution.

The Problem
In this scenario, a company stores its financial transactions in two tables, Trans1 and Trans2. Running the script in Listing 1 creates the Trans1 and Trans2 tables and populates them with sample data. Each table holds the transaction ID, account ID, and monetary amount of each transaction, and each table's primary key is defined on account_id, tran_id. The transaction IDs indicate the chronological order of the transactions within an account, starting with 1 and incrementing by 1 each time a new transaction takes place.

My student asked how to write a query that matches the chronological order of transactions with the same account ID and amount. So, a row in Trans1 that has an account ID of x and an amount of y and is chronologically the nth row with those values matches a row in Trans2 that also has account ID x and amount y and is chronologically the nth row with those values. For example, the following rows in Trans1 belong to account ID 1 and have an amount of 100.0000:

account_id tran_id amount
1 1 100.0000
1 3 100.0000
1 5 100.0000
1 7 100.0000

The following rows in Trans2 also belong to account ID 1 and have an amount of 100.0000:

account_id tran_id amount
1 3 100.0000
1 4 100.0000
1 7 100.0000

Therefore, you need a query that matches the first transaction with the amount 100.0000 and account ID 1 in Trans1 with the first transaction with the amount 100.0000 and account ID 1 in Trans2, the second with the second, and so on. Thus, the script should match transaction IDs 1, 3, and 5 in Trans1 with transaction IDs 3, 4, and 7 in Trans2, respectively. Transaction ID 7 in Trans1 has no match in Trans2 because it's the fourth transaction with those account ID and amount values, but Trans 2 contains only three transactions with those account ID and amount values.

The Solution
First, you need a way to determine, for each transaction, the chronological position within the same account and amount. You can find this position by using a subquery in the SELECT list, as the query in Listing 2 shows. The subquery counts the rows that have the same account ID and amount as the row in the outer query and that also have a transaction ID that's less than or equal to the transaction ID in the outer query. Remember that the transaction ID represents the chronological order of events. Table 1 shows the output of Listing 2's query against Trans1. You can write a similar query that retrieves the transactions and chronological positions from Trans2.

The next step is to match the results of both queries. To provide the final solution, you could write a complex query that uses derived tables, where the derived tables hold the queries against Trans1 and Trans2, but such a query would be very lengthy and hard to maintain. A more simplified approach is to create views containing the queries against Trans1 and Trans2. You can run the script that Listing 3 shows to create the views VTrans1 and VTrans2. After creating the views, you can find the matching transactions by running the query that Listing 4 shows. This query joins the views based on account IDs, amounts, and chronological positions. Table 2 shows the output of Listing 4's query.

   Prev. page   [1] 2     next page



You must log on before posting a comment.

If you don't have a username & password, please register now.

Reader Comments

Ben,

Your article is very interesting and I promised to myself to take time and digest the information you give in this article. I am struggling with a concurrency problem and am thinking about a solution that goes in the opposite direction of what your article recommends though, so I am wondering if I am going in the right direction. A developer in my shop wrote a stored procedure that is called by a lot of programs. When executed this stored procedure will select a set of rows with a status code set to 'Job pending' and a job category set to say 'Batch'. The stored procedure then updates this set with one update statement, changing the status code to 'Active' and the jobid field to a value. When there was only one job in the system, this worked fine. Now that we have lots of jobs running concurrently, calling this same stored procedure, jobs time out left and right. I am advising the developer to rethink his stored procedure, to update one row at a time, sort of like cursor based update, except that it should not be cursor based, we would not have resolved the concurrency problem, but based on a list that he would have extracted previously in a set based retrieval with nolock hint. Do you think this would be the way for me to go?

Brigitte Arnett