In the game of deadlocks, timing is everything. In this example, you have a slight window of opportunity if all the following conditions exist:
- The select statement uses the nonclustered index to find matching rows, but the statement needs to access the table data to complete its result set for any given row (the index doesn't cover the query).
- The update statement doesn't use the nonclustered index to find matching rows, but it needs to update the field in the nonclustered index.
- At least one row exists that, before any update, matches both statements' selection criteria.
Consider the execution plans and subsequent locking activity for the statements above. You can observe locking activity in SQL Server from several approaches. The best way to see the locks taken at each part of the execution is to use Trace Flag 1200. Microsoft documented this trace flag in SQL Server 6.5. Although the company doesn't document this flag for SQL Server 7.0, Trace Flag 1200 produces a thorough trace of the lock requests for any statement. Microsoft doesn't officially support the trace flags, but they're quite useful. Be warned, however, that the locking activity data is extensive; you should capture it only in controlled environments for small, isolated examples. For the purposes of this discussion, I distill the locking activity to the essential aspects involved in the deadlock. I won't cover the robust system of lock escalation.
As SQL Server executes the update statement and finds matching rows, it takes out exclusive locks on each row it updates. In this example, SQL Server uses a table scan to find the matching rows, and this method is the best execution plan given the schema and selection criteria. However, the update in this example also changes the value of a field in the nonclustered index (from Data2 to NewData2). Thus, for the row in question, SQL Server also needs to update the index with the new value to complete the transaction. SQL Server needs another exclusive lock, this time on the corresponding entry in the nonclustered index.
However, the nonclustered index also drives the select statement. Given the selection criteria, the SQL Server optimizer's only logical choice is to use the nonclustered index to find matching rows. When SQL Server is using the default locking mode, read committed, Process B takes shared locks on each entry in the index as it traverses that index. SQL Server releases these locks as soon as it determines no match for the row. When SQL Server finds a match, however, it holds the shared lock. Because the index (the field Non_Indexed_Field is represented only with the actual table data) doesn't completely cover this particular query, SQL Server needs to read the table to satisfy the query. SQL Server requests a shared lock for the corresponding row in the table.
By now you've probably noticed the window of opportunity for the deadly embrace. Process A requests two exclusive locks in sequential orderfirst on the row, then on the nonclustered index. Process B requests two shared locks in sequential orderfirst on the index, then on the row, which is the reverse order of Process A. This scenario is beginning to look like a classic deadlocking example, after all. If the timing is such that each process obtains its first lock successfully (the exclusive lock on the row for Process A and the shared lock on the index entry for Process B), the deadlock will occur.
The key to understanding this scenario is realizing that the first locks that Process A and Process B obtain can peacefully coexist. No logical or hierarchical connection exists between the row in the table and the entries in the nonclustered index. Thus, a process can obtain a lock on an index entry while the row in the table is locked.
You might want to reproduce this problem in a local environment by executing each statement in its own Query Analyzer window. But note that because the timing window opens and closes within the execution of each statement, you have no way to pause the processing in midstatement to force the timing. Even the quickest Minesweeper players will have little chance of successfully hitting the execute buttons fast enough. I needed to code similar examples into tight, infinite loops to produce the deadlock in a controlled environment. But this example is based on a real-world application, in which the window of timing opens about once every 10,000 update transactions.
Avoidance or Acceptance?
Despite your best efforts to control transactional sequencing, SQL Server manipulates resources at a level beyond your control, and in such a way that deadlocks might occur. In addition, deadlocks are more likely to occur when you have multiple indexes on a table to support diverse queries. What now?
You can consider one of the following avoidance or acceptance techniques, each of which has merits and problems. I include tips about identifying the proper technique for your situation.
Use the uncommitted read (dirty read) transaction isolation level. Using dirty reads is an avoidance technique. Submitting queries with the dirty read directive (by using the read uncommitted hint within the select statement or the set transaction isolation level statement) is a surefire way to prevent the deadlock in the previous example. Using a dirty read tells SQL Server to not take any shared locks during the reading process, letting the query return uncommitted data. In general, a dirty read transaction runs quicker than other types of transactions because it doesn't need to take out any locks, and because it lets other concurrent processes run quicker because they never need to wait for those locks. The obvious drawback, of course, is that dirty data might be incomplete or invalid. If the application has complex transactions that require several steps to complete and maintain data integrity, this approach won't work for you. If you choose this technique, and you decide to accept the risk that the SQL Server system will occasionally bring back incomplete data from the queries, you need to consider how your application will react to that data. Will other statements or transactions fail? Will SQL Server incorrectly alert users? If the system lets users drill down on the data in the screen for further detail, what will they see when they drill down? In general, use the dirty read approach only when the consequences of bad data to the downstream processes are minimal.
Remove the nonclustered index. In the absence of the nonclustered index, two things happen. First, the update doesn't need to keep that index current, so the update can complete its transaction with only one lock on the row. Second, the execution plan for the select statement changes dramatically. In the absence of an appropriate index, the SQL Server system needs to scan the table. Can this process execute as quickly as an index scan? The transaction process speed depends on several factors, including the size of the table, the selectivity of the index, and the distribution of data. When you evaluate the change in execution plans for the select statement, don't forget that the overall concurrency profile of the application might also improve with this approach. Thus, a small impact on one statement's performance might be negligible compared with the improved overall application performance that the change offers.
Cover the query. The previous example was one field away from having a covered query. In other words, SQL Server could have built the result set entirely by reading the index rather than scanning the table data, if I had built Non_Indexed_Field into the nonclustered index. The query could have completed its work with only one lock per matching row, and the deadlock never would have happened. Using covered queries is a good avoidance technique. In this situation, you completely eradicate the deadlock. However, you can't cover every query with indexes, especially in systems in which users have ad hoc querying capability. Also, every indexed field has a cost in storage and in the performance of inserts, updates, and deletes. In the real-world scenario that spawned this article, I would have had to add 15 columns to the index to follow this approach. Using covered queries is best suited for specific examples in tightly controlled systems.
Retry the transaction. Retrying the transaction is an acceptance approach. In most organizations, developers and users are constantly enhancing applications. Furthermore, many concurrency problems aren't identified in a requirements specification, in a test plan, or in a development lab. The more successful an application is, the more people will use itoften in ways that the developers didn't anticipate. People will use features at different time intervals, which will cause the application's concurrency profile to change. Retrying the transaction employs error handling and retry logic throughout the application. When you see error 1205, you can retry the offending statement an arbitrary number of times or until the transaction succeeds. Retrying the transaction is a good way to give your users uninterrupted availability, but it has three drawbacks. First, the method obscures easily preventable deadlocks; finding these deadlocks is tougher for you. (I'm assuming that you're already using the Profiler trace flags 1204 and 1205.) Second, the retry method is an after-the-fact remedy. By the time you decide to retry the transaction, the deadlock has occurred and the applications have incurred a delay. SQL Server 7.0 checks the lock queues every 5 seconds to detect and handle deadlock situations. Whenever a process waits on the same lock for more than one iteration of this check, SQL Server investigates the process for a deadlock. Thus, if you receive a deadlock error, your process might have been hung up for 5 to 10 seconds. Relying solely on this approach might degrade overall system performance. The third possible downside of the retry method is that it might cause code maintenance headaches, depending on how many different methods you employ to submit SQL statements. If you have a central component in which you can implement the retry logic once and apply it to the entire application, the retry technique will work fine.
The deadlock workaround that's right for you is probably some combination of the methods I've discussed. Having the retry logic in place is a good failsafe mechanism so that deadlocks remain your problem and not your users' problem. You can't cover every query. You'll need to index some fields that users regularly update. And, you'll likely have ramifications for reading dirty data.
The right workaround for you depends on your system's priorities. Is accurate, quick completion of the update the most important goal? Is user query response time the most important goal? What is the stimulus for each event? And how does the system's response affect downstream events? Answers to these questions will help guide your decision.
End of Article
Prev. page
1
[2]
next page -->