Executive Summary:
This follow-up to “Getting to Know Wait Stats” (InstantDoc ID 96746) concentrates on a few SQL Server wait stats that typically account for most overall system waits.
|
Okay. You’ve read “Getting to Know Wait Stats”
(October 2007, InstantDoc ID 96746). You
know about wait stats—what they are, how to collect
them, and how to report on them. Now that you have
that foundational knowledge, I want to concentrate on
a few wait stats that typically account for most overall
system waits. While doing recent performance and scalability
audits for clients, I noticed a common pattern
regarding the most frequent waits I’m seeing. So, I felt
the time was ripe to talk about each.
Let’s start the discussion by taking a look at Table
1, which displays a list of 14 distinct wait types, in
descending order of total wait time. Of course, this table
is merely a partial list of the total wait types; your top
waits might or might not be the same as these. However,
it’s likely that many of these wait types are indeed part
of your top waits. For the purposes of this article, let’s
group these 14 wait types into five unique categories:
latches, locking, network, I/O, and parallelism.
Latches
The LATCH_xx and PAGELATCH_xx waits aren’t
related to physical I/O activities. LATCH_xx waits are
typical of contention for internal resources or structures
other than the buffer pool, particularly when you’re using
heaps or text datatypes. Addressing the problems associated
with the other wait types will generally also help
pure LATCH waits. PAGELATCH_xx waits are directly
related to contention in the memory caches, including
the buffer pool. High waits can indicate one of several
problems. One potential problem is inadequate amounts
or poorly optimized memory, in which there’s a lot of
contention for the existing pages in the caches. Adding
more memory or tuning your existing system usually
helps. Another cause might be extremely high volumes of
inserts in the same range of pages, particularly if it results
in page splits. This problem can be further complicated
if there are many selects on the same range. (You would
need to have many thousands of transactions per second
to see this problem occur in SQL Server 2005.) If this
is the problem, some index tuning might be necessary.
PAGELATCH_UP is an allocation type of wait and
typically shows up in the tempdb database under heavy
use. Adding more data files to tempdb can significantly
reduce or eliminate these waits.
Locking
The LCK_xx waits are associated with locking and
blocking. High volumes of these waits indicate that
something is preventing the efficient execution of the
storage engine’s locking mechanisms. This problem is
mostly caused by transactions that are too long or by
a lack of proper index usage, resulting in locking or
reading more rows than necessary, thus prolonging
the transaction and increasing the chances of further
blocking. Poorly configured hardware can also contribute
to high LCK_xx waits because each action
simply takes longer to finish, often leading to blocking
on multiuser systems.
Network
Most people assume that high network waits are the
result of problems directly related to the physical
network between the server and the clients. Although
this problem can affect the number of NETWORKIO
waits, the reality is that high network waits are most
likely caused by poor client response. If the client can’t
process the results as fast as SQL Server can send them,
you’ll start to see these waits. Poorly written client applications and overburdened hardware on the client
side can lead to high network waits.
I/O
The PAGEIOLATCH_xx, IO_COMPLETION, and
WRITELOG waits fall into the I/O category. PAGEIOLATCH_
xx waits are specific to disk-to-memory
transfers and almost certainly indicate problems with
your disk subsystem. At the very least, they suggest
that your subsystem is having trouble keeping up
with the volume of I/O requests. IO_COMPLETION
simply means that the system is waiting for I/O tasks to
finish. Finally, WRITELOG indicates that the system
is waiting for transaction log–related write requests to
finish. You can reduce most of these waits by using one
of several tactics:
- Tune your queries so that they read less data to satisfy
the requests
- Change the placement of the data and log files so
that they don’t compete with each other on any
particular drive array (e.g., separate the files onto
separate physical drives)
- Change the RAID array types and configurations
to improve performance (e.g., add more disks,
change the array from RAID 5 to RAID 10)
Parallelism
CXPacket waits are a direct result of parallel processing—
more specifically, inefficient parallel processing.
This wait appears when one thread in a parallel
plan is waiting on one or more other threads to finish
before it can proceed. High totals of this wait are
sure-fire indications that you might not be benefiting
much from the use of multiple processors or many
multiple processors in parallel. Consider lowering
the MAXDOP setting either at the server level or by
using query hints (if you can narrow down the culprits
to particular queries). See “Boost Performance with
Parallel Processing” (InstantDoc ID 97044) for more
details.
Further Study
The Microsoft article “Description of the waittype and
lastwaittype columns in the master.dbo.sysprocesses
table in SQL Server 2000 and SQL Server 2005” (support.microsoft.com/kb/822101) goes into much more detail
about the different waits. And, as always, be sure to
visit our Performance Tuning and Optimization forum
(sqlforums.windowsitpro.comweb/forum) to discuss this
topic further.
See associated Table