• subscribe
December 23, 2003 12:00 AM

Opening Microsoft's Performance-Tuning Toolbox

Use waits and queues to improve SQL Server performance
SQL Server Pro
InstantDoc ID #40925
Downloads
40925.zip

Ratio of compilations, recompilations, and batch requests. You can use the Performance Monitor counters SQLServer:SQL Statistics:SQL Compilations/sec and SQLServer: SQL Statistics:SQL Re-compilations/sec to diagnose compilation and recompilation problems. SQL Compilations/sec includes both initial compiles and recompiles for all stored procedures. Because SQL Re-compilations/sec includes only recompiles, subtracting SQL Re-compilations/sec from SQL Compilations/sec yields the number of initial compiles, as the following formula shows:

Initial Compiles(%)=(SQL
   Compilations/sec - SQL Re-
   compilations/sec) /Batch 
   Requests/sec

If the number of initial compiles is low compared to recompilations, you might have a recompilation problem. In contrast, a low recompilation number means that most compilations are initial compilations and that SQL Server isn't effectively reusing query plans.

If initial query-plan creation is high (when compared to the number of batch requests), SQL Server might not be reusing query plans effectively. You can use the following formula to aid your analysis:

Plan Reuse (%) = (Batch
   Requests/sec - Initial
   Compiles/sec) / Batch
   Requests/sec

If Batch Requests/sec are higher than Initial Compiles/sec, you have some plan reuse. If Batch Requests/sec equals Initial Compilations/sec, you have no plan reuse. You can see which query plans SQL Server reuses by looking in the usecounts column in master..syscacheobjects. A usecounts value greater than 1 indicates plan reuse for executable plans.

Disk Queue Length vs. Avg. Disk Seconds/transfer. The Performance Monitor counters PhysicalDisk:Avg Disk Queue Length and PhysicalDisk:Avg Disk Seconds/transfer provide measures of I/O subsystem performance. A queue results when SQL Server can't immediately satisfy requests for I/O. As disk queue length increases, so does the Avg Disk Seconds/transfer value. Although many people rely on queue-length information to determine whether an I/O subsystem bottleneck exists, the best source of bottleneck information is the Avg Disk Seconds/transfer counter because it factors in the effects of queues. From an end-user point of view, the average time it takes to complete a read or write is a true reflection of I/O performance. For example, if you bulk-load data into SQL Server and both the bulk-load file and SQL Server database file are on the same drive, you could get a sustained disk queue length of 1 and poor Disk Reads/sec and Disk Writes/sec performance. The queue length of 1 might not seem to be a problem; but the Disk Reads/sec value might be as high as 25ms, indicating an I/O subsystem bottleneck. Table 3 shows other effects SQL Server might have on I/O resources.

Low page-life expectancy with high checkpoint pages and lazywrites. You can use the Performance Monitor counters Page Life Expectancy, Checkpoint Pages/sec, and Lazywrites/sec to find out how long data pages stay in cache and how much cache activity (checkpoints and lazywrites) is required to keep memory available for current activities. Low page-life expectancy combined with high checkpoint pages and lazywrites causes memory pressure. To improve performance, implement one or more of the following changes:

  • Add more memory to the Windows server.
  • Increase SQL Server memory.
  • Avoid table and index scans by using proper indexing.
  • Eliminate bad query plans.

These interesting ratios can help you find ways to diagnose more complex performance problems. If a performance problem persists, you can keep extending your search by looking at other factors that might affect your system, such as network limitations. The sidebar "Diagnosing Network Problems" describes a method for determining how big a part network limitations play in performance.

Application and Database Design Problems
Application and database design affect performance in a variety of ways depending on the nature of your business and how you use your database. When you use the waits-and-queues methodology, you have to consider the design constraints of the application or database you're using. Table 4 describes some application-design implications.

For example, in an OLTP environment, plan reuse is typically desirable when you're performing highly predictable inventory lookups or order entry. In these cases, result sets are uniform and plan reuse is important for scalability. In other environments, such as those that extensively use ad hoc queries and reporting, plan reuse might not be desirable. Plan reuse isn't good when result-set sizes can vary greatly depending on parameter values or when intermediate work tables vary in size. For example, finding out how many widgets your company sold yesterday in California is different from finding the number of widgets sold last year, and you don't want to reuse the same query plan to find these two values.

Database design problems can range from improper indexing to overly normalized database designs in an OLTP environment. The kind of correlations you discover in your waits and queues analysis will depend on how you use your database.

The key to using the waits-and-queues methodology is to corroborate wait statistics with queues information to draw meaningful conclusions about performance problems. For example, a high percentage of a certain SQL Server wait type signals the need for further Performance Monitor investigation of the usage of underlying resources such as the processor, I/O subsystem, and network.

In some cases, an experienced performance expert must look beyond the symptom to find the root problem, and the "Wait Types, Perfmon Counters, and Correlations" document can help you pinpoint causes of performance problems. The document's correlated performance information, possible conclusions and actions, and interesting ratios and comparisons sections shed light on root problems for given symptoms. Correlating wait types information with performance counters and related counter ratios provides a broad view of application performance.



ARTICLE TOOLS

Comments
  • SHAILESH
    7 years ago
    Jan 31, 2005

    It is a great article and as Mario pointed out it is in right direction. I hope to see session level waitstats capture feature in SQL 2005, together with object level IO Statistics.

    In this article however, I have a question about waitstat capture procedure. It captures by default data 10 times every 10 minutes, however the report produced only prints detail about the last sample. The waits can't be summed from the first sampling to the last, it needs dbcc sqlperf (waitstats,clear) within the sampling loop to be more useful.

  • Mario Broodbakker, The Netherlands
    8 years ago
    May 10, 2004

    Wonderful! Finally after all these years Microsoft engineer(s) see the light. Wait event based tuning is there for almost ten years in the Oracle world. Tom gives a very good explanation about the how's and why's, and the first proper 'wait event' documentation ever!
    I hope this will eventually lead to a per session wait interface, and even more detailed documentation.

  • tom parnell
    8 years ago
    Apr 28, 2004

    Very relevant and proffessional

  • Jodi Pritchard
    8 years ago
    Apr 26, 2004

    Could you give me the syntax for run the track_waitlist stored procedure?

    thanks, jodi

  • Mark Pollina
    8 years ago
    Apr 22, 2004

    The article gives relative ratios of various counters (A > B is good) but how does 2/1 compare to 10/1? More precision please.

You must log on before posting a comment.

Are you a new visitor? Register Here