SideBar    Diagnosing Network Problems
DOWNLOAD THE CODE:
Download the Code 40925.zip

Track_waitstats Stored Procedure
The track_waitstats stored procedure in Listing 1 captures wait statistics from DBCC SQLPERF(waitstats) and ranks the waits in descending order according to how often they occur. The procedure displays each value as a percentage of the total waits. Solving the top problems provides the greatest improvement, and this ranking helps us identify the greatest opportunities for performance improvements. After track_waitstats gathers samples from DBCC SQLPERF, the stored procedure get_waitstats (which Listing 2 shows) provides a report that shows the wait type, wait time, and percentage of total waits for a workload. For example, the sample report in Figure 1 shows that network I/O waits cause the lion's share of wait time—48 percent. So improving network I/O is our best chance to improve this application's performance. You can also run get_waitstats during the execution of track_waitstats or even after track_waitstats completes.

Figure 1 shows that other performance-improvement opportunities for this application include reducing the number of exclusive locks (LCK_M_X) and the number of waits that slow transaction-log entries cause (WRITELOG). Exclusive-lock waits account for almost 13 percent of total wait time. Looking at transaction management might tell us whether we can make improvements in this area (e.g., we might be able to shorten lock durations). The WRITELOG wait type shows that threads are waiting for physical writes to complete to the transaction log. Because WRITELOG accounts for 11 percent of waits, we need to analyze Performance Monitor disk queues. Steady and high disk queues for the transaction log would indicate that the transaction-log drives might not have the I/O capacity to keep up with write requests.

Table 1 is an excerpt from the wait types document that our team uses to analyze waits. The document contains wait descriptions, probable causes, and other related considerations. (You can download the full document, "Wait Types, Perfmon Counters, and Correlations.") Table 1 shows wait-type descriptions and correlations to other performance information such as physical disk counters and buffer cache counters. The table isn't exhaustive, but it lets you see that if the information that track_waitstats returns shows a high percentage of a given wait type, you should look for correlated information to determine what might be causing those waits and to figure out how to solve the problem. "WaitTypes, Perfmon Counters, and Correlations" also contains the document we use to correlate Performance Monitor counter information with information from the track_waitstats stored procedure. The Performance Monitor counter document contains descriptions, additional correlations, and possible actions that our team can use to diagnose and correct performance problems.

Interesting Performance Monitor Ratios and Comparisons
Because Performance Monitor uses hundreds of counters, we must ask which counters are most relevant for each situation. Waitstats helps us identify performance-related counters. But waits information by itself isn't enough to accurately diagnose performance problems. The queues component of our methodology comes from Performance Monitor counters, which provide a view of system performance from a resource standpoint. Although the "Wait Types, Perfmon Counters, and Correlations" document is a useful resource, in complex cases, you have to include other counter information in your analysis. Table 2, which describes some of SQL Server's effects on CPU resources, shows how waits relate to CPU utilization and how utilization requirements affect database design choices and coding practices. Let's look at examples of problems that require extra information.

High ratio of compilations to batch requests. When you're working with online transaction processing (OLTP) applications, you usually want SQL Server to reuse query plans as much as possible to reduce the length of queues. When SQL Server reuses query plans, it doesn't need to compile the query before execution, thus reducing CPU utilization (which the System:Processor Queue Length counter shows) and shortening query-processing times. To get a ratio of compilations to batch requests, use the Performance Monitor counters SQLServer:SQL Statistics:Batch Requests/sec and SQLServer:SQL Statistics:SQL Compilations/sec.

When you find that SQL Server isn't reusing query plans efficiently, poor memory utilization (i.e., memory pressure) or poor coding practices might be to blame. To diagnose the cause, you can look at the SQL Server Profiler events SP:CacheMiss, SP:CacheInsert, SP:CacheRemove, SP:Recompilation, and SP:CacheHit. These events report plan activity in the cache. SP:CacheHit events show plan reuse. If a compile occurs, Profiler records the SP:CacheMiss and SP:CacheInsert events. SQL Server sometimes discards query plans to free memory for other activities. In such cases, the SP:CacheRemove event occurs. SP:Recompilation indicates that a recompile has occurred during the execution of a stored procedure. And if you notice a low value in the SQL Buffer Mgr:Page Life Expectancy Performance Monitor counter, your SQL Server system might be suffering from memory pressure.

To find out if poor coding might be the cause of a high compilation ratio, look at the master..syscacheobjects table to see all the query plans in the cache. High numbers can indicate plan reuse problems. You can use sp_executeSQL to parameterize ad hoc queries. Parameterization essentially means that you use variables instead of literal values as parameters in your queries. If ad hoc SQL statements are parameterized, SQL Server can reuse the query plans.

Prev. page     1 [2] 3     next page



You must log on before posting a comment.

If you don't have a username & password, please register now.

Reader Comments

Great artiI was reading the latest article Opening Microsoft's Performance-Tuning Toolbox and was trying to locate the document "Wait Types, Perfmon Counters and Correlations" which is specified in the article. I cannot locate this document. Can you please tell me how I can get it?

Thank you.

Pete Williams

Thank you Tom for a very interesting article. I am using your methodology whenever we have a perfromance problem. I hope in future articles you' ll show us more of the tools/scripts in the "SQL Server Customer Advisory Team" tool box.

Cheers

Rachid

rachid.ould-hamouda

Excellent article. It has helped me determine which items are not responsible for application performance issues we are experiencing.

Matthew Robinette

Excellent article. It has helped me determine which items are not responsible for application performance issues we are experiencing.

In virtually every issue of SqlMag there is at least one article that pertains to an issue I am currently experiencing.

Thanks for the help!

Matthew Robinette

The article gives relative ratios of various counters (A > B is good) but how does 2/1 compare to 10/1? More precision please.

Mark Pollina

Could you give me the syntax for run the track_waitlist stored procedure?

thanks, jodi

Jodi Pritchard

Very relevant and proffessional

tom parnell

Wonderful! Finally after all these years Microsoft engineer(s) see the light. Wait event based tuning is there for almost ten years in the Oracle world. Tom gives a very good explanation about the how's and why's, and the first proper 'wait event' documentation ever! I hope this will eventually lead to a per session wait interface, and even more detailed documentation.

Mario Broodbakker, The Netherlands

It is a great article and as Mario pointed out it is in right direction. I hope to see session level waitstats capture feature in SQL 2005, together with object level IO Statistics.

In this article however, I have a question about waitstat capture procedure. It captures by default data 10 times every 10 minutes, however the report produced only prints detail about the last sample. The waits can't be summed from the first sampling to the last, it needs dbcc sqlperf (waitstats,clear) within the sampling loop to be more useful.

skhanal

Article Rating 4 out of 5

 
 

ADS BY GOOGLE