DOWNLOAD THE CODE:
Download the Code 96513.zip

Every DBA knows that there are certain aspects of SQL Server and its associated hardware that you simply must monitor if you want to measure overall system performance. However, not everyone knows exactly what to monitor, how to monitor, and—most important—how to interpret the results. In this longer-than-usual edition of "PTO for the Rest of Us," I'll go into some depth to explore one of these crucial puzzle pieces: I'll use a built-in dynamic management view (DMV) in SQL Server 2005 related to physical file access—sys.dm_io_virtual_file_stats()—and I'll generate and use a small amount of custom code that will let you easily report on this DMV's output. Although you'll find this DMV only in SQL Server 2005 or later, it replaces a similar function in SQL Server 2000 called ::fn_virtualfilestats(). This column's techniques, code, and principles will be specific to SQL Server 2005, but you can easily apply them to SQL Server 2000 with minor modifications. For more on the latter application, join the discussion about this article in the SQL Server Magazine performance-tuning forums (http://sqlforums.windowsitpro.com).

DMV Benefits
What can this wonderful DMV do for you? In a nutshell, this DMV lets you see exactly what type of physical I/O your SQL Server system is experiencing when you're reading from or writing to the files on disk. The physical I/O distinction is important, compared with the metrics you might get from other SQL Server functions and utilities, which include or report on only logical I/O. (Logical I/O is when a page is accessed only in memory, or cache, whereas physical I/O accesses the file on disk.) So, remember that if the I/O accesses only the SQL Server cache (hence a logical I/O), the output of this DMV won't reflect that I/O. Therefore, you can determine with certainty how much you're reading and writing to your database files on disk. This DMV also indicates how long SQL Server has taken to complete the reads or writes. And because the DMV tracks that data at the individual file level, you can use the results to get a terrific handle on several aspects of your system, such as how busy one file is compared with another, or which file waits the longest for physical operations to occur.

Let's take a look at the raw DMV output, which Table 1, shows. Most of this data is self-explanatory, and all of it is occasionally useful, but I'll focus on just a few for the purposes of this article. The data about number of bytes read and written provides an accurate look at how much data you're reading or writing to each of the files. The number of reads and writes specifies how many individual requests were made for that time period. This information can tell you two very important statistics about your system.

First, there will be no doubt as to which database files have the most physical disk I/O—both in terms of the quantity of data (bytes read or written) and the frequency of the I/O (number of reads or writes). Lots of data read in bytes, combined with a fewer number of reads, can indicate poorly optimized queries that read too much data and have difficulty keeping all the data in the cache. Lots of write requests indicate a busy system in terms of changes to the database.

Second, together with the I/O stall counters, this raw data can tell you exactly where your disk bottlenecks are, in terms of the physical disk arrays the files reside on. The I/O stall counters give you an indication in milliseconds of how much time elapses during the read or write operation—or the combined total, depending on the column. So, obviously, the file with the most I/O stalls is spending the most time waiting. If you add up the stalls for all the files that reside on a particular physical array, you can see which part of the storage system is waiting the most. But that information alone doesn't necessarily suggest that a given array is the poorest-performing. You also have to consider the amount of data processed and the frequency of the I/O requests for that time period.

Making It Work
Table 2 shows some data I gathered for two files on my test system. One is a transaction log file, and the other is a data file. Note that you must be careful when comparing different types of I/O. Log-file traffic is typically smaller in size but accessed more frequently than data, which is larger but characterized by less frequent I/O. These differences suit the purpose of this discussion perfectly. As you can see, the average wait per I/O for the data file is a little over 1 millisecond more than that of the log file. However, the average megabyte per write on the data file is almost four times the average megabyte per write on the log files. That result merely speaks to the way data is written to disk—usually in a deferred fashion as opposed to the transaction log's write-ahead logging behavior. When comparing one data file with another, or a group of files that reside on one array versus another, it becomes much easier to spot the offenders. Remember that the number of waits or requests is always relative, and a high number doesn't always indicate a problem. If you're writing a ton of data, you'll most likely see high values in these columns. The goal is to keep the averages as small as feasible; there will always be some latency or wait times because of the way storage systems work. Microsoft's "Predeployment I/O Best Practices" white paper (http://www.microsoft.com/technet/prodtechnol/sql/bestpractice/pdpliobp.mspx) has some great information about preparing disk subsystems for use. The paper also includes handy guidelines for determining how much latency is acceptable under certain conditions.

Another effective way to utilize this DMV's information is in day-to-day monitoring of your storage subsystem. You can take some measurements under known conditions, then use that data as a baseline. Afterward, from time to time, you can compare current metrics against the baseline to spot trends or even potential problems before they get out of hand. This easy solution takes only a few lines of code.

The first thing you need to do is to capture the information at periodic intervals. The information returned from the DMV is a snapshot in time, with values that are cumulative since the most recent SQL Server restart. (There's currently no way to reset these counters short of the restart, but such a feature would be useful.) The DMV materializes the data as a resultset from an in-memory structure on demand, and the end user can't directly change that data. Because the data is a snapshot in time, you simply need to capture the data at periodic intervals and store it in a table. You can then perform a delta of any two snapshots and see how much physical I/O occurred and how much time passed between those two points in time. To that end, I've created a relatively simple stored procedure, which Listing 1 shows, to capture the values and store them in a table residing in the tempdb database. If you'd rather use another database, you can modify the code accordingly.

   Prev. page   [1] 2     next page



You must log on before posting a comment.

If you don't have a username & password, please register now.