• subscribe
July 20, 2004 12:00 AM

Anatomy of a Performance Solution

Real-world experience shows the benefit of internals knowledge
SQL Server Pro
InstantDoc ID #43066
Downloads
43066.zip

I get a lot of nice feedback about my columns and classes, but some people ask me why they should bother learning how SQL Server works. I study internals often just for a deeper understanding of my favorite software and for the experience of learning something new. But many of my readers and students have specific reasons for learning about SQL Server internals. One of the most common reasons is that once you understand the way SQL Server works, you're better prepared to track down problems. If you understand the way indexes work, for example, you know that a query that takes 5 seconds to return 1 row, even from a 10 million­row table, can probably be improved.

Recently, I saw a real-world example of how newly acquired knowledge of SQL Server internals helped a DBA solve a puzzling problem. In a class I taught about SQL Server internals and performance tuning, one student was extremely motivated. Instead of constantly picking my brain and asking me questions, as many students do, Danny tried to duplicate every demonstration to make sure he could produce the same behavior and get the same results I did. He asked questions only when he wanted to delve even deeper into a topic.

Danny installed software on the class computer that let him access his machines at work so that he could test the concepts he was learning on his own databases. Toward the end of the second day of class, he mentioned that he was having a problem with one of his SQL Servers at work and was trying to track down the cause. The main symptom was that when he clicked the plus sign (+) in Enterprise Manager to expand the database list, this machine took more than 15 minutes to enumerate the list of about 40 databases. The expansion didn't take nearly that long on the other SQL Server machines, so he felt confident that the underlying problem was solvable.

I first suggested that he determine which commands Enterprise Manager was sending to SQL Server when expanding the database list. We had talked about SQL Server Profiler briefly, so he could set up a simple trace. (For more information about using Profiler traces to improve performance, see Steven Berringer's article "9 Steps to an Automated Trace," InstantDoc ID 43014.) I recommended that he capture only the SQL:Batch Completed events in the TSQL event category. I further recommended that he filter the captured data on application name by specifying "MS SQLEM" as the value in the LIKE clause for the ApplicationName filter to capture events only from Enterprise Manager.

When he started the trace, he saw only the T-SQL batches sent to SQL Server through Enterprise Manager, just as he expected. He then opened Enterprise Manager and clicked the plus sign to expand his databases. The Profiler display showed the code in Listing 1. This code looked fine for Enterprise Manager to use to obtain information about every database, but Profiler was displaying this query as many times as there were databases. This repetition might not seem like a problem at first glance—after all, you need the properties for each database—but the SELECT statement has no WHERE clause. One execution of this statement retrieves the information for all databases, so Enterprise Manager doesn't need to send the same statement multiple times. This redundancy appeared to be a bug in Enterprise Manager, so we couldn't easily fix it.

However, Danny reminded me that he experienced much longer delays on one machine than on any others. I assumed that he meant one client machine, so I started researching how the network connectivity could be different between different clients and suggested that he investigate the protocol configuration for each client.

Meanwhile, Danny copied the code from the Profiler display into a Query Analyzer window, still connected to his at-work system. When he executed the statement, it took more than 30 seconds to run. Because he thought that a simple system-table access shouldn't be that slow, he started paring down the function calls from the SELECT list. He discovered that the last function in the list, has_dbaccess(name), was the culprit. When he removed that function from the SELECT list, the query completed almost immediately.

The has_dbaccess() function takes one parameter—the database name—and returns 1 if the user executing the function has access to the database, 0 if the user doesn't have access, and NULL if the database name is invalid. This check keeps Enterprise Manager from displaying names of databases that you don't have access to. Danny wanted to determine whether this function was just resource-intensive or whether some databases were slower to access than others. By running the following SELECT statement for each database, he could check the time required to access each database individually:

SELECT has_dbaccess('<database_name>')

He ran this code for each database, plugging in one database name at a time, and noticed that several databases took many seconds to return results but others returned results immediately. Meanwhile, a classmate looked through Microsoft's Knowledge Base and found the article "FIX: Opening the Database Folder in SQL Server Enterprise Manager 2000 Takes a Long Time" (http://support.microsoft.com/default.aspx?scid=kb;en-us;282416&Product=sql). The title certainly seemed to reflect Danny's problem, but the article said that SQL Server 2000 Service Pack 1 (SP1) fixed this problem. Danny checked his version number and found that his problem server was running SP3. Because SP1 fixed the client's SQL-DMO library, he checked his client machines and found that they were still running the RTM version of SQL-DMO.

I then traced the actions SQL Server took when expanding the list of databases on one of the class machines, which had SP3 installed for all components, and didn't see SQL Server running the SELECT query once for each database. SP1 apparently fixed that problem, but the SELECT query in Listing 1 still took several seconds to run even once on the problem machine at Danny's office.



ARTICLE TOOLS

Comments
    There are no comments to display. Be the first one!
You must log on before posting a comment.

Are you a new visitor? Register Here