• subscribe
October 30, 2008 12:00 AM

Finding Your Top 10 SQL Server Queries

Take a little effort to identify your worst-performing queries
SQL Server Pro
InstantDoc ID #100121
Downloads
100121.zip

Executive Summary:

If you need to parse Microsoft SQL Server server-side trace information so that you can aggregate the data to find your worst-performing and most-called queries, you might try turning to the T-SQL user-defined function called SQL_Signature.

You’re already capturing SQL trace information through the use of server-side traces, which send results directly to files on a locally attached disk on the server. Server-side traces not only minimize server load during the tracing process but also give you an easy way to parse and analyze the information later. By copying the files to another machine for processing, you can negate any impact on the production server while you analyze the data.

But what if you need to parse the trace information in a way that will let you aggregate the data to find your worst-performing and most-called queries, stored procedures, and so on? One simple solution is to use a T-SQL user-defined function called SQL_ Signature (popularized by Microsoft’s Tom Davidson). There are more efficient but slightly more complicated methods for tackling this problem—I’ll follow up in the SQL Server Magazine Performance Tuning and Optimization forum (sqlforums.windowsitpro.com/web/forum/) with a few—but this solution should get you started.

Replacing Information
In “Generating Server-Side Traces”, I established that you can use the fn_ trace_gettable() function to read events and columns directly from the trace files. This function will be your primary means of retrieving trace data for direct processing (or placing it in a table for later processing).

SELECT * FROM ::FN_TRACE_GETTABLE('C:  YourTraceFile.trc', DEFAULT)

The goal of the SQL_Signature function, which Listing 1, shows, is to replace certain pieces of information (which might vary between calls to the database) with a constant—in this case, the # symbol. These pieces of information are typically the values in a search argument of the WHERE clause that vary from one call to the next, while the rest of the statement remains constant.

Replacing these values will let you get a core list of what I call Query Classes and will let you aggregate the data to get meaningful metrics. In the following example, the three statements are effectively the same except for the CustomerID value. If you tried to aggregate these with a simple GROUP BY clause, you would get three separate groups.

SELECT * FROM [dbo].[OrderHeader] WHERE
  CustomerID = 397
SELECT * FROM [dbo].[OrderHeader] WHERE
  CustomerID = 398
SELECT * FROM [dbo].[OrderHeader] WHERE
  CustomerID = 398

However, if you replace the IDs with a constant, you can group these together to get more meaningful metrics.

SELECT * FROM [dbo].[OrderHeader] WHERE
  CustomerID = #

The same is true for stored procedure calls. For example,

EXECUTE [dbo].[cp_test] 70
EXECUTE [dbo].[cp_test] 71
EXECUTE [dbo].[cp_test] 72
becomes
EXECUTE [dbo].[cp_test] #

To use the function, you’d simply pass in the text you want to parse, along with a second parameter to tell it how many characters from the beginning of the statement to use during the parsing.

SELECT dbo.SQL_Signature('select * from
  orders where orderid = 10255 AND MyDate

  = '20060101',1000)

You can also use the function directly when reading the data from the trace file; to do it properly for many rows, you need to place the data into a table. You can use the following code to create the table, insert the data, and parse the TextData column:

SELECT [EventClass],\[TextData],\[DatabaseID],
[Duration],\[Reads],\[Writes],\[CPU],
dbo.SQL_Signature([TextData],1000) AS
  [SQLSignature],
CAST(0 as INT) AS [HashCode]
INTO YourTable
FROM ::FN_TRACE_GETTABLE('C:\YourTraceFile.
  trc', DEFAULT)

Next, to make the aggregations easier, compute a hashed value of the parsed text for each row. You might want to add an index on HashCode after the update:

UPDATE YourTable WITH (TABLOCK)
  SET [HashCode] =
  CHECKSUM([SQLSignature])

Now, you can aggregate the data:

SELECT [EventClass],SUM([Duration]) AS
  [Total Duration],
SUM([Reads]) AS [Total Reads],
  SUM([Writes]) AS [Total Writes],
SUM([CPU]) AS [Total CPU], COUNT(*) AS
  [Total Counts],
[HashCode], CAST(' AS NVARCHAR(4000))
  AS [SQLSignature],
CAST(' AS NVARCHAR(4000)) AS [Sample_
  Statement]
INTO AggTable
FROM YourTable GROUP BY
  [EventClass],\[HashCode]

Finally, add the query class (SQLSignature) and a sample statement for each class, and you’re ready to go.

UPDATE AggTable SET [SQLSignature] =
(SELECT TOP 1 a.[SQLSignature] FROM
  YourTable AS a WHERE a.[HashCode] =
AggTable.[HashCode]),
[Sample_Statement] =
(SELECT TOP 1 a.[TextData] FROM
  YourTable AS a WHERE a.[HashCode] =
  AggTable.[HashCode])

LEARNING PATH
SQL SERVER MAGAZINE RESOURCES

“Generating Server-Side Traces,” InstantDoc ID 99940

“SQL Server Profiler or Server-Side Trace?” InstantDoc ID 99365

“Dissecting SQL Server’s Top Waits,” InstantDoc ID 98112

“Are Your SQL Server Statements Performing Well?” InstantDoc ID 97761

“9 Steps to an Automated Trace,” InstantDoc ID 43014

Your Call
You can easily add further metrics (e.g., averages, max) to this simplified example as you see fit. Now, to find your top worstperforming queries, you can select the aggregated results in any order you want— by Reads, Writes, CPU, or Duration. Also, take into account the number of times the query class was run. For example, if you execute a query once a day, and it has 1,000,000 reads, is that considered one of the top 10 queries you should be concerned with? It isn’t if you have other queries that use 5,000 reads each but are called once per second. Slice and dice it as you want, but if you tackle the top 10 offenders in your system, you’ll probably tackle 90 percent of the overall problems and get the biggest bang for your effort.



ARTICLE TOOLS

Comments
  • Marcos
    3 years ago
    May 25, 2009

    Couldn't agree more. Chase the 10 offenders and your database will get a lot better. I already do this with DMVs and I should say that it is the second most important part of my job as DBA. The first one is to secure the data.

  • Andrew
    4 years ago
    Nov 13, 2008

    Yes it is amazing how much performance you can gain by addressing just a relatively few number of queries in most systems. The query_hash column in 2008 will be of great help to people running SQL 2008 and I was thinking of writing about that in a future article as well.

  • Michael
    4 years ago
    Nov 04, 2008

    I like to use the DMVs to get a lot of this information. Particularly the sys.dm_exec_query_stats. And in SQL Server 2008, the view has a new column called query_hash. This column is similar to the SQL_Signature defined above, but in a hashed format so it's useful for grouping when aggregating.

    I'm also a big fan of getting the biggest bang for your effort and so I totally agree with the tip on tackling the top 10 offenders in order to reduce 90 percent of the problems. I've also heard this as the 20-80 rule and talk about it quite a bit at
    http://dbwhisperer.blogspot.com/2008/08/80-20-rule-and-sql-server-performance.html
    and
    http://dbwhisperer.blogspot.com/2008/10/reporting-services-is-fun-again.html

You must log on before posting a comment.

Are you a new visitor? Register Here