| Executive Summary:
If you need to parse Microsoft SQL Server server-side trace information so that you can aggregate the data to find your worst-performing and most-called queries, you might try turning to the T-SQL user-defined function called SQL_Signature. |
You’re already capturing SQL trace information
through the use of server-side traces, which send
results directly to files on a locally attached disk on the
server. Server-side traces not only minimize server load
during the tracing process but also give you an easy way
to parse and analyze the information later. By copying
the files to another machine for processing, you can
negate any impact on the production server while you
analyze the data.
But what if you need to parse the trace information
in a way that will let you aggregate the data to
find your worst-performing and most-called queries,
stored procedures, and so on? One simple solution is
to use a T-SQL user-defined function called SQL_
Signature (popularized by Microsoft’s Tom
Davidson). There are more efficient but slightly
more complicated methods for tackling this
problem—I’ll follow up in the SQL Server Magazine Performance
Tuning and Optimization forum (sqlforums.windowsitpro.com/web/forum/) with a few—but this
solution should get you started.
Replacing Information
In “Generating Server-Side Traces”, I established that you can use the fn_
trace_gettable() function to read events and columns directly from the trace files. This function will be your
primary means of retrieving trace data for direct processing
(or placing it in a table for later processing).
SELECT * FROM ::FN_TRACE_GETTABLE('C: YourTraceFile.trc', DEFAULT)
The goal of the SQL_Signature function, which
Listing 1, shows, is to replace certain pieces
of information (which might vary between calls to the
database) with a constant—in this case, the # symbol.
These pieces of information are typically the values in
a search argument of the WHERE clause that vary
from one call to the next, while the rest of the statement
remains constant.
Replacing these values will let you get a core list of
what I call Query Classes and will let you aggregate
the data to get meaningful metrics. In the following
example, the three statements are effectively the same
except for the CustomerID value. If you tried to aggregate
these with a simple GROUP BY clause, you would
get three separate groups.
SELECT * FROM [dbo].[OrderHeader] WHERE
CustomerID = 397
SELECT * FROM [dbo].[OrderHeader] WHERE
CustomerID = 398
SELECT * FROM [dbo].[OrderHeader] WHERE
CustomerID = 398
However, if you replace the IDs with a constant, you can
group these together to get more meaningful metrics.
SELECT * FROM [dbo].[OrderHeader] WHERE
CustomerID = #
The same is true for stored procedure calls. For
example,
EXECUTE [dbo].[cp_test] 70
EXECUTE [dbo].[cp_test] 71
EXECUTE [dbo].[cp_test] 72
becomes
EXECUTE [dbo].[cp_test] #
To use the function, you’d simply pass in the text
you want to parse, along with a second parameter to tell it how many characters from the beginning of the
statement to use during the parsing.
SELECT dbo.SQL_Signature('select * from
orders where orderid = 10255 AND MyDate
= '20060101',1000)
You can also use the function directly when reading the
data from the trace file; to do it properly for many rows,
you need to place the data into a table. You can use the
following code to create the table, insert the data, and
parse the TextData column:
SELECT [EventClass],\[TextData],\[DatabaseID],
[Duration],\[Reads],\[Writes],\[CPU],
dbo.SQL_Signature([TextData],1000) AS
[SQLSignature],
CAST(0 as INT) AS [HashCode]
INTO YourTable
FROM ::FN_TRACE_GETTABLE('C:\YourTraceFile.
trc', DEFAULT)
Next, to make the aggregations easier, compute a
hashed value of the parsed text for each row. You
might want to add an index on HashCode after the
update:
UPDATE YourTable WITH (TABLOCK)
SET [HashCode] =
CHECKSUM([SQLSignature])
Now, you can aggregate the data:
SELECT [EventClass],SUM([Duration]) AS
[Total Duration],
SUM([Reads]) AS [Total Reads],
SUM([Writes]) AS [Total Writes],
SUM([CPU]) AS [Total CPU], COUNT(*) AS
[Total Counts],
[HashCode], CAST(' AS NVARCHAR(4000))
AS [SQLSignature],
CAST(' AS NVARCHAR(4000)) AS [Sample_
Statement]
INTO AggTable
FROM YourTable GROUP BY
[EventClass],\[HashCode]
Finally, add the query class (SQLSignature) and
a sample statement for each class, and you’re
ready to go.
UPDATE AggTable SET [SQLSignature] =
(SELECT TOP 1 a.[SQLSignature] FROM
YourTable AS a WHERE a.[HashCode] =
AggTable.[HashCode]),
[Sample_Statement] =
(SELECT TOP 1 a.[TextData] FROM
YourTable AS a WHERE a.[HashCode] =
AggTable.[HashCode])
LEARNING PATH SQL SERVER MAGAZINE RESOURCES
“Generating Server-Side Traces,”
InstantDoc ID 99940
“SQL Server Profiler or Server-Side Trace?”
InstantDoc ID 99365
“Dissecting SQL Server’s Top Waits,”
InstantDoc ID 98112
“Are Your SQL Server Statements Performing Well?”
InstantDoc ID 97761
“9 Steps to an Automated Trace,”
InstantDoc ID 43014 |
Your Call
You can easily add
further metrics (e.g.,
averages, max) to this
simplified example as
you see fit. Now, to
find your top worstperforming
queries,
you can select the
aggregated results in
any order you want—
by Reads, Writes, CPU,
or Duration. Also,
take into account the
number of times the
query class was run.
For example, if you execute a query once a day, and it
has 1,000,000 reads, is that considered one of the top
10 queries you should be concerned with? It isn’t if you
have other queries that use 5,000 reads each but are
called once per second. Slice and dice it as you want,
but if you tackle the top 10 offenders in your system,
you’ll probably tackle 90 percent of the overall problems
and get the biggest bang for your effort.