Editor note: Yaniv Etrogi has created a script to deploy the sqlsig CLR function. It is linked above as 102445sqlsig.zip.
I implemented the TraceErrors process to gain more control over the production servers my team is responsible for. Knowing about every error that occurs in SQL Server can make a difference. In a perfect world, all exceptions would be logged, reported, and emailed to the relevant development team, but we all know that things don’t work like this.
In reality, not all exceptions are caught or reported. One common scenario is exceptions being cached but not logged, known as a swallowed exception. When things go wrong, the DBA is expected to explain what went wrong, even though in many cases the problem is caused by the application being serviced by the database and not by the database itself.
With the TraceErrors process, I receive a daily email with an HTML report of all exceptions that occurred in the database engine in the last 24 hours. This lets my team be proactive in many cases, approaching the relevant person or group responsible for the module generating an error, saving time and speeding resolution.
In addition to the daily report mail, having the traces always up and running has been proved to be very useful, especially if you don't have a third-party analyzer product that tracks and saves all database activity so you can search it. I'm not suggesting that you don't need an analyzer, but in environments where there aren't any such tools, TraceErrors has greater benefits.
With the traces always running, you can immediately go to the last trace file and load it to a table if any problem occurs, and you can then query this table. You could also double-click the trace file to open SQL Server Profiler and automatically load the file to view the data in the Profiler UI and use Ctrl+F to search and investigate the data. The Microsoft article "Using SQL Server Profiler" has more information about analyzing trace data.
I limit my trace files to 50MB, a size that I've found convenient. The timeframe that a 50MB file can cover will depend on the workload your server handles, so you can increase the limit to fit your workload.
The load operation consumes I/O and CPU resources, so it's best to schedule it for off hours. If you don't have off hours, be sure to schedule at off peak hours or copy the trace files to another server and perform the load there. Just to give you an idea, it takes me 14 minutes to load and process 50 50MB files containing a total of 9 million events on a two-year-old server with four CPUs, 8GB RAM, and local SCSI 15,000 RPM disks configured in RAID-1. It takes only six minutes on a new server with four CPUs, 16GB RAM, a fast SAN to load and process trace files with the same size that contain 24 million events.
Regardless of the time that your trace data covers, you control the period of time covered by the report using the @Hours input parameter passed to the TraceErrorsLoad stored procedure, which processes the raw traced data. I use 24 hours for @Hours, because the report is generated once daily. Setting @Hours to 24 ensures that no data older than 24 hours is worked on.
You can download the code for TraceErrors by clicking the link above.. To implement TraceErrors, first create a database named PerfDB. Create the four stored procedures included in the .zip file in the PerfDB database. Then create the TraceErrorsLoad job, which executes the TraceErrorsLoad stored procedure, and the TraceErrorsStatus job, which executes the TraceErrorsStatus stored procedure.
Process Description
TraceErrors is made up of four stored procedures. Figure 1 gives a brief description of the procedures. TraceErrorsCreate creates a trace file and sets its status to started. The procedure uses SQL Trace’s stored procedures. First it uses sp_trace_create to create a trace definition in a stopped state. Next, it adds (subscribes to) events and columns using sp_trace_setevent. Following that, sp_trace_setfilter is used to filter out unwanted data. Finally, it calls sp_trace_setstatus to change the trace status to started.
Figure 1: Procedures used by TraceErrors- TraceErrorsCreate—Defines the trace and starts it.
- TraceErrorsLoad—Loads the trace files into a table and manipulates the data, making it ready for retrieval.
- TraceErrorsStatus—Verifies that the trace is running and active.
- TraceErrorsReport—Retrieves the final results and returns a table that can be mailed by the calling application.
The stored procedure TraceErrorsLoad reads the captured data from the trace files, loads it to a database table, and manipulates the data, making it ready and available for retrieval. The stored procedure TraceErrorsReport simply retrieves the data and displays it to the calling application. The stored procedure TraceErrorsStatus assures that the trace is always up and running.