• subscribe
January 19, 2012 03:37 PM

Small Code Changes, Big SQL Server Performance Improvements

SQL Server Pro
InstantDoc ID #141959
In my five-day SQL Server Internals and Tuning class, I have a module called simply "Query Tuning." When we start this module, I tell the students that if the SQL Server optimizer were perfect, we wouldn’t need this module. If the optimizer were perfect, it could take any legal T-SQL query you submit and figure out the best way to execute it to return the results you’re asking for. Unfortunately, the optimizer isn’t perfect (yet). However, there are still some things you can do in your T-SQL code to help the optimizer come up with a great execution plan.

One of the topics discussed in this module is query hints, which I talked about in my commentary "Take a Hint About Query Hints." There are several other topics discussed that deal with constructs in your code that can have a negative impact on performance. There are simple changes you can made to remove these troublesome constructs that can give immediate performance improvements.

One of these constructs is scalar-valued user-defined functions (UDFs). UDFs were added to SQL Server 2000 and were a wonderful addition to the product. Microsoft had been telling us since SQL Server 6 that UDFs were coming "really soon," and they finally appeared in SQL Server 2000. UDFs come in two varieties: scalar functions, which return a single value, and table valued functions, which return a set of result rows and can be referenced in a FROM clause. Even after waiting all these years, it turned out that scalar UDFs didn’t live up to their promise.

I discovered some bad behavior shortly after SQL Server 2000 came out when I was contracted to do some performance tuning and troubleshooting for an insurance company in Atlanta. One of the developers had written a scalar UDF to mask social security numbers in a report they needed to run frequently. The function took a social security number as a character string input value, replaced some of the digits with special characters, and returned the masked value as output.

When we ran the report with only about 10,000 customer rows, it took more than 20 minutes to run. Something wasn’t right. I started a trace to see what statements in the report were taking the most time and saw 10,000 separate calls to the masking function. For each row, the trace showed SQL Server calling the function, replacing the string values, and returning the result, just as if we had written code with a cursor to process the data one row at a time. If I replaced the function call with the actual code used to mask the characters, the trace showed a single call to the function and the report finished in under a minute.

For example, here’s a scalar UDF that takes a social security number (nine digits and two hyphens) as input and replaces the first five digits with X’s.
CREATE FUNCTION dbo.fnMaskSSN (@ssn char(11))
RETURNS char(11)
AS BEGIN
SELECT @SSN = 'xxx-xx-' + right(@ssn,4)
RETURN @SSN
END;
To test it, you could use any table that has a social security number column, including the authors table from the old sample database pubs.
SELECT dbo.fnMaskSSN(au_id), au_lname, au_fname
FROM authors;
To improve the performance, you would need to forget about the UDF, and replace the SELECT with the function definition:
SELECT 'xxx-xx-' + right(au_id,4), au_lname, au_fname
FROM authors;

Of course, with 23 rows in the authors table, you wouldn’t see the performance advantage, but if you ran a trace you would see a difference in the amount of work that SQL Server was doing behind the scenes. Although the developer wasn’t happy to have his function ignored, the DBA, who was ultimately responsible for system performance, was delighted.

The poor performance of scalar UDFs occurs only when the function is referenced in the SELECT list. In those cases, you can replace the function reference with the code that defines the function and potentially make an enormous difference in performance. Microsoft is well aware of this problem, but it has no immediate plans to change the way scalar UDFs work when used in a SELECT list. My tests have shown the same poor performance in my installation of SQL Server 2012 RC0.

Query tuning doesn’t always require detailed knowledge of everything SQL Server is doing and the way it works internally. Sometimes it requires only that you have tools that let you see where problems are occurring, and then use a little creative energy to come up with a different way to write the problematic queries to get the same results. In this case, replacing the call to the function with the underlying function code was a very minor change that had dramatic results.


ARTICLE TOOLS

Comments
  • Dydek
    3 months ago
    Feb 17, 2012

    I totally agree with your findings.. We have've recently eliminated a few UDFs created by an ORACLE-to-SQL migration tool, a datetime-to-date truncation function, and time to execute our queries went down by a whooping 99%. The optimization effect is greater when UDFs is used in a WHERE clause, because it automatically forces cursor processing. There are few articles floating on the web, confirming that most of the time UDFs are bad, but not all of the time the difference is that significant. Here are my 3 favorite ones:

    http://allinthehead.com/retro/247/user-defined-functions-considered-harmful
    http://sqlblog.com/blogs/alexander_kuznetsov/archive/2008/04/21/not-all-udfs-are-bad-for-performance.aspx
    http://www.simple-talk.com/sql/t-sql-programming/clr-performance-testing/

  • KDelaney
    4 months ago
    Jan 25, 2012

    Typos:
    far working = far more work
    sclar = scalar

  • KDelaney
    4 months ago
    Jan 24, 2012

    Maurice... thanks for your input. Yes, I tried to stress that the problem was only with scalar UDFs. I don't have enough words in these short commentaries to cover all the possible solutions, but I definitely stress in my class that one solution is to rewrite using a table-valued function.

    KBerghall... thanks for your comments. I first ran into this problem long ago, so the actual numbers are just what I remember. The report definitely did more than just the masking of the social security numbers, but I do remember that all we did was remove the call to the function and replace it with the function code and we made a drastic improvement. I would NOT expect any test on the current version with today's hardware to show the exact behavior we saw back then, but as mirzarashidbai attests, the problem does still exist. My tracing also shows far working going on behind the scenes if you have a call to a sclar UDF in your select list. If you are not having problems with scalar UDFs in the SELECT list, then that is great, but I offer this as one area where some people can have a major impact on performance. I am not saying UDFs are always bad, either, only that if you are trying to tune your code, you can try removing scalar functions in the SELECT list.

    Thanks again for the all the feedback!!
    ~Kalen

  • kberghall
    4 months ago
    Jan 20, 2012

    It makes very little difference in this case, but here is how I would simplify the example UDF.

    CREATE FUNCTION dbo.fnMaskSSN (@ssn char(11))
    RETURNS char(11)
    AS BEGIN
    RETURN 'xxx-xx-' + right(@ssn,4)
    END;

  • kberghall
    4 months ago
    Jan 20, 2012

    We have used UDFs extensively since they became available. There are definitely some performance traps if you use them the wrong way, but generally we have been very pleased with the performance and it is an extremely convenient way to encapsulate and re-use code.

    I tried you little SSN example and ran it on a table with 800,000 records and it ran thru all in 5 seconds. This is on a SQL Server 2008 R2 on a relatively slow server. So I think the query with 10,000 records must have had some other type of performance issue if it took 20 min.

    Also we have noticed that UDFs generally perform better if simplified down to one RETURN statement instead of declaring variables, etc. especially when accessing tables within the UDF.

You must log on before posting a comment.

Are you a new visitor? Register Here