December 19, 2008 05:19 PM

Inline Scalar Functions

Rating: (0)
SQL Server Magazine
InstantDoc ID #101104

User defined functions give you great benefits in terms of encapsulation and code reusability. Unfortunately though, when you invoke a scalar UDF as part of a query, and pass the function a column from the table as input, the function is invoked separately for each row. This is true even when all the function has is a RETURN clause with a single expression that theoretically could have been inlined in the query. This is how SQL Server always handled scalar UDFs from the moment those were introduced in the product (version 2000) and still does today (version 2008). The result is that a query that uses such a function would typically run significantly slower than a query that embeds the original expression inline instead of invoking the function.

I’ll first demonstrate the problem and then provide an alternative that would allow you to use functions without sacrificing the performance of your queries.

To demonstrate the problem first create the table T1 and populate it with 1,000,000 rows by running the following code:

-- Create and populate T1 with 1,000,000 rows

SET NOCOUNT ON;

IF OBJECT_ID('dbo.T1') IS NOT NULL DROP TABLE T1;

GO

 

WITH

  L0 AS (SELECT 0 AS c UNION ALL SELECT 0),

  L1 AS (SELECT 0 AS c FROM L0 AS A CROSS JOIN L0 AS B),

  L2 AS (SELECT 0 AS c FROM L1 AS A CROSS JOIN L1 AS B),

  L3 AS (SELECT 0 AS c FROM L2 AS A CROSS JOIN L2 AS B),

  L4 AS (SELECT 0 AS c FROM L3 AS A CROSS JOIN L3 AS B),

  L5 AS (SELECT 0 AS c FROM L4 AS A CROSS JOIN L4 AS B),

  Nums AS (SELECT ROW_NUMBER() OVER(ORDER BY (SELECT 0)) AS n FROM L5)

SELECT n INTO dbo.T1 FROM Nums WHERE n <= 1000000;

GO

 

Next, create the scalar UDF AddOne by running the following code:

-- Create scalar function AddOne 

IF OBJECT_ID('dbo.AddOne') IS NOT NULL DROP FUNCTION dbo.AddOne;

GO

CREATE FUNCTION dbo.AddOne(@n AS BIGINT) RETURNS BIGINT

AS

BEGIN

  RETURN @n + 1;

END

GO

 

This particular function simply adds 1 to the input value, but the idea of course is to use an example for a calculation that can be expressed as a single expression. Just the same the function could have been one that calculates the number of working days between two dates, the last month day corresponding to the input value, or any other single-expression calculation.

Now run the following query which invokes the function:

-- Query 1

SELECT TOP (1) n, dbo.AddOne(n) AS r

FROM dbo.T1

ORDER BY r;

 

The purpose of the TOP (1) and the ORDER BY here is just to prevent the query from returning all 1,000,000 rows in the output. It doesn’t really add much to the cost of the query. It is treated pretty much similar to a MAX aggregate since the query is asking for one top row. Observe the execution plan of the query:

|--Sort(TOP 1, ORDER BY:([Expr1004] ASC))

  |--Compute Scalar(DEFINE:(

     [Expr1004]=[tempdb].[dbo].[AddOne]([tempdb].[dbo].[T1].[n])))

    |--Table Scan(OBJECT:([tempdb].[dbo].[T1]))

 

Notice in the Compute Scalar operator that the expression in the function wasn’t inlined in the query, rather the function was in fact invoked separately for each row. Here are the measures I got from STATISTICS TIME for this query:

SQL Server Execution Times:

   CPU time = 9266 ms,  elapsed time = 14206 ms.

 

It took 14 seconds for the query to complete on my system, and it consumed quite significant CPU time! Compare this to embedding the original expression inline in the query:

-- Query 2

SELECT TOP (1) n, n + 1 AS r

FROM dbo.T1

ORDER BY r;

 

|--Top(TOP EXPRESSION:((1)))

  |--Compute Scalar(DEFINE:([Expr1004]=[tempdb].[dbo].[T1].[n]+(1)))

    |--Parallelism(Gather Streams, ORDER BY:([tempdb].[dbo].[T1].[n] ASC))

      |--Sort(TOP 1, ORDER BY:([tempdb].[dbo].[T1].[n] ASC))

        |--Table Scan(OBJECT:([tempdb].[dbo].[T1]))

 

 SQL Server Execution Times:

   CPU time = 764 ms,  elapsed time = 688 ms.

 

And as you can see, the run time dropped to under a second. Does this mean that you are doomed to having to choose between the benefits of UDFs and performance, and not being able to enjoy both? Not necessarily…

SQL Server does support inline table-valued UDFs. Those are truly inlined in the outer query much like any other table expression (derived table, CTE, view) is. That is, when querying such a UDF, SQL Server internally expands the UDF’s definition, and rearranges the query such that it directly accesses the underlying objects. With this in mind, consider the following alternative…

Instead of creating a scalar UDF with a RETURN clause that looks like this:

RETURN <expression>

Create an inline table-valued UDF with a RETURN clause that looks like this:

RETURN SELECT <expression> AS val

And then in the query, instead of the expression:

dbo.MyFunction(col1) AS result

Use:

(SELECT val FROM dbo.MyFunction(col1)) AS result

For example, to implement an alternative to our AddOne function, create the following AddOneInline function:

-- Create inline function AddOneInline

IF OBJECT_ID('dbo.AddOneInline') IS NOT NULL DROP FUNCTION dbo.AddOneInline;

GO

CREATE FUNCTION dbo.AddOneInline(@n AS BIGINT) RETURNS TABLE

AS

RETURN SELECT @n + 1 AS val;

GO

 

To invoke the UDF in your query, use the following form:

-- Query 3

SELECT TOP (1) n, (SELECT val FROM dbo.AddOneInline(n)) AS r

FROM dbo.T1

ORDER BY r;

 

SQL Server will optimize this query the same it does Query 2:

|--Top(TOP EXPRESSION:((1)))

  |--Compute Scalar(DEFINE:([Expr1006]=[tempdb].[dbo].[T1].[n]+(1)))

    |--Parallelism(Gather Streams, ORDER BY:([tempdb].[dbo].[T1].[n] ASC))

      |--Sort(TOP 1, ORDER BY:([tempdb].[dbo].[T1].[n] ASC))

        |--Table Scan(OBJECT:([tempdb].[dbo].[T1]))

 

 SQL Server Execution Times:

   CPU time = 671 ms,  elapsed time = 714 ms.

 

And as a result, this query also ran under a second on my system.

Cheers,

BG

Add a Comment

oops...formatting is offin the previous comment, please try this:

if object_id('dbo.StripAllTags') is not null drop function dbo.StripAllTags
go
CREATE FUNCTION [dbo].[StripAllTags] (@HTMLText VARCHAR(MAX)) RETURNS VARCHAR(MAX)
/*******************************************************************************************
* This function replaces "<" and ">" tags. Example , will be replaced as "This, That"
* ********************************************************************************************/
AS
BEGIN
DECLARE @Start INT
DECLARE @End INT
DECLARE @Length INT
SET @Start = CHARINDEX('<',@HTMLText)
SET @End = CHARINDEX('>',@HTMLText,CHARINDEX('<',@HTMLText))
SET @Length = (@End - @Start) + 1
WHILE @Start > 0 AND @End > 0 AND @Length > 0
BEGIN
SET @HTMLText = STUFF(@HTMLText,@Start,@Length,')
SET @Start = CHARINDEX('<',@HTMLText)
SET @End = CHARINDEX('>',@HTMLText,CHARINDEX('<',@HTMLText))
SET @Length = (@End - @Start) + 1
END
SET @HTMLText = REPLACE(@HTMLText,'>','>')
SET @HTMLText = REPLACE(@HTMLText,'<','<')
RETURN LTRIM(RTRIM(@HTMLText))
END
GO

Thanks,
Ranga

Ranga11/16/2009 10:35:20 AM


Very Good article...What if the function is slightly complex like this, can it be written as a inline function ?

if object_id('dbo.StripAllTags') is not null
drop function dbo.StripAllTags
go
CREATE FUNCTION [dbo].[StripAllTags]
(@HTMLText VARCHAR(MAX))
RETURNS VARCHAR(MAX)
/*******************************************************************************************
* This function replaces "<" and ">" tags. Example , will be replaced as "This, That"
*
********************************************************************************************/
AS
BEGIN
DECLARE @Start INT
DECLARE @End INT
DECLARE @Length INT
SET @Start = CHARINDEX('<',@HTMLText)
SET @End = CHARINDEX('>',@HTMLText,CHARINDEX('<',@HTMLText))
SET @Length = (@End - @Start) + 1
WHILE @Start > 0 AND @End > 0 AND @Length > 0
BEGIN
SET @HTMLText = STUFF(@HTMLText,@Start,@Length,')
SET @Start = CHARINDEX('<',@HTMLText)
SET @End = CHARINDEX('>',@HTMLText,CHARINDEX('<',@HTMLText))
SET @Length = (@End - @Start) + 1
END
SET @HTMLText = REPLACE(@HTMLText,'>','>')
SET @HTMLText = REPLACE(@HTMLText,'<','<')
RETURN LTRIM(RTRIM(@HTMLText))
END
GO

Thanks,
Ranga

Ranga11/16/2009 10:31:17 AM


"Hi Brian,

The function is invoked separately for each row of the table you are querying. As for telling how many times the function was invoked, this can be achieved with a simple trace. Capture the SP:Completed (or SP:Starting) event. To limit the size of the trace, you might want to test the query against a small number of rows—for example, with the filter n <= 10. Following is a screen capture of Profiler showing the events I got when I tracing the query that invokes the UDF: http://www.windowsitpro.com/Common/adforceimages/Inline Scalar Functions comment fig1.jpg.

Cheers,

Itzik"

Posted by Megan Keller

Associate Editor, SQL Server Magazine

mkeller@sqlmag.com

Megan9/1/2009 1:15:55 PM


I want to understand something better. The article says: "...Unfortunately though, when you invoke a scalar UDF as part of a query, and pass the function a column from the table as input, the function is invoked separately for each row. "
My question is for each row of what? Of the table you're querying? Of the table that the UDF queries? Of the resultset? And how can I see exactly how many times the function was actually called during the query processing?

Thanks,
Brian

Brian8/26/2009 12:01:18 PM


Excellent post, and wonderfully simple solution! Thanks!

Stephen12/24/2008 11:25:39 AM


You must log on before posting a comment.

Are you a new visitor? Register Here

Rename Virtual Server (Cluster)

I am going to rename a clustered instance (default).1. Change SQL Network Name2. Take offline3. Bring online4. Flush DNS, Cache5. Test failover6. Rebu...222-96209

GOOGLE LINKS
SPONSORED LINKS
FEATURED LINKS