DOWNLOAD THE CODE:
Download the Code 42747.zip

T-SQL programmers often use user-defined functions (UDFs) to encapsulate algorithm implementations. However, several aspects of UDFs constrain programmers' capabilities. In "UDF Back Doors" (April 2004, InstantDoc ID 41845), I discussed some UDF limitations and some back doors that let you circumvent them. I showed how to create scalar and table-valued special UDFs and how to modify data through UDFs. This month, I cover another aspect of UDFs—related to their atomicity—that limits programmers' capabilities and a couple of back doors that circumvent this limitation. I have to stress here that I don't recommend using back doors; rather, I want to highlight the limitations of UDFs that cause people to use the back doors. I also discuss a supported UDF feature that most people simply don't know about: modifying data through inline UDFs.

Function Atomicity
When you use nondeterministic T-SQL functions (functions that can return different results in different invocations even when they have the same input) such as RAND() and GETDATE() in your queries, SQL Server invokes those functions only once per query, not once per row. Therefore, a query such as

USE Northwind

SELECT RAND() AS rnd, GETDATE() AS dt, * FROM Orders

returns the same values in the rnd and dt result columns for all rows. If you're not aware of this behavior, some of your query results might surprise you. For example, if you want to return a random row from the Orders table, you might try to run the following query:

SELECT TOP 1 * FROM Orders ORDER BY RAND()

Specifying an expression in the ORDER BY clause is logically similar to adding it to the SELECT list as a result column, then instead of returning the result in the output, just sorting the rows by that column. By specifying ORDER BY RAND(), you expect SQL Server to invoke the RAND() function for each row and to sort the rows by the various random values SQL Server generates for the different rows. But because SQL Server invokes RAND() only once for the query, all rows get the same value (a float in the range 0 to 1), so the value doesn't affect the order of the result rows and you don't get a different random row every time you invoke the query.

SQL Server also invokes a nondeterministic function only once per query when you have an INSERT SELECT statement that invokes the GETDATE() function. Suppose you try to insert the results of a query into a table, using the GETDATE() function in the SELECT list to store the date and time SQL Server inserted the row. SQL Server assigns all the rows the same result value of the single GETDATE() invocation; therefore, the value represents not the time that SQL Server inserted each row but rather the time that SQL Server invoked the function.

The atomicity-of-functions rule has one exception. The NEWID() function, which returns a globally unique identifier (GUID), must return a different value with each invocation. This is the only nondeterministic function that SQL Server invokes once per row instead of once per query. For example, running the following query gives you a different value in the GUID column for each row:

SELECT NEWID() AS GUID, 
* FROM Orders

NEWID() returns a fairly random value if you run it on Windows 2000 and later releases. On earlier releases, the Media Access Control (MAC) address of the network card appears unscrambled in the GUID that NEWID() generates, so the values you get aren't random. On these releases, you need to use the following query to get a random row with each invocation:

SELECT TOP 1 * FROM Orders ORDER BY NEWID()

You might try to circumvent SQL Server's atomicity-of-functions limitation by creating a function that returns the result of the RAND() or GETDATE() functions:

CREATE FUNCTION dbo.fn_perrow_rand()

RETURNS float
AS
BEGIN
   RETURN RAND()

END

But such an attempt fails, generating the following error message:

Server: Msg 443, Level 16, State 1, Procedure 
fn_perrow_rand, Line 4


Invalid use of 'rand' within a function.

This use of the RAND() function is invalid because you can't invoke nondeterministic functions from within UDFs.

However, you can circumvent this limitation. A couple of undocumented and unsupported back doors let you create UDFs that invoke nondeterministic functions and operate separately for each row within a query. The first UDF uses the OPENQUERY() function similarly to the way I used it in April's examples. Remember that SQL Server doesn't parse the string a query submits to a linked server, so you can issue any query you want as long as it returns a table. The OPENQUERY() function appears in a query's FROM clause, so it has to return a table. If you refer to your own server as the linked server, your server submits the query to itself. Along those lines, you can implement the fn_perrow_rand() function by referring to the RAND() function through OPENQUERY() instead of directly:

CREATE FUNCTION dbo.fn_perrow_rand() RETURNS  
  float
AS
BEGIN
  RETURN (SELECT rnd FROM OPENQUERY([server_name], 'SELECT RAND() AS rnd'))
END
GO

To test the function and see that it returns a different value for each row, run the following query:

SELECT dbo.fn_perrow_rand() AS rnd, * FROM Orders

However, using OPENQUERY() significantly degrades query performance because you invoke a distributed query with each invocation of the UDF—once for every row.

Another back door—using a view to produce random results—gives much faster performance. First, create a view that returns the result of a nondeterministic function, such as the following VRand view:

CREATE VIEW VRand AS SELECT RAND() AS rnd

GO

You can then create a function that retrieves the rnd column from the view and invoke this function once per row within a query. To test this approach, first create the fn_perrow_rand2() function:

CREATE FUNCTION dbo.fn_perrow_rand2() RETURNS 
  float

  AS BEGIN RETURN (SELECT rnd 
  FROM VRand) END

GO

Then, issue the following query several times:

SELECT TOP 1 * FROM Orders ORDER BY dbo.fn_perrow_rand2()

Notice that you get a different row with almost every invocation.

   Prev. page   [1] 2     next page
 
 

ADS BY GOOGLE