back to blog index

Title

I’m writing this blog entry as a result of a customer query regarding a change
in behavior related to invocations of the RAND and NEWID functions in
SQL Server 2005.

Suppose you need to write an expression in a query invoking the RAND or
NEWID function (say for randomization purposes) and you need the
function to be invoked only once. For example, suppose you need to make
a random choice out of three options (call them ‘option one’, ‘option two’
and ‘option three’), and you write the following code:

select
  case rnd
    when 1 then 'option one'
    when 2 then 'option two'
    when 3 then 'option three'
    else 'oops'
  end
from (select cast(rand()*3 as int)%3 + 1 as rnd) as d;
Remember that the RAND function returns a float value in the range 0
through 1, inclusive (0 and 1 are possible result values). Casting the
expression rand()*3 as an integer will truncate the fraction part of the value.

Even though the probability to get exactly 3 back after casting is very low, it
is still a possibility. This is why I used %3 (modulo 3)—to ensure the
expression will return an integer in the range 0 through 2. By adding 1, the
expression is guaranteed to return an integer in the range 1 through 3.

Internally, SQL Server rearranges a simple form of a CASE expression
such as the above to the searched form, namely, it expands the WHEN
clauses to incorporate the full predicates. As an example, the above CASE
expression is internally evaluated as follows:
select
  case 
    when rnd = 1 then 'option one'
    when rnd = 2 then 'option two'
    when rnd = 3 then 'option three'
    else 'oops'
  end
from (select cast(rand()*3 as int)%3 + 1 as rnd) as d;
SQL Server 2000 evaluates each reference to the alias rnd assigned in the
table expression d separately; therefore, it actually invokes the RAND
function three times. You realize that this means that in SQL Server 2000 it
is possible that none of the WHEN clauses of the CASE expression will
evaluate to TRUE, and you might end up getting ‘oops’ back. Try running this
code several times in SQL Server 2000 and you will be able to verify this.

SQL Server 2005 changes the behavior of outer references to aliases
assigned in table expressions, where the aliased expression invokes the
RAND or NEWID function. SQL Server 2005 will invoke the function only
once, therefore it is guaranteed that one of the WHEN clauses in the above
query will evaluate to TRUE, and you will never get ‘oops’ back.

You can test this with a similar example that invokes the NEWID function.
To return a random value in the range 1 through n, instead of using the
expression:
cast(rand()*n as int)%n + 1 
You can use the expression:
abs(checksum(newid()))%n + 1
CHECKSUM(NEWID()) returns a random integer. Applying ABS on top
ensures you get a nonnegative integer. Applying %n (modulo n) ensures that
the value is >= 0 and < n. By adding 1 you ensure that the value is >= 1 and
<= n. In short, this is just another way to get a random integer value in the
range 1 through n. So the above query can be rewritten as follows:
select 
  case rnd
    when 1 then 'option one'
    when 2 then 'option two'
    when 3 then 'option three'
    else 'oops'
  end
from (select abs(checksum(newid()))%3 + 1 as rnd) as d;
Try running this query several times in both SQL Server 2000 and in SQL
Server 2005. In SQL Server 2000 you will occasionally get ‘oops’ back,
while in SQL Server 2005 you will never get ‘oops’ back. This is due to the
same change in behavior I described earlier.

A simple way to test the difference in behavior between the versions is by
running the following code:
select rnd, rnd
from (select rand() as rnd) as d;
Run it in SQL Server 2000 and you will get two different invocations of
RAND, hence most probably two different values back. Run it in SQL
Server 2005, and you’re guaranteed to get the same value back twice since
RAND will be invoked only once.

This change is described in SQL Server 2005’s Books Online under the
section “Behavior Changes to Database Engine Features in SQL Server
2005,” but it’s very easy to overlook it.

Note that the change in behavior has nothing to do with multiple invocations
of RAND or NEWID in the same query, as opposed to being invoked once
in a table expression and then referenced multiple times in the outer query.
For example, the following code can return ‘oops’ in both SQL Server 2000
and in SQL Server 2005:
select 
  case abs(checksum(newid()))%3 + 1
    when 1 then 'option one'
    when 2 then 'option two'
    when 3 then 'option three'
    else 'oops'
  end;
In SQL Server 2005 you can now use a table expression as demonstrated
earlier as a workaround. In SQL Server 2000 (and also in 2005), you can
use a variable as a workaround:
declare @rnd as int;
set @rnd = abs(checksum(newid()))%3 + 1;

select 
  case @rnd
    when 1 then 'option one'
    when 2 then 'option two'
    when 3 then 'option three'
    else 'oops'
  end;
You can experience similar problems in less obvious scenarios; for example,
consider the predicate:
where abs(checksum(newid()))%3 + 1 between col1 and col2
Also here, the predicate is expanded internally and NEWID is invoked multiple times:
where col1 >= abs(checksum(newid()))%3 + 1 
  and col2 <= abs(checksum(newid()))%3 + 1
So you end up getting two independent invocations of the NEWID function.
If you want to rely on a single invocation, as demonstrated earlier in SQL
Server 2005 you can use a table expression, and in both versions you can
store the result of the expression in a variable and then refer to the variable.

As the last example of problematic use of RAND and NEWID, I recently
saw code written by a programmer that was supposed to populate a
temporary table with a set of unique random integers (say, 50 random
integers in the range 1 through 100). The code looked similar to the
following:
set nocount on;
create table #random_values(rnd int not null);
create index idx1 on #random_values(rnd);

declare @i as int;
set @i = 1;
while @i <= 50
begin
  insert into #random_values
    select abs(checksum(newid()))%100 + 1
    where abs(checksum(newid()))%100 + 1 
      not in (select rnd from #random_values);
  if @@rowcount = 1 set @i = @i + 1;
end

select rnd from #random_values order by rnd;

drop table #random_values;
You realize that the two invocations of the NEWID function (in the
SELECT and WHERE clauses) are independent of each other. Therefore,
this code may very well populate the temporary table with duplicate values
in all versions of SQL Server. For example, here’s a subset of the output I
got after running this code:
rnd
-----------
2
3
4
4
...
I’m not saying that this particular solution is the optimal way to get a set of
unique random values rather just explaining the logical problems with this
solution and the workarounds.

In SQL Server 2005 you can now use a table expression as a workaround,
revising the SELECT query to the following:
...
  insert into #random_values
    select rnd
    from (select abs(checksum(newid()))%100 + 1 as rnd) as d
    where rnd not in (select rnd from #random_values);
...
In both SQL Server 2000 and 2005 you can use a variable as a workaround:
...
  set @rnd = abs(checksum(newid()))%100 + 1;
  insert into #random_values
    select @rnd
    where @rnd not in (select rnd from #random_values);
...
To summarize, care should be taken when using functions such as RAND
and NEWID in queries. If you need to rely on a single invocation of the
function, SQL Server 2005 will give you the desired behavior as long as you
encapsulate the invocation of the function in a table expression and then in
the outer query refer to the alias of the expression as many times as you like.
In SQL Server 2000 (or any other version), you can use a variable as a
workaround.

Cheers
--
BG
 

End of Article



You must log on before posting a comment.

If you don't have a username & password, please register now.

Reader Comments

Very meticulous!

Synenergy

Article Rating 5 out of 5

According to the following article http://www.tek-tips.com/faqs.cfm?fid=6047 Rand() cannot generate 1.0. Which one is correct?

[Reply by Itzik:

Hi,

The article you point to says, quote:

"According to Books Online, RAND() returns "random float value from 0 through 1". Apparently this means [0, 1) - 0.000 can be generated (albeit very rarely) but 1.000 can not."

I’m not sure whether “apparently” means that the author assumes or knows that 1 cannot be created. The interpretation of what Books Online says in English is that 1 can be created.

I asked Microsoft about this, and got the following reply from the code owner:

"I just looked at the code. It can't return exact 0. The minimum value RAND returns is 4.656613e-10. But in reality, if people tries to use RAND to generate an int with a certain range, they would multiply it by an int range and then cast to int, which could certainly result in 0.

The upper bound is more tricky. I couldn't determine that it would never return exact 1. In the end, it involves floating point arithmetic. And since floating point arithmetic is imprecise, I can't conclude that it would never return 1."

Cheers, BG]

gmg@gbiz.com.au

Article Rating 3 out of 5

Do I understand Microsoft's response correctly? They are saying that Rand() cannot generate an exact 0, but whether it can generate 1 is inconclusive?

gmgsoftware

Article Rating 4 out of 5

Hi gmgsoftware!

My name is Megan Bearly, and I'm associate editor for SQL Server Magazine. The following is Itzik Ben-Gan's response to your question:

“Regarding 0, that’s also how I interpret Microsoft’s response, namely, that exact 0 cannot be produced. Regarding 1, the way I interpret the response is that from a brief examination of the code it’s inconclusive, therefore the safe approach is to assume that 1 can be produced. With enough time spent examining the code, which likely involves complex floating point manipulation, it may be possible to determine this conclusively.

Regards, Itzik"

Please let me know if you have any other questions. Thanks!

Megan Bearly Associate Editor, SQL Server Magazine mbearly@sqlmag.com

meganbearly

Article Rating 5 out of 5

 

      1 2
3 4 5 6 7 8 9
10 11 12 13 14 15 16
17 18 19 20 21 22 23
24 25 26 27 28 29 30
31       
or

More blogs about technology,
databases, and SQL Server.