The asterisk (*) in BINARY_ CHECKSUM(*) tells the function to use all the columns in the row in its calculations. Alternatively, you can specify a subset of the columns in place of the asterisk. Because this function is CPU-intensive, specifying the minimum number of columns or minimum number of bytes will give you the best performance. The best candidates would be the columns in a unique index. If you decide to use specific columns instead of all the columns, you can add NEWID as a column in the BINARY_CHECKSUM function so that the BINARY_CHECKSUM query will return a random number each time. Thus, you don't need to use RAND in the query, which simplifies it slightly:

SELECT * FROM Table1
  WHERE (ABS(CAST(
  (BINARY_CHECKSUM
  (keycol1, NEWID())) as int))
  % 100) < 10

Because there's no sorting involved in the BINARY_CHECKSUM query, only a single pass through the table is required to choose n% of the rows. The time and I/O stay linear in proportion to the size of the table.

To test the BINARY_CHECKSUM query against the NEWID query, we set up three large tables containing 1 million rows (435MB), 7 million rows (3GB), and 14 million rows (5.4GB) on a HP ProLiant DL580 G2 server with 1GB memory, four 2.2MHz Intel processors, and eight 36GB disks in RAID 1+0 configuration. Table 1 shows the results. As Table 1 shows, the BINARY_CHECKSUM query saves a lot of time and I/O compared with the NEWID query.

The Microsoft SQL Server team realized that not being able to easily take random samples of rows was a common problem in SQL Server 2000, so the team addressed the problem in SQL Server 2005 by introducing the TABLESAMPLE clause.

This clause selects a subset of rows by choosing random data pages and returning all the rows on those pages. However, for those of us who still have products running on SQL Server 2000, who need backward compatibility, or who need truly row-level randomness, the BINARY_CHECKSUM query is a very effective option.
Marcelo De Barros and Kenton Gidewall

End of Article

Prev. page     1 [2]     next page -->



You must log on before posting a comment.

If you don't have a username & password, please register now.

 
 

ADS BY GOOGLE