One way to reduce the query cost with large tables is to scan—and more important, sort—less data.To do so, you can use the new TABLESAMPLE option in SQL Server 2005. (For details about TABLESAMPLE, see "Query Data Samples in a Table," InstantDoc ID 49572.)TABLESAMPLE lets you request a sampling of a certain number or percentage of rows, which SQL Server translates to scanning a random sample of pages.As I describe in "Query Data Samples in a Table," SQL Server randomly chooses which pages to scan, and you're not guaranteed to get the exact number of rows you request. So, the higher the number of rows you request, the higher the probability that you'll get at least the number of rows you request.
With that in mind, here's how you can request one random row from Sales-OrderHeader, with a substantial reduction of the cost of the query:
SELECT TOP (1) SalesOrderID,
OrderDate, CustomerID,
ContactID
FROM Sales.SalesOrderHeader
TABLESAMPLE(1000 ROWS)
ORDER BY CHECKSUM(NEWID());
Examine Figure 2's execution plan, produced for this query, and notice that the query cost is substantially reduced—it's now about 0.048. STATISTICS IO reports only 23 logical reads. Much fewer rows were scanned, but more important, much fewer rows were sorted. Note that there's a small chance you won't get any rows back from the query, but in such a case, you can introduce-logic into your application to invoke the query again.
Returning n Random Rows per Group
A variation of returning n random rows is to return n random rows per group—for example, given @n as input, returning @n random orders for each employee. In SQL Server 2005, you can use the new APPLY operator to apply a table expression with the aforementioned logic per each employee, as Listing 1 shows. Table 2 shows one possible output of this query, but of course, every time you run it, you'll get a different output.
For details about the APPLY operator, see "SQL Server 2005's APPLY Operator," InstantDoc ID 47145. An attempt to use somewhat similar logic in SQL Server 2000 proves futile. For example, initially I wrote the query that Listing 2 shows, believing it to be logically equivalent to the APPLY query. However, the query returns a different number of rows every time I run it. Can you see the bug in the query? This query invokes the subquery once per each order (as opposed to once per each employee in the APPLY solution). Besides being much more expensive than the APPLY solution, this solution reevaluates the three random order IDs every time the subquery is run (i.e., once per order).
Therefore, you might get fewer than three matches for each employee, and you might get more than three.In short,this solution isn't correct.
Get to Know TOP, APPLY, and TABLESAMPLE
Randomization is a common need in T-SQL, but T-SQL introduces its own unique challenges. As we've seen, RAND returns a random float in the range 0 through 1, inclusive, and RAND is evaluated only once per query and not once per row. In the process of demonstrating techniques to tackle common randomization needs in T-SQL, I hope I've shown you how you can improve these solutions by using some of the new T-SQL features in SQL Server 2005.