Test and test again for optimization
You know by now that SQL Server can perform various types of joins (i.e., hash, merge, nested loop) and that different indexes can be useful to each join type in different ways. As I mentioned in "Hashing for Performance," April 2002, InstantDoc ID 24024, I've found no easy algorithm that's guaranteed to find the best indexes for any given query or table. However, I've devised a series of tests you can run to determine the best indexes for a join query. In this article's examples, you can run a test query against the Northwind sample database and experiment with indexes and performance. (Thanks to my colleague Kimberly Tripp-Simonnet for performing the initial testing of these indexes on much larger tables than those in Northwind.)
Most of the examples I've used in past columns have joined tables that have just one join column in common. If your join columns are composite, any index you explicitly build to support the join should include all the columns. But what if you have one join column and one or more search arguments? Is it better to index the join column or a column in a highly selective search argument? What happens if you have a composite index on both the join column and a search argument? In SQL Server 7.0 and earlier, I've never observed the optimizer choosing an index that supported both a join and a search argument for the same query. However, the SQL Server 2000 optimizer is much smarter, so I recommend that you add such indexes to your list of possibilities to consider and test.
Try It Out
Let's run a series of tests, using different indexes on two tables involved in a JOIN operation. The test query also has a search argument for each of the tables. The query runs against the data in the Northwind database's Orders and Order Details tables, but to avoid conflicting with the existing indexes on these tables, use the code that Listing 1 shows to make copies of the tables first.
How can you tell which indexes result in better performance? If the tables have millions of rows, you can use your clock or wristwatch. In Kimberly's original tests, the join with no indexes took more than 30 seconds, so she could easily recognize an improvement. But queries against the Northwind tables all seem to execute in just a second or two, so how can you really tell what works best?
In SQL Server 6.5 and earlier, one main way to compare the efficiency of one query against another is to look at the values that STATISTICS I/O returnsin particular, the value for logical page reads. However, as of SQL Server 7.0, that method is no longer sufficient. As I've mentioned, for hash joins, SQL Server might have to pass through each table only once, performing a minimal number of reads. However, the memory space required, plus any other internal processing, can make hash joins much more expensive than other types.
So for SQL Server 2000 and 7.0, to compare performance as I add different indexes, I usually look at the CPU time value in addition to or instead of looking at STATISTICS I/O. Make sure to check the CPU time reported after the query has executed. The CPU time that SQL Server reports before executing the query is the time required for parsing, optimization, and compilation. I also don't pay much attention to the elapsed-time value because factors other than the query's efficiency can affect that value. For example, other users on the server computer affect elapsed time, even if those users are running nonSQL Server applications.
Listing 2 shows the query that I want to tune and the statements that turn on the statistics measurements. In the STATISTICS I/O output, I'm interested in only the value for logical reads, which indicates how many pages need to be accessed. I'll leave it to you to use the SET STATISTICS PROFILE ON command to examine the query plans.
When I ran the test query with no indexes on the tables, I got the following output:
Table 'od2'. Scan count 1, logical
reads 10, physical reads 0, read-ahead reads 0.
Table 'o2'. Scan count 1, logical reads 21, physical reads 0, read-ahead reads 0.
SQL Server Execution Times:
CPU time = 16 ms, elapsed time = 16 ms.
Prev. page  
[1]
2
next page