That 4000 in the calculation still makes the result bigger than I'd like. In this case, all 4000 rows in the outer table are part of the result, causing 4000 accesses of the inner table. Another way to reduce the number of page accesses is to reduce the size of the outer input. Besides checking for a clustered index on a join column, the optimizer tries to join the tables with the smaller input first. In Query 1, the Employees table has a clustered index on the join column, EmployeeID, but it's also dramatically smaller than the Orders table. Employees has only 9 rows; Orders has 830. For Query 1, if the optimizer chooses a nested-loop join, it would use the smaller Employees table as the outer input so that it has to access Orders only nine times.
If you have a WHERE condition involving the outer table, the number of qualifying rows decreases, and SQL Server needs to access the inner table fewer times. If you change Query 1 to include a WHERE clause on the Orders table, as Query 3 shows, the plan changes.
-- Query 3:
SELECT LastName, FirstName, OrderID, OrderDate
FROM Employees e JOIN Orders o
ON e.EmployeeID = o.EmployeeID
WHERE OrderDate < '1996-12-01'
Now, only 121 rows in Orders are part of the result. That smaller size, combined with the clustered index on the join column in Employees, means that the optimizer now chooses Employees as the inner table. SQL Server will use a nested-loop join because the clustered index lets SQL Server quickly find the matching rows in the inner table.
Figure 1 shows the query plan for Query 3. The first line of the plan tells the join type (Nested Loops) and specifies that the outer table will reference the EmployeeID column. The clustered-index scan on the outer table is the same as a table scan because no existing index can speed up access to that outer table. The WHERE condition on the OrderDate column reduces the number of rows to be returned and the number of times SQL Server must access the inner table. But SQL Server must access all the rows in the outer table to determine which ones have an acceptable OrderDate value. Finally, the query plan shows that SQL Server uses a clustered-index seek for the inner table because the index is on the column SQL Server uses to find the matching rows.
As I've noted in my previous columns, an index on the OrderDate column would be a good thing but wouldn't improve the query's performance nearly as much as a clustered index on the join column. A useful index on a search argument in the outer table means SQL Server doesn't have to access all the pages in the outer table, so the P1 value decreases. Usually, though, the P1 value is dwarfed by the value in the second term, R1 * P2, so a reduction in the P1 value yields only a minor improvement. An index on the outer table doesn't reduce the number of times SQL Server has to access the inner table because SQL Server still has to access the inner table for each qualifying row in the outer table, no matter how efficiently it found those qualifying rows. You can generalize the optimizer's choice of nested-loop join this way: The optimizer most often chooses a nested-loop join if one join input is much smaller than the other and the larger input has a clustered index on the join column.
Merge
In a nested-loop join, an index on the join column for the outer table is of no use. However, when you're tuning queries and tables, you might not always know which table will be the inner and which will be the outer, so you might end up with clustered indexes on the join columns for both input tables. SQL Server can use a merge join when the join inputs are both sorted on the join column, as in the case when both tables have clustered indexes on the join column.
You can think of merge join as combining two sorted lists of values. Suppose you have two piles of contractor information: one pile containing the master contract that each contractor has signed and the second containing descriptions of each project that a contractor worked on. Both piles are sorted by contractor name. You need to merge the two piles of paperwork so that each contractor's master contract is with all the project descriptions for that contractor. So, you basically need to make just one pass through each stack.
Prev. page
1
[2]
3
next page