DOWNLOAD THE CODE:
Download the Code 23731.zip

Consider Query 1 again: If both Employees and Orders had clustered indexes on the EmployeeID column, SQL Server could perform a merge join. The pseudocode for SQL Server's execution of the merge join would look something like this:

GET one Orders row and one Employees row
DO (until one input is empty);
 IF EmployeeID values are equal
  Return values from both rows
 GET next Orders row
 ELSE IF Orders.EmployeeID <> 
 Employees.EmployeeID
 GET next Employees row
 ELSE GET next Orders row

The query optimizer usually chooses the merge join strategy when both inputs are already sorted on the join column. If the inputs are both already sorted, little I/O is required to process a merge join if the join is one-to-many (1:M). A many-to-many (M:N) merge join uses a temporary table to store rows instead of discarding them as it usually does. If the data includes duplicate values from each input, the second input must return to the start of the duplicates in the temporary table as SQL Server processes each duplicate from the first input. However, in many cases, SQL Server won't use a merge join unless at least one of the join columns is unique.

Here's an example of joining two tables with and without a unique index. Listing 1 creates copies of the Orders and Order Details tables in the Northwind database and builds a clustered index on OrderID in both tables. When you initially join the tables and display the query plan, you'll see that SQL Server chooses a nested-loop join. Although the OrderID column is unique in the Orders table, if you don't specify unique in the index definition, the optimizer isn't aware that the key values are unique. So when you rebuild the clustered index in the Orders table, specifying that it must be unique, the revised query plan shows that SQL Server uses a merge join.

In some cases, SQL Server's optimizer might decide that it's cheap enough to sort one of the inputs before the join, then perform a merge join. If you change Listing 1 slightly so that the initial index built on the Order Details table is nonclustered, as Listing 2 shows, the query plan shows a sort operation before a merge join.

Know Your Tables
Usually, the query optimizer determines the type of join SQL Server will use. You can just write your join query and leave the choice of strategy to SQL Server. However, knowing how SQL Server performs the joins can help you decide which indexes will be most useful. The optimizer usually chooses nested-loop join if one of the join inputs is very small compared with the other input and the larger input has a clustered index on the join column. Merge join is the most likely strategy if both inputs are sorted on the join column and, in particular, if one of the inputs has a unique clustered index.

Both nested-loop and merge joins require appropriate indexes on the tables. However, if your application lets users build ad hoc queries, you might not know in advance what the best columns for indexes are. The third type of join, hash join, lets SQL Server achieve pretty good join performance even when your tables have no useful indexes. In my next column, I'll discuss hash joins.

End of Article

Prev. page     1 2 [3]     next page -->



You must log on before posting a comment.

If you don't have a username & password, please register now.

Reader Comments

GET one Orders row and one Employees row DO (until one input is empty); IF EmployeeID values are equal Return values from both rows GET next Orders row ELSE IF Orders.EmployeeID <> Employees.EmployeeID GET next Employees row ELSE GET next Orders row

This doesn't seem right.

If equal do something else if not equal do something else else do somthing different

It can only either be = or <>, there is no other condition.

I believe it should be: GET one Orders row and one Employees row DO (until one input is empty); IF EmployeeID values are equal Return values from both rows GET next Orders row ELSE IF Orders.EmployeeID > Employees.EmployeeID GET next Employees row ELSE GET next Orders row

Make sense?

David Wood

 
 

ADS BY GOOGLE