DOWNLOAD THE CODE:
Download the Code 40621.zip

In Listing 4, I used a derived table called OOD to perform the inner join between Orders and OrderDetails as a separate step. However, the query became longer than the query in Listing 3. In more complex scenarios, you might end up nesting derived tables and producing code that's not easily readable, thus missing the whole point of creating a query that's easy to maintain. So with conventional join techniques, you might need to choose between two undesirable solutions: one that doesn't match the way you think and another that uses many derived tables and is overly long. The following undocumented join techniques can help in such situations.

Using Parentheses
The first technique I discuss is the one I recommend using because it lets you write highly readable queries and gives you full control of the logical order of join processing. Note that SQL Server's query optimizer can access the tables in a different order than the logical order you specify in the query if doing so will produce a more efficient plan while still returning the correct results. Similar to the way you use parentheses to determine evaluation order of regular expressions, you can use parentheses to determine join-evaluation order.

Consider the query in Listing 5, which provides a solution to our task. Note that I didn't use a derived table here; I just enclosed the inner join between Orders and OrderDetails in parentheses, forcing SQL Server to process the join in the parentheses as an independent step. You can even nest parentheses if necessary. This example doesn't require nested parentheses, but for illustration purposes, Listing 6's query encloses in parentheses the result of the left outer join between Customers and the result of the inner join between Orders and OrderDetails.

Using JOIN-Condition Order
Because the second undocumented technique isn't very intuitive, I recommend that you don't use it. The idea behind this technique is to control the logical order of join processing by the order in which you specify JOIN conditions. The difference between this technique and the conventional techniques is that with this one, you don't necessarily specify a JOIN condition right after the table names it refers to. You might specify several pairs of tables one after the other and several JOIN conditions one after the other. With this technique, the query in Listing 7 provides a solution to our task. By specifying the JOIN condition between Orders and OrderDetails first, as the code at callout A shows, and the one between Customers and Orders second, as the code at callout B shows, you determine the join-processing order to be Orders joined to OrderDetails first, then Customers joined to the result of joining Orders to OrderDetails.

The code in Listing 8 shows another variation of a query that uses this technique and provides a solution to our task. This code specifies all tables together and all JOIN conditions together. According to the JOIN-condition order, the code performs an inner join between Orders and OrderDetails first (call the result VT1). Next, it performs a left outer join between Products and VT1 (call the result VT2). Finally, the code performs a full outer join between Customers and VT2, returning the desired result.

Note that certain rules govern the relationship between the table order and the JOIN-condition order you specify. Because this technique is undocumented, I haven't found any source that explains the reason behind this logic, but it seems that the JOIN conditions must follow a chiastic relationship to the table order. That is, if you specify tables T1, T2, T3, and T4 in that order and the JOIN conditions match T1 with T2, T2 with T3, and T3 with T4, you must specify the JOIN conditions in the order opposite to the table order, like this:

T1 <join_type> T2
T2 <join_type> T3
T3 <join_type> T4
ON T4.key = T3.key
ON T3.key = T2.key
ON T2.key = T1.key

To look at this join technique in a different way, a given JOIN condition can refer only to the table names right above it or table names that earlier JOIN conditions already referred to and resolved. In the preceding inline pseudo code, the first JOIN condition can refer only to tables T3 and T4 because these are specified right above it. The second JOIN condition can refer to T3, T4, or T2 because the first JOIN condition already resolved T3 and T4 and because the join above T3 <join_type> T4 is between T2 and T3. I'll leave these rules with you to think about; maybe you can figure out the reasoning behind them.

Simpler Is Better
Discovering new T-SQL techniques is always exciting, especially when they make your life simpler. Again, I recommend using the first technique I discussed because it's much more intuitive than the second and usually lets you write short code that matches the way you think. If your thinking is complex, your code is bound to be complex, but if you think simple, now you have a tool to write simple code.

End of Article

Prev. page     1 [2]     next page -->



You must log on before posting a comment.

If you don't have a username & password, please register now.

Reader Comments

I was surprised that in your article that you didn't mention the documented technique about optimizer hints:

Select ... From tableA inner LOOP join tableB on...

The main advantage of this technique is that it is simple to implement (i use it indirectly to force use of indexes without using the hint with index()), and it also forces the order.

The main disavantage of this technique is that it force also the type of join. But if the nested loop join is suitable for the job, the result is quite understandable.

I used it many times to get the proper join in the proper order, and more recently today.

By the way SQL 7.0 tends to go away too quickly from nested loop joins in favor of other join operators that give less performance (in the context of the query).

I experienced that behavior many times in my pratice of query optimizations.

Maurice Pelchat

 
 

ADS BY GOOGLE