The search argument's location makes a difference
"You can't always get what you want" aren't just famous lyrics by the Rolling Stones. They're sometimes a mantra when you're writing n-table joins to return specific dataespecially when you add search arguments. Furthermore, if your queries still contain old-style join syntax (*=), you might think you're receiving the correct results, when in fact your queries are ambiguous and return inaccurate data. For these reasons, let's look at writing flexible n-table joins with search conditions, using only the ANSI SQL-92 standard.
This T-SQL Tutor column addresses one of the most common questions about joins in general that students ask me: What's the difference between search conditions in the WHERE clause and search conditions in the JOIN clause? Interestingly, the search conditions always mean the same, but the data they return might vary according to their placement within a query. In "n-Table Joins," February 2002, InstantDoc ID 23733, and "n-Table Outer Joins," April 2002, InstantDoc ID 24319, I discussed the benefits of adding the related data of a third, fourth, or nth table to a query so that you can retrieve more detailed information. I also pointed out that when you add additional tables to an outer join, you need to be aware of the ways those extra tables can affect the outer joins. Now, let's look at the effects of adding search arguments to these sometimes complex n-table joins. To test the join syntax that this column covers, you can download the NTableSearchConditionsExamples.sql commented script file at http://www.tsqlsolutions.com, InstantDoc ID 24854.
FROM and WHERE
When you write a join, you can add search conditions in either of two locations to restrict and limit the rows returned. Typically, when you want only rows that meet a certain condition, you add a WHERE clause, which lets you restrict the entire result set that the query will return. However, when the query also includes a join, you can choose to apply the search condition in the FROM clause as one of the elements that define the join condition, or you can apply it to the result set in the WHERE clause. If you place the search argument in the FROM clause, SQL Server includes the search argument's criteria in the definition of the intersection between the joined tables. If you place the search argument in the WHERE clause, the criteria applies to the entire set of joined data (almost as if the search condition were filtered after the join was processedregardless of join type). When all joins in one query are inner joins, the search argument's location doesn't affect the query's result set, processing, or performance. In both the FROM and WHERE clauses, the search condition has the same effect because SQL Server returns only data that meets the join condition. However, for outer joins, the search argument's location can significantly affect results because an outer join includes rows that don't meet the join condition. (Remember, an outer join is an inner join with the addition of rows from the outer table that don't meet the join condition.) When you add search conditions to your outer-join queries, their placement dictates whether the conditions are applied as part of the inner join or to the entire result set (i.e. after the outer join). If you don't properly place your search conditions to describe the set of data you're looking for, you could end up with undesired results.
Location, Location, Location
Let's look at an example that demonstrates the benefits of strategically placing search conditions in the FROM and WHERE clauses in n-table inner joins and n-table outer joins. Suppose your company assigns customers to its salespeople alphabetically by last name in one of two groups: A-N or O-Z. And let's say that you, a salesperson, want to know which of your O-Z customers have purchased which products. You can start by using an inner join to return only your O-Z customers who have made a purchase. To fulfill the request, Show me all customers and their orders, then remove the customers who don't belong in my O-Z set, you could add the search condition in the WHERE clause (i.e., after the join), as the code at callout A in Listing 1 shows. Or to find only your customers and their orders, you could add the search condition in the FROM clause (i.e., before the join), as the code at callout A in Listing 2 shows. The result set and internal processing of the inner joins for the queries in Listings 1 and 2 are the same and return the four rows that Table 1 shows.
If you want to request the names of all your customers, whether or not they've made a purchase (in the O-Z range, Peter is the only customer who hasn't bought anything), you'd have to write an outer join. In this case, the location of the search argument could make a significant difference. The query in Listing 3, page 10, shows an outer join with the search condition in the WHERE clause. Table 2, page 10, shows this query's result set, which changes only slightly from the result set in Table 1 (it now includes customer Peter).
But what happens when you place the search condition in the query's FROM clause instead, as the code in Listing 4 shows? This query returns a larger result set, which Table 3 shows. Of the nine returned rows, the italicized rows are the result set of the inner join (note that Peter Parker isn't italicized), and the bolded rows are the result of the outer join. You might think that the query hasn't applied the search conditionbut it has. If you review the information that's supplied for the ProductID, Product Name, and Category Name columns, you'll see a difference between the Product and Category data for customers who are in the O-Z range as opposed to those who aren't. Arthur Curry has made a purchase (as ProductID 6 shows), yet Product Name and Category Name are NULL. They're NULL because they didn't meet the inner- join criteria; they were added because you requested all rows in the intersection (the inner join that is part of an OUTER JOIN query) plus all rows that don't meet the join criteria (the rows that are added from the outer join). When you add this additional search condition to the FROM clause, you're actually making a slightly different requestShow me what all my customers (names from O to Z) purchased (defined by the intersection between customers and products), yet include all customers (even those outside the O-Z range) in the list. Therefore, the query completes the outer join after applying the search condition during the inner join.
Let's compare the data sets that Listings 3 and 4 return. Both queries include the intersection of customers whose last name falls within the O-Z range and their purchased products. For Listing 3, the search argument is in the WHERE clause and therefore applies to the entire set of data; only O-Z rows are returned. For Listing 4, the search argument further defines the join condition, meaning that you're asking to see only the purchases (i.e., the intersection) for customers whose names fall within the O-Z range. But you've also asked for an outer join, which tells SQL Server to include all customersregardless of whether they've made a purchase or whether their last names fall in the O-Z range. Even if customers whose last names fall outside the O-Z range have made a purchase, you're not interested in seeing that data.