In case of an outer join, Step 3 says to add to the rows in V2 all rows from Customers (the preserved table) that had no match in Orders. In this case, customers C and D had no match, so SQL Server adds them to the rows from V2, with NULLs in the Orders table columns. The result is V3. Next, SQL Server applies the WHERE filter (custid IS NULL) to the rows in V3. Possible results from any logical expression are TRUE, FALSE, and UNKNOWN, but because you're using the IS NULL predicate, you can get only TRUE or FALSE. Figure 5 shows the rows in V3 and the result of the filter for each row under the match? column. Only the rows where match? is TRUE (the last two rows) move on to the next step.
The rest of the query processing is simply extracting the C.custid and C.city values from V4. You get the desired results: customers C and D, who made no orders.
Now we can discuss the difference between the ON clause and the WHERE clause and why I specified ON C.custid = O.custid and WHERE O.custid IS NULL. SQL Server applies the ON clause before adding the outer rows and applies the WHERE clause after. The outer join generates a NULL to identify a customer who placed no orders, so you must test for NULLs after applying the outer join and adding the outer rows—namely, in the WHERE clause.
I used the ANSI SQL-92 join syntax in Listing 2's query. SQL Server veterans might remember that before SQL Server adopted the ANSI SQL-92 join style, T-SQL supported (and still does for backward compatibility) an old-style syntax in which *= represents a left outer join. This old-style syntax doesn't have two separate clauses (ON and WHERE) for logical expressions; it has only the WHERE clause in which you specify both your join condition and your filter. An attempt to write Listing 2's query in the old-style syntax looks like this:
SELECT C.*
FROM Customers AS C, Orders AS O
WHERE C.custid *= O.custid
AND O.custid IS NULL
You get all customers in the result instead of just those who placed no orders. You can now understand why: SQL Server treated the whole logical expression as the join condition, generating an empty set from Step 2. Then, it added back the customers who had no match (all customers).
Seattle Customers Who Placed Fewer Than Three Orders
To test your understanding of the difference between the filters and of logical query-processing steps, write a query that returns the Seattle customers who placed fewer than three orders. Figure 6 shows the desired result. When you're done, compare your query to Listing 3's solution.
Here are the logical processing steps. Steps 1 and 2 join Customers to Orders based on a C.custid = O.custid match in the join condition. Step 3 adds customers who made no orders to the result. In Step 4, SQL Server applies the WHERE filter C.city = 'Seattle'; only the rows in which the customer's city is Seattle remain. Step 5 groups rows from Step 4 by C.custid, so each group is a unique customer. Step 6 applies the HAVING filter COUNT (O.orderid) to the groups from Step 5, so only groups with fewer than three orders remain. Finally, Step 7 processes the SELECT list (C.custid, C.city) and returns the customer ID and city.
This solution has two tricky parts. One is the placement of the C.city = 'Seattle' filter in the WHERE clause. If you mistakenly placed this filter in the ON clause, you probably noticed that the query returned customer D even though customer D is from Denver. This happens because although Step 2 removes the row for customer D, Step 3 adds it back again when it adds the outer rows that had no match.
The other tricky part is to specify COUNT(O.orderid) and not COUNT(*). If you specified COUNT(*), you got a count of 1 (customer C) instead of 0 for customers who made no orders. The reason for the unexpected result is that in the result of Step 3, SQL Server added an outer row for customer C with NULLs in the columns from Orders. COUNT(*) counts the rows regardless of whether they resulted from a match or a non-match in the join condition and regardless of whether the columns have known values or NULLs. COUNT(O.orderid) correctly counts only rows where orderid isn't NULL, which can result only from a match in the join condition.
You're in Control
I hope this explanation clears up some of the confusion about where to specify query filters. Remember: When you know the basics, the advanced problems become simple. Once you've mastered the rules of logical query processing, you'll be in full control of when SQL Server logically applies your filters.