• subscribe
November 01, 2001 12:00 AM

The Outer (Join) Limits

SQL Server Pro
InstantDoc ID #22689

When working with relational databases, you sometimes need to bring your related tables together in a single resultset in an operation called a join. In "Joins 101," October 2001, I demonstrated how to create an inner join, which displays only the rows the two tables have in common (based on the JOIN condition). For example, an inner join between the Customer table and the Order table shows only the customers who've placed an order and only the orders placed by valid customers. Often, your interest goes beyond those customers who purchased something. For example, you might want to see all customers—with their orders, if applicable—regardless of whether they've purchased anything. For this, you need an outer join.

The Venn diagram in Figure 1, page 10, shows the relationship between customers and orders. The intersection (the inner join) returns only the rows where the circles overlap—in this case, CustomerIDs 1, 2, 5, and 6. Customers 3 and 4 have made no purchases. If you're looking for all customers, regardless of whether they've purchased anything, you need more than just the intersection. In effect, you're looking for the entire Customer circle, showing the orders for those who've placed them (the intersection) and a NULL value for those who haven't (the outer part of the Customer circle, which doesn't overlap).

To begin writing an outer join, I recommend that you write the FROM clause first and create a Venn diagram to illustrate what you're doing. The order in which you draw the circles in the Venn diagram will help you write your FROM clause. In Figure 1, the Customer table is on the left and the Order table on the right. Start writing your FROM clause by using an alias to list the two tables (see the sidebar "Aliases in T-SQL" for information about how to create and use aliases) and the keyword OUTER for the join type:

FROM dbo.Customer AS C
  OUTER JOIN dbo.[Order] AS O

At this point, you haven't specified the outer table or the direction of the outer join. The direction defines which table all rows should be returned from—the outer table—regardless of whether a matching row exists in the joined table. In this case, you want all customers. The Customer table is on the left in the diagram and, more important, left of the JOIN keyword in the FROM clause; therefore, you're defining a left outer join. Add the join type LEFT just before the OUTER JOIN definition, and the FROM clause will be almost complete. You just need to add join criteria and your SELECT list. The complete syntax of this outer join is

SELECT C.FirstName
    AS CustomerFirstName,
    O.ProductID
FROM dbo.Customer AS C
  LEFT OUTER JOIN dbo.[Order] AS O
      ON C.CustomerID = O.CustomerID

This statement returns the results that Table 1 shows. The rows I've highlighted in bold are included only because this is an outer join. Rows not bolded are present in both an inner and an outer join.

Could you rewrite this query as a right outer join, and if so, would it be any different? You can indeed change this query to a right outer join by simply switching the order of the tables and changing the direction of the join to right. In a right outer join, the FROM clause would read as follows:

FROM dbo.[Order] AS O
  RIGHT OUTER JOIN dbo.Customer AS C
      ON C.CustomerID = O.CustomerID

You can use my TSQLTutorJoins sample database to test this join and verify that the results are the same regardless of their direction. (You can download the creation script for this database at http://www.tsqlsolutions.com; see "Obtaining the Code," page 3, for download instructions.) Make sure you switch the table order when testing.

There is no difference in how these queries perform or gather data; the only difference is in the order in which the tables appear in the FROM clause. There is no difference in processing or performance between the left outer join and its equivalent right outer join. When you're working with two tables, the join type you choose is solely a matter of preference. I prefer to write two-table outer joins as left outer joins because I tend to draw Venn diagrams with the more interesting table (relative to what I'm looking for in my queries) on the left. However, when you perform outer joins on more than two tables, you need to combine left and right outer joins to more accurately answer questions about your data, rather than choose based on preference. (I'll talk more about that in a future column.)

You've listed all customers regardless of purchase, but what if you want to see all orders, regardless of whether a valid customer placed the order? The TSQLTutorJoins database supports over-the-counter sales by allowing NULL values for CustomerID in the Order table. Therefore, to see all orders regardless of whether they're associated with a valid CustomerID, you need an outer join, with Order as the outer table. In the SELECT list, let's ask for the product ID and customer's first name:

SELECT C.FirstName
      AS CustomerFirstName,
    O.ProductID
FROM dbo.[Order] AS O
  LEFT OUTER JOIN dbo.Customer AS C
      ON C.CustomerID = O.CustomerID

This code example returns the results that Table 2, page 12, shows.

Another Way
As I mentioned in "Joins 101," you often need to be able to recognize different syntax types and know the pitfalls of each. You can rewrite many joins another way—either with a subquery or possibly by using different syntax. In SQL Server, you can write outer joins in alternative syntax that's not always identical to the ANSI standard. A T-SQL syntax for outer joins existed even before ANSI had defined a standard specification for outer joins. I can't cover all the potential problems that this older syntax creates, but let's look at the syntax and a few shortcomings.

For a simple two-table query with no search conditions, the old T-SQL syntax compares well to the ANSI style. Using the old T-SQL­style outer join to list all customers—with their orders, if applicable—regardless of whether they've purchased anything, you include only the tables and their aliases in the FROM clause:

FROM dbo.Customer AS C, dbo.[Order] AS O

and the join conditions in the WHERE clause:

WHERE C.CustomerID = O.CustomerID


ARTICLE TOOLS

Comments
  • Scott
    8 years ago
    Feb 25, 2004

    join article

  • anthony
    8 years ago
    Feb 25, 2004

    I thought this was a great article. I've been having trouble understanding how left/right outer joins differ from regular joins, and this is the first article that I have come across that clearly explains the difference verbally and visually. Thanks.

You must log on before posting a comment.

Are you a new visitor? Register Here