• subscribe
July 01, 1999 12:00 AM

T-SQL for Starters: Simple and Correlated Subqueries

SQL Server Pro
InstantDoc ID #5450
Downloads
5450.zip

Rather than incur the overhead of this correlated subquery, you can write it as a join. The SQL statement for this join operation is

SELECT distinct authors.au_id, au_lname, au_fname 
FROM authors INNER JOIN titleauthor 
ON authors.au_id = titleauthor.au_id

The INNER JOIN, as I discussed last month, gives you the rows in which a match occurs between the two tables. So the results will show only the 19 published authors. What about the four authors who don't have published books? You can easily get a list of those authors by adding a NOT condition to the correlated subquery:

SELECT au_id, au_lname, au_fname 
FROM authors
WHERE NOT EXISTS (SELECT * FROM titleauthor      
	WHERE titleauthor.au_id = authors.au_id)

But any time you see a NOT in a query, think twice before accepting it as the best approach. NOT conditions are usually inefficient. You get the required results, though.

Writing this query as a join is a little trickier:

SELECT DISTINCT authors.au_id, au_lname, au_fname, titleauthor.au_id
FROM authors LEFT OUTER JOIN titleauthor
ON authors.au_id = titleauthor.au_id  
WHERE titleauthor.au_id IS NULL

This query retrieves a list of all the authors and then looks for the rows where the inner table—in this case Titleauthor—shows a NULL value. Be careful with this query; you must run it using the ANSI (SQL-92) JOIN syntax rather than including the JOIN in the WHERE clause.

Also, take care that what you are writing is a correlated subquery and not a simple subquery. For example, this query returns all 23 authors, which you know is not correct:

SELECT au_id, au_fname, au_lname
FROM authors
WHERE EXISTS
	(SELECT DISTINCT au_id 
	FROM titleauthor)

This incorrect result occurs because the programmer did not pass the author ID into the subquery. This is a simple subquery, which can run alone without reference to the outer query. The simple subquery runs once, and as long as it finds even one row in the Titleauthor table, it evaluates as true. So for each author, the query simply asks for the three columns. It runs quickly, because it does not evaluate the subquery multiple times. If a correlated subquery runs a lot faster than you expected, check to see whether the subquery is correct.

Is It All Bad News?
So, are correlated subqueries always inefficient? No, their practicality depends on how you use them. For example, suppose the Pubs database is very large, with thousands of authors. If you want to find out who has written books, the correlated subquery is inefficient because it would run thousands of times. An INNER JOIN might work better based on the comparative results of the estimated execution plans.

But what if you want to search for just one author and see whether that author has written a book? Screen 5 shows the SQL statement for that query. In this case, you have a WHERE clause in the outer query, and you have to evaluate only one row in the correlated subquery. That's acceptable. Nothing is wrong with a correlated subquery when you are performing a specific search. Problems arise only when you are looking at many or all of the rows. So don't think that you always have to avoid correlated subqueries.

Subqueries in UPDATE Statements
You can use subqueries in SELECT statements for searching on specific records and for UPDATE statements. You might need to update data based on the results of a query, but typically you need to update only one row at a time. Often, you can control the update (insert or delete) based on data values from another table. For example, suppose you want to drop from the database the authors who have not delivered on their book contracts. As Listing 1 shows, first identify the authors, and then delete the authors who have not written a book.

This DELETE statement changes the Authors table based on data from the Titleauthor table. DELETE works because you have specifically requested authors who do not have an entry in the Titleauthor table.

If you tried to delete an author who had written a book, the referential integrity (RI) constraints would prevent you from deleting the author and leaving orphan entries in the Titleauthor table.

Remember This
If you keep in mind the basics of when to use (and when not to use) subqueries, you will find them a useful tool in your SQL Server programming toolkit. You can often write subqueries as joins, which typically results in better performance. Simple subqueries are relatively fast. Correlated subqueries evaluate once for each row of the outer query, so use these queries only for single-row results.



ARTICLE TOOLS

Comments
  • John
    5 years ago
    Aug 31, 2007

    Excellent Primer

  • Adam Fleck
    8 years ago
    Jun 29, 2004

    Very informative article. I use what turn out to be non-correlated subqueries quite a lot, and was glad to find they're non-correlated. The certainly run quite fast, though my recorsets are fairly small - only about 20,000 outer and about 2,000 inner.

    Concerning the correlated subquery example in your article: although this illustrates your point very nicely, why not use a non-correlated subquery instead, along the lines of:

    SELECT a.au_id, a.au_lname, a.au_fname
    FROM authors a
    WHERE a.au_id IN (SELECT t.au_id
    FROM titleauthor t)

You must log on before posting a comment.

Are you a new visitor? Register Here