DOWNLOAD THE CODE:
Download the Code 94378.zip

You write a query that's logically correct, meaning it has no syntactical errors, all referenced objects exist, and there's no apparent reason for a logical error. However, when you run the query against SQL Server, the query fails with runtime errors. You might be looking at a situation in which the physical execution plan that SQL Server chooses differs from the logical interpretation of the query per ANSI-standard SQL. Certain differences between logical and physical query processing might lead to query failures. Let's walk through a couple of examples that demonstrate those differences and explore the reasoning behind them.

Logical vs. Physical Processing
According to ANSI SQL, query elements should logically be processed in the order that Figure 1, shows to yield a correct result set (the numerals indicate the ordering). SQL Server (as well as other leading database platforms) can and often does take a different path in the physical processing of a query. SQL Server's optimizer creates multiple physical execution plans and chooses the plan that the optimizer estimates will run fastest. The process of optimization is driven by cost estimations.

In the plan that SQL Server ultimately chooses to run, physical processing isn't necessarily in accord with logical query-processing phases. For example, if a query has a selective filter in the WHERE clause and there's an index on the filtered column, chances are good that the plan would first process the WHERE clause (index seek), then the FROM clause (joins and other table operators). Ultimately, the idea is that the query would still return the correct result per logical query processing. Or would it?

Illogical Query Processing?
Apparently, in some cases a query that would otherwise return correct results according to logical query processing might actually fail for the sake of improved performance. You might or might not consider this a bug (I do). But the fact is that such behavior occurs in SQL Server and other major database platforms.

Here's an example that I tested in SQL Server 2005 Developer Edition, Service Pack 1 (SP1). Run the code in Listing 1 to create the tables T1 and T2 and populate them with sample data. Now consider the following query:

SELECT T1.keycol, T1.val
FROM dbo.T1
JOIN dbo.T2 


ON T1.keycol = T2.keycol;

Logically, the query first joins T1 and T2 based on the join condition: T1.keycol = T2.keycol. The query then selects the columns T1.keycol and T1.val from the result set returned by the join operation and returns the result set that Table 1 shows. Note that logically SQL Server is supposed to process the SELECT clause after the FROM clause.

Next, run the following query, which attempts to cast T1.val to an integer. (Some code in this article wraps to multiple lines because of space constraints.)

SELECT T1.keycol, CAST


(T1.val AS INT) AS intval FROM dbo.T1
JOIN dbo.T2
ON T1.keycol = T2.keycol;

When I ran this code in SQL Server 2005, I got the following error: Msg 245, Level 16, State 1, Line 1, Conversion failed when converting the varchar value 'abc' to data type int.

Why did the query fail? Because SQL Server decided to scan all rows in T1, apply the computation (convert the VARCHAR column val to INT), then join to T2. Of course, the design of the tables is moot, but I'm using this scenario just for demonstration purposes.

In SQL Server 2000 the previous query runs successfully, but there are other cases when a query in SQL Server 2000 might fail when logically you expect it to run without failure. Try to focus on the general idea and its implications and not on this specific example.

Avoiding Query Failures
So, can you do anything to prevent query failures because of the optimizer's behavior in such cases? Yes: One option is to redesign the tables when the situation calls for it and when doing so is feasible. For example, you can separate different types of data (e.g., character strings and numbers) into different tables instead of mixing them in the same table. The other option is to use CASE expressions. The items of a CASE expression are guaranteed to be processed in order of appearance, and the CASE expression short-circuits (i.e., terminates processing when the answer is known) as soon as one of the items yields true. Listing 2 shows a simplistic example of using a CASE expression to resolve the problem in our query. The query now runs successfully and returns the expected result set.

   Prev. page   [1] 2     next page



You must log on before posting a comment.

If you don't have a username & password, please register now.