You write a query that's logically correct, meaning it has no syntactical errors, all
referenced objects exist, and there's no apparent reason for a logical error. However, when you run the query against SQL Server, the query fails with runtime errors. You
might be looking at a situation in which the physical execution plan that
SQL Server chooses differs from the logical interpretation of the
query per ANSI-standard SQL. Certain differences between
logical and physical query processing might lead to query
failures. Let's walk through a couple of examples that
demonstrate those differences and explore the reasoning
behind them.
Logical vs. Physical Processing
According to ANSI SQL, query elements should logically be processed in the order that Figure 1, shows to yield a correct result set (the numerals indicate
the ordering). SQL Server (as well as other leading
database platforms) can and often does take a different
path in the physical processing of a query. SQL Server's
optimizer creates multiple physical execution plans
and chooses the plan that the optimizer estimates
will run fastest. The process of optimization is
driven by cost estimations.
In the plan that SQL Server ultimately
chooses to run, physical processing isn't
necessarily in accord with logical query-processing phases. For example, if a query
has a selective filter in the WHERE clause
and there's an index on the filtered column,
chances are good that the plan would first
process the WHERE clause (index seek),
then the FROM clause (joins and other
table operators). Ultimately, the idea is
that the query would still return the correct result per logical query processing. Or
would it?
Illogical Query Processing?
Apparently, in some cases a query that would
otherwise return correct results according to
logical query processing might actually fail for
the sake of improved performance. You might
or might not consider this a bug (I do). But
the fact is that such behavior occurs in SQL
Server and other major database platforms.
Here's an example that I tested in SQL
Server 2005 Developer Edition, Service
Pack 1 (SP1). Run the code in Listing 1 to
create the tables T1 and T2 and populate
them with sample data. Now consider the
following query:
SELECT T1.keycol, T1.val
FROM dbo.T1
JOIN dbo.T2
ON T1.keycol = T2.keycol;
Logically, the query first joins T1 and T2
based on the join condition: T1.keycol = T2.keycol. The query then selects the
columns T1.keycol and T1.val from the
result set returned by the join operation
and returns the result set that Table 1 shows.
Note that logically SQL Server is supposed
to process the SELECT clause after the
FROM clause.
Next, run the following query, which
attempts to cast T1.val to an integer. (Some
code in this article wraps to multiple lines
because of space constraints.)
SELECT T1.keycol, CAST
(T1.val AS INT) AS intval
FROM dbo.T1
JOIN dbo.T2
ON T1.keycol = T2.keycol;
When I ran this code in SQL Server 2005,
I got the following error: Msg 245, Level 16,
State 1, Line 1, Conversion failed when converting the varchar value 'abc' to data type int.
Why did the query fail? Because SQL
Server decided to scan all rows in T1, apply
the computation (convert the VARCHAR
column val to INT), then join to T2. Of
course, the design of the tables is moot, but
I'm using this scenario just for demonstration purposes.
In SQL Server 2000 the previous query
runs successfully, but there are other cases
when a query in SQL Server 2000 might
fail when logically you expect it to run
without failure. Try to focus on the general
idea and its implications and not on this
specific example.
Avoiding Query Failures
So, can you do anything to prevent query
failures because of the optimizer's behavior
in such cases? Yes: One option is to redesign
the tables when the situation calls for it and
when doing so is feasible. For example, you
can separate different types of data (e.g.,
character strings and numbers) into different tables instead of mixing them in the
same table. The other option is to use CASE
expressions. The items of a CASE expression
are guaranteed to be processed in order of
appearance, and the CASE expression short-circuits (i.e., terminates processing when
the answer is known) as soon as one of the
items yields true. Listing 2 shows a simplistic example of using a CASE expression to resolve the problem in our query. The
query now runs successfully and returns the
expected result set.
Prev. page  
[1]
2
next page