• subscribe
August 23, 2010 12:05 PM

Predicate-Based Query Filters

Filtering rows isn’t as simple or straightforward as it seems
SQL Server Pro
InstantDoc ID #125592

Query filters perform a seemingly simple and straightforward task—filtering rows—but if you’ve been writing SQL code for some time, you know that there’s a lot of depth and many confusing aspects to query filters. Working with query filters is far from trivial. Over the next couple of months I’ll discuss query filters that T-SQL supports, query filters that standard SQL defines but that aren’t implemented in SQL Server (as of SQL Server 2008 R2), query filters that database platforms other than SQL Server support, and query filters that aren’t standard and that no platform currently supports but that are useful. This month I focus on predicate-based filters. Next month I’ll cover other types of filters.

In my examples I use a sample database called InsideTSQL2008. You can download the source code to create and populate the sample database from www.InsideTSQL.com/books/source_code/InsideTSQL2008.zip.

 

Predicates

T-SQL is based on standard SQL (both ISO and ANSI standards), which in turn is based on the relational model, which in turn is based on two mathematical branches: set theory and predicate logic. Understanding some of the key aspects of the mathematical foundations that T-SQL is ultimately based on can give you important insights into the language and can help you write better and more correct code.

A predicate is a statement or an expression that either holds or doesn’t hold. An example of a predicate is region = 'WA'. When evaluating a predicate for a particular entity instance (e.g., of a customer), the expression represents a proposition. For example, suppose that customer 43 is from Walla Walla, Washington, United States, and customer 42 is from Vancouver, British Columbia, Canada. The proposition that customer 42’s region is equal to Washington ('BC' = 'WA') is false. The proposition that customer 43’s region is equal to Washington ('WA' = 'WA') is true. In other words, you can think of a predicate as a general form of the more specific proposition, or as a parameterized proposition. The predicate can be true for some propositions but false for others.

The relational model relies on predicates for several purposes: as a modeling strategy, to define data integrity, and to filter rows in queries. Use of predicates as a modeling strategy involves listing examples for propositions that you need to represent in your database, taking out the data and keeping the headers (predicates), and defining the relations based on the predicates. An example of the use of predicates to enforce data integrity occurs in CHECK constraints. Examples of using predicates to filter rows in queries include the ON, WHERE, and HAVING query clauses that T-SQL supports; the standard FILTER clause for aggregates that T-SQL doesn’t support; and the nonstandard QUALIFY clause that Teradata supports. I cover all these clauses in this article.

 

NULLS and Three-Valued Predicate Logic

Part of the complexity and confusion of working with predicates is because SQL and its various implemented dialects don’t by default enforce a law in logic known as the law of excluded middle—instead, SQL uses three-valued predicate logic (or what most people know as simply three-valued logic). When the law of excluded middle is enforced, predicates can evaluate only to true or false. Some people believe that a valid relational model must enforce this law. However, SQL also supports cases in which predicates evaluate to unknown—this approach has to do with SQL’s support for the NULL mark, which represents the missing value concept from Codd’s relational model. Although Codd’s relational model defines two different kinds of missing values—missing and applicable, as well as missing and inapplicable—SQL implements only one mark representing all kinds of missing values.

Given a predicate that compares values (e.g., region = 'WA', when both sides aren't NULL), the predicate evaluates to true or false as I explained earlier. But when one or both sides are NULL, the predicate evaluates to unknown.

Determining how SQL treats true and false cases is pretty straightforward and intuitive, but treatment of unknown cases can be quite tricky. For example, it’s important to understand that all query filters (ON, WHERE, and HAVING, which T-SQL supports, as well as FILTER and QUALIFY, which T-SQL doesn’t support) accept true cases, meaning that they reject both false and unknown cases. The IF and WHILE statements, as well as the WHEN clause of CASE expressions, treat predicates similarly to query filters in the sense that they deal with true cases in one manner and false and unknown cases in another manner. The CHECK constraint, which enforces declarative integrity as part of a table definition, rejects false cases—meaning that it accepts both true and unknown cases.

So, for example, consider the predicate salary > 0. If this predicate appears in a query WHERE clause, a row with salary NULL is rejected. But if the same predicate appears in a CHECK constraint in a table, a row with salary NULL is accepted.



ARTICLE TOOLS

Comments
  • presse
    2 years ago
    Oct 04, 2010

    Thank you for : ON, WHERE Outer Join Confusion

  • BLOCK
    2 years ago
    Sep 26, 2010

    fantastic overview of how the definition and (mis-)use of nullable columns impact predicates and ultimately query design/performance.

You must log on before posting a comment.

Are you a new visitor? Register Here