• subscribe
April 20, 2006 12:00 AM

Designing for Performance: NULL or NOT NULL?

Don't avoid the handy NULL condition: Get smart about when to use it
SQL Server Pro
InstantDoc ID #49294

When you're gathering business rules and requirements for a database project and implementing them in the data model, you might need to let a data value, which eventually will become a column in the database, assume a value of unknown. Unknown is real; you work with unknowns every day.You should be able to store unknown in your databases. Relational databases, SQL Server included, use a placeholder called NULL to represent unknowns. NULL brings complications and overhead, so that some data modelers and database programmers avoid using it. But NULL has both supporters and detractors. The question when you're designing for performance is, is NULL a matter of choice? Or should you stay away from NULL altogether?

What Is NULL?
Before we answer those questions, though, let's define what NULL actually is and what it's good for. NULL is a condition. For example, suppose you need a date of birth to calculate retirement age.The DateOfBirth field must be a date data type, so you can't enter unknown or not applicable or n/a, and you shouldn't enter zero or a bogus value such as 1800-01-01. In such cases, you need a placeholder until you can enter the correct value. NULL fills this need.

NULL means that some value is unknown, missing, or irrelevant.This three-valued logic is an important feature of the SQL language. It's defined in the ANSI SQL standard and requires the presence of a NULL condition. In most programming languages, a Boolean expression (or predicate, in SQL), such as haircolor = brown and gender = F, can be only true or false. In ANSI SQL, you have a third option, unknown (NULL), so that the expression haircolor = brown and gender = F can equate to true, false, or unknown.

If you have an SQL query such as

SELECT * FROM employee 
  WHERE haircolor = 'brown' 
  AND gender = 'F' 

and you test against a row in the employee table in which the value of haircolor is brown and the value of gender is F, the test is true and that row is returned in the result set. If you test against a second row, which contains values of blonde and F for haircolor and gender, respectively, the test is false and that row isn't returned in the result set. If you test against a row that contains NULL for haircolor and F for gender, this test also fails because you don't know what color the hair is; hence the NULL haircolor.This third row also isn't part of the result set.

Because NULL is a placeholder and not a value (such as zero or a space), the ANSI specification states that the NULL condition must be the same for all data types (number, character, and date). So you can expect equivalent behavior when you use any data type in a comparison expression that includes NULL. SQL Server implements this ANSI requirement by including a NULL bitmap in each record, one bit for each column; a value of 1 indicates that the corresponding column is nullable. The NULL bitmap is present in every record of every user table.

The ANSI specification also states that, at the time of creation, a column is nullable by default. Not all database management system (DBMS) vendors implement the ANSI specification, nor do they implement it consistently. Internally, the SQL Server 2005 and SQL Server 2000 engine default behavior doesn't conform to the ANSI standard—that is, columns created without a declaration of NULL or NOT NULL are created as NOT NULL. I suspect that SQL Server was optimized for pure speed in the beginning, and now it has to maintain these settings for backward compatibility.

However, you'll never see this behavior if you execute CREATE TABLE code in Query Analyzer. Tools like Query Analyzer are ODBC-based, and they automatically turn the ANSI_NULL_DFLT_ON option to true, which means that columns created without a declaration of NULL or NOT NULL are created as nullable. It's easy to switch back and forth between ANSI compliance and noncompliance, either by setting the ANSI options in database properties or by including a session SET command—such as SET ANSI NULLS (ON|OFF)—with your code. Because this situation can be so confusing, your best bet is simply to declare NULL or NOT NULL for each column when you create a table.



ARTICLE TOOLS

Comments
    There are no comments to display. Be the first one!
You must log on before posting a comment.

Are you a new visitor? Register Here