When you're gathering business rules and requirements for a database project and implementing them in the data model, you might need to let a data value, or attribute, assume a value of unknown. Unknown is real. So you can expect to be able to store unknowns in your databases. Relational databases use a placeholder called NULL to represent unknowns. But NULL brings complications and overhead, so some data modelers and database administrators avoid using NULL. As I explain the use and limitations of NULL, keep in mind that one goal of a relational database is to express accurate data.
What Is NULL?
NULL is a condition. For example, suppose you need a date of birth to calculate retirement age. The DateOfBirth field must be a date data type, so you can't enter unknown, not applicable, zero, or a bogus value such as 1800-01-01. In such cases, you need a placeholder until you have the correct value. NULL fills this need. NULL means that some value is unknown, missing, or irrelevant. This three-valued logic is an important feature of the SQL language as defined by the American National Standards Institute (ANSI) and requires a NULL condition. In most programming languages, a Boolean expression (or predicate, in SQL), such as haircolor = 'brown' and gender = 'F', can be only true or false. In ANSI SQL, you have a third option, unknown (NULL), so the expression haircolor = 'brown' and gender = 'F' can equate to true, false, or unknown. If you have a query such as
SELECT * FROM employee WHERE haircolor = 'brown'
AND gender = 'F'
and you test against a table row in which the value of haircolor is brown and the value of gender is F, the test is true and that row is returned in the result set. If you test against a row that contains values of blonde and F for haircolor and gender, the test is false and that row isn't returned in the result set. Finally, if you test against a row that contains NULL for haircolor and F for gender, this test also fails, because you don't know what color the hair is. This third row is not part of the result set.
Because NULL is a placeholder and not a value (such as zero or a space), the ANSI specification states that the NULL condition must be the same for all data types (number, character, date). So you can expect equivalent behavior when you use any data type in a comparison expression that includes NULL.
The ANSI specification states that a column is nullable by default. Not all database management system (DBMS) vendors implement the ANSI specification, nor do they implement it consistently. For instance, SQL Server 4.2 behavior was NULL by default, the ANSI standard; but starting with SQL Server 6.0, the NULL property switched to NOT NULL by default. So to be sure, always specify whether a column can be NULL when you're writing CREATE TABLE statements. (WebSQL subscribers can download an example DDL at the link to this article at http://www.sqlmag.com.)
Why Use NULL?
A NULL condition is neither a space (for character or date data types) nor a zero value (for number data types). When you're developing and implementing a data model, you might find that you need to specify a NULL condition for a column. Figure 1 is a detailed entity-relationship diagram (ERD) of a simplified workplace environment with three tables: Project, Employee, and Paycheck. Project contains attributes describing the kinds of workplace projects an employee might be assigned to, Employee contains attributes describing individual employees, and Paycheck contains attributes describing employee compensation. To demonstrate a point, I've simplified the relationships and restricted the business rules. Each Project can have many Employees assigned to it, but an Employee is associated with only one Project at a time. Each Employee receives many Paychecks, but each Paycheck goes to only one employee.
I made some assumptions about the attributes in the Project table. The primary-key attribute (ProjNo), the project title (ProjName), and the starting date of the project (StartDate) all must have values when a row is inserted into the table; the end date of the project (EndDate) may have a value. ProjNo, ProjName, and StartDate all are created with a NOT NULL property; EndDate is created with the NULL property.
In the Employee table, the primary key (EmpNo), the employee's family name (EmpLname), and the employee's Social Security number (EmpSSN) must be filled in for each row in the table. The employee's given names (first plus middle, if known) can be NULL, and so can the associated project number.
Each time a paycheck is issued, an entry is made in the Paycheck table. The two-part (concatenated, or composite) primary key (WeekNbr plus EmpNo), the employee's Social Security number (EmpSSN), and the paycheck amount (PayAmt) must have values. The employee's job code (JobCode) may be NULL.
Figure 2 illustrates this design. I populated the tables with test data to demonstrate the concept of NULL. The Project table contains three projects, one of which has a firm end date; the others are open-ended. The decision to make EndDate nullable was based on the business environment. Not every project has a firm end date, and for some projects the end date is unknown or unimportant because a project might be continual and ongoing. Suppose the project manager wants to track project duration in days and months. If you make EndDate NOT NULL and use some sort of code to indicate no end date or an ongoing project, you can't make EndDate a datetime data type. And you can't use date functions such as DateDiff, which gives the project duration, as you can see in Listing 1.
Prev. page  
[1]
2
next page