The data type concept in review
The data type concept is one of the concepts least understood by database practitioners. This misunderstanding is both a cause and a consequence of the failure of SQL and the commercial database management systems (DBMSs) based on it to implement relational domains, which are nothing but data types that can be as complex as you want. As a result, object DBMS proponents mistakenly claim that the relational approach doesn't support so-called complex types, and therefore object DBMSs are superior to relational DBMSs. In this article, I explain the data-type concept, the meaning of DBMS' support of data types, and the distinction between simple and complex types.
Data Type Basics
Databases are structured collections of facts about real-world entities. Consider, for example, the following facts about a company's employee: "Employee 160, named Pianka, works in department D111, was hired on 10/11/1972, earns a salary of $22,250." In natural language, this statement of facts is a logical proposition asserted to be true. (Table 1, page 54, defines some common database terms.)
The general form of propositions about employees is: "Employee with employee number (EMPNO) and name (ENAME), works in department (DEPTNO), was hired on date (HIREDATE), earns a salary of (SALARY)." This generalized form of proposition is called a multivalued predicate, and the terms in parentheses are value placeholders. Substituting specific values for the placeholders turns the predicate into specific propositions. In a relational database, each table has a predicate; the sets of values (representing propositions) are rows, and placeholders map to columns.
For propositions to be true, values substituted for placeholders must be valid according to business rules. Data types constrain database values to the valid ranges that the business rules define. Data types serve as named conceptual pools of permissible values for columns: To be valid, every database value must come from such a pool and, therefore, be of a type.
Two aspects of data types are poorly understood. One aspect is the important distinction between the type and its representation
how a DBMS encodes the data type's values. This misunderstanding stems in large part from the fact that SQL's DBMS types have only one user representation, which implicitly bears the same name as the data type (e.g., the DECIMAL representation of the DECIMAL data type). But data types can have multiple possible representations; for example, you can represent a temperature data type as degrees Fahrenheit, degrees Celsius, degrees Kelvin, or simply hot, warm, cool.
The second aspect involves the fact that, aside from representations, a data type also includes operators for every representation exposed to users. With the operators defined for a given type, users can manipulate the values of that type. Without operators, they can't. For example, exposing the Fahrenheit representation to users means providing operators for manipulating degrees Fahrenheit. At least two basic operators must be defined for every data type (although more operators are required for usefulness):
- read: retrieves (specifies) a specific database value of that type
- =: compares two values of that type
Thus, data types constrain values and operators.
Simple Data Types
Numbers and character strings are the most basic kinds of values, simple values. A numeric value has multiple digits and a character string consists of multiple letters, but these internal structures are usually of no interest to users. The operators for these data types insulate users from the internal structure. Users can apply operators, such as comparison (=, >, <), Boolean (NOT, AND, OR), arithmetic (+, -, *, /), or concatenate (|), to values represented as decimals or strings without having to know anything about the internal structure of values so represented. Furthermore, simple data types have generally agreed-on, well-defined representations and operators.
All DBMSs come with one or more system-defined (built-in) simple data types. Aside from the two basic operators, DBMSs support additional operators for simple data types. SQL Server's data types, for example, include DECIMAL, NUMERIC, FLOAT, REAL, INT, SMALLINT, TINYINT, CHAR, VARCHAR for which SQL Server provides a variety of func-tions (operators). The seven numeric types have the same functions but different value ranges, so they are distinct (otherwise, they would all be possible representations of one data type; for example, numbers). Likewise, the two character types are distinct.
User-Defined Data Types
Suppose that employee ID numbers consist of one character and two digits. An EMPNO data type would have a CHAR(3) representation and a permissible value range of A00-Z99. Although some functions applicable to character strings (=, <>) are also meaningful for EMPNO values, others (+, REVERSE) make no sense.
Thus, EMPNO has the same representation as, and some of the operators of, the system-defined CHAR data type, but because its valid value range and operator set are narrower, it is distinct from CHAR. Thus, all possible employee numbers (EMPNO) doesn't mean the same as all possible character strings of length three (CHAR(3)).
Unless users can define such a data type, the valid range of employee numbers in the database and the operators applicable to them will be those of the system-defined CHAR data type. Furthermore, all data types represented to users as strings of length three (e.g., DEPTNO) will be the same as the system-defined data type CHAR(3), even though they have different value ranges, operator sets, and meanings. This situation would impair the DBMS' ability to protect the integrity of both database content and manipulation in accordance with business rules. Because a large number of arbitrary rules is possible, and they all can't be system-defined, a DBMS needs to support user-defined data types.
Prev. page  
[1]
2
next page