• subscribe
October 01, 2001 12:00 AM

Joins 101

SQL Server Pro
InstantDoc ID #22071
Downloads
22071.zip

Welcome to the T-SQL Classroom

A relational database is composed of entities (aka tables), relationships between the entities, and attributes that describe the entities. By creating related tables, you minimize data redundancy, and therefore, the disk space utilized. For example, suppose you have two entities, Employees and Departments. If you separate employee information into one table and departmental information into another, you minimize disk utilization by including in the Employees table only an identifier of the employee's department rather than duplicating the departmental information for every employee. Although this separation minimizes disk space, it creates a problem. When you need to display employee information along with the appropriate department, you need a way to bring the two tables together. The type of query that brings the two tables together is called a join. A join is one important way of accessing relational databases; by using a join, you can unite entities into one resultset that fully describes the entities, relationships, and attributes that form your data's framework.

Although joins are simple, many people have difficulty determining when and how to use them. So, I decided that joins were a good topic to start this column. To cover all aspects of joins, I plan a series of articles to cover entity and attribute basics, definitions of the join types, and complex join concepts. I'll also examine how you can create indexes that improve join performance.

Back to Basics
A relational database consists of tables, with each table focusing on one entity. An entity is a classification given to a group of similar objects. Thus, an entity can be a person, place, or thing such as a customer or a product. Each entity has attributes that are typically specific only to that entity. An attribute—or, with respect to a table, a column—describes a non-decomposable characteristic of an entity. For example, in a products table, you might want to describe each product your company sells. Products generally have an ID (or other unique identifier), a name, a description, a price, and so on. Therefore, the Product entity table in your database would have simple attributes such as ProductID, ProductName, and Price.

When you consider what entities to store, think about how the data will be used and searched. If an attribute, or a component of an attribute, needs to be referenced in a WHERE clause or in an ORDER BY clause, you probably want to define the component as a single column. For example, a Name column is acceptable, but the data could be more easily accessed if you broke it into FirstName, MiddleInitial, and LastName columns. The three-column approach best serves the following search conditions and ORDER BY clause:

WHERE LastName = 'Smith' AND FirstName = 
   'Robert'
ORDER BY LastName, FirstName, MiddleInitial

However, you can take this approach too far. For example, you could break AddressLine1 into StreetNumber, StreetName, and so forth. You might need to break addresses down if you perform analysis based on demographics and need to search within an area's defined quadrants; however, AddressLine1 is usually sufficient.

When you describe an entity during table creation, you must also decide whether each attribute requires a defined value and, if so, what the domain of legal values is that the attribute will support. If a definite value isn't necessary, the column definition can allow NULL values. You implement these decisions by defining the attribute's data type and nullability (whether that column accepts NULL values). Nullability is an important consideration, so don't leave the determination of nullability to the current session settings or database settings. Always describe nullability in the table definition after giving the subject serious thought.

After you define the entities and corresponding attributes in your database, you populate the table with data; then you'll probably need to display that data in reports. Entities are a database's core components; attributes and relationships give an entity detail and depth and describe its associations with other entities. If a relational database has minimized redundancy by creating many entity tables, you use joins to create a picture of the data and its relationships.

Let's review a sample database called TSQLTutorJoins (see "Obtaining the Code," page 3, for download instructions for the creation file). This database has four entities: Customer, Product, Category, and Order. To minimize the join results, each table has only a few attributes and only a few rows. These limitations let you focus solely on the join results and not get lost in large resultsets and complex attributes.

Also, I defined primary key and foreign key constraints on each table to enforce the relationships between the tables. This enforcement, declarative referential integrity (DRI), significantly helps maintain and enforce accuracy in the relationships between entities when the database processes additional INSERT, UPDATE, and DELETE statements.



ARTICLE TOOLS

Comments
  • Cheril Rinebarger
    11 years ago
    Sep 12, 2001

    None of the links seem to be working. It will not let me see any of the figures or download the code as indicated in the article. This is frustrating.

You must log on before posting a comment.

Are you a new visitor? Register Here