Table Design
Tables can be "flat" (too denormalized), similar to a spreadsheet containing
lots of redundant data. Likewise, tables can be overly separated (overly normalized),
requiring references to dozens of tables to access the information you need.
Tables can be tall (in terms of the number of rows).Tables can be wide (in terms
of the number of columns and column width).You'll experience problems with all
of these types of tables and each type can provide benefits.To resolve some
of the most common dilemmas about table design, follow this sequence of questions:
- Is your table too denormalized or normalized?
- Do you have only one table?
- Do you have a lot of redundant data?
- Do you find that many attributes have exactly the same values for multiple
columns and multiple rows?
Is your table overly denormalized? Having some redundant data
can be beneficial. However, to minimize the number of tables joined, the only
data that should be redundant should be relatively static in nature and as narrow
as possible. Ideally, I would just consider making some relationships redundant
and not necessarily making detailed attributes redundant. For example, imagine
a LineItems table that describes the items in each customer order. Each LineItem
describes a product sold, and each product has a manufacturer. In your design,
you've made LineItems, Products, and Manufacturers separate tables.You reference
the ProductID in LineItems and the ManufacturerID in Products. However, when
you do some routine analysis, you find that you're generally aggregating sales
by manufacturer so that you can stock and sell more items from Manufacturers
whose products sell more. If you add the ManufacturerID to the LineItem table,
you'd be adding an extra column, but you'd completely eliminate the join to
Products when you're doing your analysis. Additionally, even when attributes
from the Products table (e.g., ProductType) are necessary, the optimizer has
more options for how to process the three-table join when all join conditions
are stated than when only the two possible join paths are available.
Is your table overly normalized? If every query requires multiple joins to get basic attribute information, you might have gone too far. Purists will argue that using an external code as a data key is a mistake because such keys are outside of the control of the system and might change.This argument is especially true in a large company in which different departments consume each other's data. However, ISO "country codes," for example, and other standard coding systems are safe to use directly because it's more likely that any change will affect the system, regardless of the original design. To learn more about database normalization, see Related Reading.
Is your table too tall? Having a lot of rows isn't necessarily
a problem.With appropriate indexes and targeted queries, you can easily evaluate
large datasets. (Well, you can if you have decent disks and lots of memory.
Definitely target 64-bit machines if you're a VLDB data warehouse.) However,
the most important clues for determining whether a table is too large are administration,
maintenance, and management—not necessarily query performance. Really,
it goes back to how the data is used. If the data follows a consistent pattern
in loading or archiving (e.g., every week, month, or quarter, data is loaded
from one or more OLTP systems), then you have a fundamental problem in efficiently
managing that load and archive. The problem is so common that it's got multiple
names: the sliding window scenario or the rolling range scenario. Loading a
large amount of data into an existing table can be expensive, mostly in terms
of the effect on the secondary non-clustered indexes.
Data loads are slow and result in a tremendous amount of fragmentation.As an alternative in some design systems, the indexes are dropped, the data is loaded, then indexes are added back. However, this technique is unrealistic if the existing table has 87 millions rows (2 years worth of data) and you're bringing in only 3 million rows from the last month's worth of activity. So, how do you make the load faster and keep the table intact and contiguous? Consider horizontally partitioning the data so that you can have more granular control of your data.
Is your table too wide? This important point leads us into the next section about row size and page density. Row size matters: It directly affects query performance and I/O characteristics. A rule of thumb is that having more attributes isn't necessarily better than having fewer attributes. Certain attributes might even reduce the possibility of online index operations. (To learn about table partitioning, see Related Reading.)
Row Size and Page Density
Without spending a tremendous amount of time covering internals, I want to convey
a few key things you can learn from knowing a bit about the on-disk structures
of your tables. Generally speaking, rows don't span pages. Specifically in SQL
Server 2005, rows can span pages but only when the rows include limited variable
columns. Don't confuse these columns with the new max types or LOB data; these
are the NON-LOB data types such as varchar(n), nvarchar(n) and varbinary(n),
in which each column width is less than or equal to 8000 bytes. Having said
that, I want to emphasize that although SQL Server 2005 allows a 10K row, for
example, you shouldn't necessarily have one. Internally, this row will consist
of an "in_row" structure as well as an "overflow" structure. Rows that are less
than 8K will be stored entirely as "in_row" structures, and the "in_row" structure
of a row can't span pages. For example, if your "in_row" row structure averages
4,500 bytes per row, you can fit only one row per page. This limitation results
in roughly 3,596 wasted bytes both on disk and in memory. If the attributes
that make up the 4,500-byte aren't required for all requests, consider breaking
this data into two or even three tables based on column usage. You might end
up with more densely packed pages, less locking and contention, and better overall
resource utilization. To learn more about row size and physical database storage,
see Related Reading.
Prev. page
1
[2]
3
4
next page