Executive Summary:
A horizontally partitioned fact table’s rows are divided into discreet, non-overlapping sections. These sections are defined by a range of values in the FOR VALUES clause. To horizontally partition a fact table you need to create a partition function, a partition scheme, and a partitioned table.
|
Fact tables tend to grow very large, very fast.
Sometimes, a fact table can become so large
that it can be difficult to manage, and SQL
queries can be adversely affected by the sheer size of
the data set involved. However, you can horizontally
partition a fact table to make it easier to work with.
A horizontally partitioned table is one in which
the rows are divided into discreet, non-overlapping
sections. Each section in a horizontally partitioned
table is defined by a range of values, such as by date,
geographic area, or customers’ last names. (Note: You
might consider storing each section on a different hard
disk to enhance query performance.) Although it isn’t
difficult to horizontally partition a table, it does require
some advance planning because you’re dealing with
large amounts of data. Let’s explore when and why
you’d want to horizontally partition a fact table, and
how to create a partition function, a partition scheme,
and a partitioned table using SQL Server 2005’s built-in
functionality.
Why Partition a Fact Table?
Large tables (i.e., tables with hundreds of millions
of rows) can be difficult to manage because of their
size and the amount of time it takes to do anything
with them (e.g., rebuild an index). In a transactional
database, the associative tables (i.e., those tables that
involve the many to many—M:N—relationship) are
often the tables with the most rows in the database.
In dimensional modeling, a fact table is the equivalent
of an associative table. Like an associative table in a
transactional database, a fact table often has many
more rows than its related dimensions, perhaps even
as many as (# of rows in dimension 1) x (# of rows in
dimension 2) … x (# of rows in dimension n) rows.
Partitioning breaks these monster tables into manageable
chunks. If your maintenance time windows
are shrinking or the amount of data to be processed
is growing, you can partition the table and perform
tasks, such as backup and restore operations or use
the Database Contingency Checker (DBCC) to update
table statistics, by partition instead of for the entire
table. SQL Server 2005 treats the many sections of a
partitioned table as a single logical entity, and the multiple
partitions appear as a single table to end users.
The following are some reasons to horizontally
partition a table:
- You can better control where each partition is
placed in storage and leverage multiple read/write
heads for fast query resolution.
- You can back up and restore by partition, indexes
can be rebuilt and reorganized by partition, and the
indexes themselves can be partitioned.
- You can direct queries that include a WHERE
clause that contains either the partitioning column
or an indexed column to the appropriate partition
for resolution.
- You can reduce lock escalations and lock-management
overhead because locking is limited to
partitions.
- You can merge or split partitions fairly easily if multiple
partitions are in the same file group.
So which tables are the best candidates for horizontal
partitioning? Very large tables, tables that you expect to
grow very large in the near future, and tables that can be
intuitively partitioned based on their business value (e.g.,
by fiscal year). These tables must include a column whose
values are NOT NULL and that can be used to divide
the rows into discreet, non-overlapping sections such as
a column containing sales dates.
If your database contains a large table against
which queries and updates aren’t performing the way
you think they should, consider testing how partitioning
might affect query performance in your environment.
SQL Server 2005 is partition-aware, meaning
that if slow-running queries include a WHERE clause
that contains the partitioning column or the indexed column (and the index is also partitioned),
only the relevant partition
is accessed for query resolution.
This functionality can significantly
help performance.
Creating a
Partition Function
To partition a table, you need to
use a function that’s composed of a
partitioning column and a set of boundaries. To create
the partition function for the SALES fact table shown
in Figure 1, you’d run the command
CREATE PARTITION FUNCTION
MyPartitionFunctionLeft
(datetime)
AS RANGE LEFT
FOR VALUES (‘1/01/2003’,
‘1/01/2005’, ‘1/01/2007)
MyPartitionFunctionLeft is the name of the partitioning
function, (datetime) is the data type of the
partitioning column, and RANGE LEFT stipulates
how to divide up the data values that are bound by the
FOR VALUES dates.
There’s two ways to set the boundaries: RANGE
LEFT or RANGE RIGHT. The RANGE LEFT clause
divides the data from the lowest value to the highest value
(i.e., in ascending order). The RANGE RIGHT clause
divides the data from the highest value to the lowest value
(i.e., in descending order).
The partitioning column is often a datetime data type
such as the Date_of_Event column (shown in Figure 1).
Separating datetime records into non-overlapping groups
is straightforward. For example, if your business rules
and known operational queries indicate that partitioning
the table on the date of a sales event is reasonable, then
you could partition the data into two-year groupings, as
I did in the previous partition function command. Partitioning
RANGE LEFT divides the data into the value
ranges shown in Figure 2.
If the date of a sales event was June 23, 2004, you’d
find that record in partition 2 (P2). If you want to
create the partition function with the RANGE RIGHT
clause, you’d run the command
CREATE PARTITION FUNCTION
MyPartitionFunction datetime
AS RANGE RIGHT
FOR VALUES (‘1/01/2003’,
‘1/01/2005’, ‘1/01/2007)
Partioning RANGE RIGHT divides the data into the
valve ranges shown in Figure 3. I recommend, for the
sake of consistency and for the ease of querying, that
you choose one range declaration (i.e., RANGE LEFT
or RANGE RIGHT) and stick with it throughout
all the partitioned tables that you create in your
environment.
Continued on page 2