I work in a SQL Server production environment that consists of multiple customer databases on multiple SQL Server 2008 64-bit instances. One database
is maintained for each customer. Every week each customer provides a file that contains its sales data for the previous week. An automated process
loads this data directly into a staging table with few validations. One task that I need to perform every week is validating and loading this sales
data from the staging table to a single Sales table in the same customer database. For the large customer databases, a week's worth of data ranges
between 20 million and 40 million records.
Another task I need to perform every week is purging one or more weeks of old data from the Sales table because my company's data retention policy is
to keep only 105 or 156 weeks (about 2 or 3 years) of sales data, depending on the customer. I purge the old sales data the same day I load the new
sales data.
Loading and purging the data took a lot of time each week because the Sales table wasn't partitioned. So, I decided to create an elegant partition
solution that would load the new sales data and purge the old sales data as quickly as possible. I came up with four-stage plan:
1. Identify the best partition solution.
2. Implement the partition solution.
3. Migrate the data.
4. Set up the purge process.
Stage 1: Identify the Best Partition Solution
My goal for stage 1 was to determine the best partition solution. After some contemplation, I decided on the following:
-
Each partition in the new partitioned table would hold one week's worth of sales data, making it easy to add new data and purge old data.
-
The ideal column to define the partitions would be the SalesDate column in the old nonpartitioned table, because this column contained the date
the records were inserted.
-
I would dynamically create the initial partition and transfer data from the nonpartitioned table to the partitioned table. I would also
dynamically create subsequent partitions.
-
Because only 105 or 156 weeks of sales data can be kept in the partitioned table, I would dynamically delete partitions containing data older
than that. The partition numbers would always be 1 to 105 or 1 to 156, with partition 1 containing the oldest data.
Stage 2: Implement the Partition Solution
My goal for stage 2 was to create a new table that retained the definition of the nonpartitioned Sales table but used dynamically generated partitions.
Here's how I created that table in each customer database:
1. I dropped all the primary-key, unique-key, and foreign-key constraints and all indexes in the nonpartitioned Sales table.
2. I renamed the nonpartitioned Sales table to SalesNormalTable_B4_Partition.
3. I declared a local variable (@tmpSalesWeekDate), obtained the oldest sales date from the SalesDate column in the SalesNormalTable_B4_Partition
table, and stored it in that variable, using code like that in Listing 1.
4. To dynamically generate the partitions, I created a partition function named PF_dbo_SalesPartitionTable_SalesDate_ByWeek. As Listing 2 shows, I used
the sales date in the local variable as the first value so that the first partition would hold the oldest sales data from the
SalesNormalTable_B4_Partition table.
5. To dynamically generate the partition schemes in the new partitions, I created a new partition scheme named
PS_dbo_SalesPartitionTable_SalesDate_ByWeek for the PF_dbo_SalesPartitionTable_SalesDate_ByWeek function, as Listing 3 shows.
6. Using the newly created partition scheme, I created a new partitioned Sales table named SalesPartitionTable and a temporary placeholder table, both
of which had the same definition as the SalesNormalTable_B4_Partition table. The placeholder table would be used during the initial data transfer and
during the subsequent weekly data transfers.
7. I created the necessary primary key and indexes in the SalesPartitionTable and placeholder tables.