To partition a production database, you need several hours of downtime and lots of disk space on both the local server and a network file server. A high transfer rate on your network (e.g., a gigabit connection) speeds the process. Tom got another server to house the new database (ArchiveDB) and contain the aged data. If a new server isn't in your budget, you can create ArchiveDB as a second database on your production server. To partition your database, you can follow the steps we used:
- Create an archive database to hold the aged data. If necessary, create a different filegroup structure, separating system from user data, data tables from nonclustered indexes, and tabular data from text, ntext, and image data. The code in Listing 1 is similar to the code I used to create ArchiveDB and establish filegroups for the new database.
- Take the production database offline, and make a full backup.
- Detach the production database, copy it, and store the copy on the local server. Then, make a second copy, and store it on the network file server for safekeeping.
- Reattach the production database.
- Set logging on both the production database and ArchiveDB to bulk-logged.
- Use Data Transformation Services (DTS) to copy data from the production database to ArchiveDB. This method was the fastest way for us to copy data from one filegroup schema to another.
- Run scripts to delete the current data from ArchiveDB.
- Run scripts to delete aged data from the production database.
- Run count scripts to confirm record counts and be sure data doesn't overlap.
- Reset databases to full logging.
- Back up both databases, and back up the master database on each server.
- Bring both databases back online.
The first time we ran this process, the record counts didn't match, and we realized we'd lost data in the partitioning process. We found an error in one of the algorithms we used to calculate when a record is old enough to archive. We needed more time to test the corrected algorithm, so we decided to fall back to the original production database. In only 1 minute, we detached the newly partitioned production database and attached the original production database from the local-server copy we made at the beginning of the process. The database went back into production almost immediately with no data loss or operational errors.
The next day, we corrected and tested the troublesome algorithm. That night, we went through the partitioning process again, and this time, the record counts were correct. The process ran so smoothly that we had time to modify the reports that used historical data to reflect the new partitioned architecture. We went into production the next morning with no data loss, no data errors, and better performance. The production database, which had previously been nearly 5GB, was now a trim 500MB. The invoicing job, which had previously taken half a day to run, finished in half an hour, limited only by the speed of the printer.
This type of database partitioning is called horizontal partitioning. We used a complex date algorithm to determine how to partition the data. To maintain the division between the current data and the aged data. I created a set of programs that run as a scheduled job. These programs search the tables in the production database, find records to archive, copy the records to ArchiveDB, then delete them from the production database. This scheme ensures a minimal load on the production server because you can schedule the job to run during off-peak times. We didn't implement distributed partitioned views for data entry because data entry happens only in the current database and because identifying and creating partitioning columns would be hard. (For information about distributed partitioned views, see Kalen Delaney and Itzik Ben-Gan's articles "Distributed Partitioned Views," August 2000, InstantDoc ID 9086; "Querying Distributed Partitioned Views," September 2000, InstantDoc ID 9097; and "Modifying Views with INSTEAD OF Triggers," October 2000, InstantDoc ID 9734.)
A Happy Result
We delivered a scalable solution that gave Tom's production database a new lease on life by trimming it down to a manageable size for his server hardware. Tom's boss was happy, users could use the new databases efficiently, and Tom has scheduled a replacement for the client interface in the next year. The old data that's housed in ArchiveDB will become the company's data warehouse.
Crafting a good solution for a poorly performing database doesn't always mean modifying table designs, tweaking indexes, upgrading hardware, or enforcing relationships. In a production environment, you might not be able to make these kinds of changes because of the effect they might have on operations. Instead, you might have to explore alternatives such as database partitioning.