Traditional backup-and-restore technology wouldn't work either because it would replace the entire Subscriber database with the Publisher's information, which in the case of the reporting server would be inappropriate. The bank's reporting servers contain only a subset of tables replicated from the OLTP servers, along with additional tables for reporting. Another reason I couldn't use a typical restore from backup is that such a restore requires exclusive access to the Subscriber. If the Publisher accepted new data during the restore, the data would immediately be out of sync again. But the bank couldn't afford to shut down the Publisher.
We also couldn't use the DTS Import/Export Wizard. The wizard gives you the option of using a query to specify the data to transfer so that you can copy to the Subscriber a subset of the Publisher's data. But you have to recreate the destination tables or DTS will append the data you're restoring to the existing tables. Appending the rows might create duplicate records on the Subscriber if your query doesn't exactly define all the missing rows. In addition, with DTS, you need to define a query for each table, which could be a big task if you have to synchronize many tables. And the task becomes even less convenient if you have several Subscribers (as our bank has) and the wizard must run several times. The bank needed a new approach.
As a solution to the problem, I created and successfully implemented a methodology I call forced replication. This technique can synchronize the data between selected articles without stopping online operations on the Publisher and Subscribers, and you can schedule forced replication to run any time. You can apply this methodology to any database system that uses transactional replication, but it especially benefits applications that work with large databases (100GB or more) and that would require too much time for data synchronization with snapshots or other backup-and-restore methods.
What Is Forced Replication?
In solving the bank's replication problem, I found that the best tool for synchronizing data is replication itself, so I developed forced replication to synchronize on demand only the parts of the database that need updating. This technique is fast and doesn't disrupt user access to data. To use forced replication after a disaster occurs, I first verify that both the Publisher and the Subscriber are available again and the system's replication configuration is restored. For example, if a Distributor failed, I would make sure it's back online. Then, at a time that won't interfere with business processes, I run commands that resend the parts of the data that have changed since the last time the data was synchronized. These commands either insert missing data or replace existing data without duplicating it.
The forced-replication technique is based on default SQL Server replication behavior. SQL Server replicates updates that are issued against any column from the table's primary key (except an IDENTITY column) as a combination of DELETE and INSERT commands. SQL Server replicates updates against any other table columns simply as UPDATE commands. Let's look at how this behavior works.
Say the Pubs database includes a publication called Test that contains just one article, Table1. Listing 1 shows the code that creates Table1, which has a primary key defined on the integer column f1, the varchar column f2, and the datetime column f3. You want to synchronize Table1's data on the Publisher and all Subscribers. (Listing 2's code creates the Test publication, the Table1 article, and a subscription.) Assume that the Distributor, Publisher, and Subscriber servers are already configured. Subscribers might have different numbers of records or different data in some columns because they aren't synchronized yet.
If you run on the Publisher the following simple UPDATE command
UPDATE Table1 SET f1=f1
WHERE f3 < '01/03/2004'
SQL Server doesn't change the existing data on the Publisher server but instead delivers the command to the Subscribers as two new stored procedure calls for DELETE and INSERT:
{CALL sp_MSdel_Table1 (1)}
{CALL sp_MSins_Table1 (1, 'AA', '01/03/2004')}
Listing 3 shows the code that creates these stored procedures, which are similar to those that SQL Server 2000 automatically generates if you use a publication creation wizard instead of running Listing 2's scripts to configure the Test publication.
If a record that you're replicating already exists on the Subscriber, the stored procedures will delete and immediately replace the existing record. If a record is missing, the stored procedures will insert it where it belongs in Table1. Note in Listing 3 that the custom stored procedure sp_MSdel_Table1 doesn't generate an error message if it doesn't find a corresponding record on the Subscriber (as SQL Server does by default); I removed that part of the code from the system-generated stored procedure to avoid creating an error message that would stop the Distribution Agent. This change is the only difference between the system-generated stored procedures and the custom ones I use for forced replication.
To view how the Distributor delivers commands to the Subscriber, stop the Distribution Agent in Replication Monitor, as Figure 2 shows. Next, run the preceding UPDATE command against the Table1 article again. Then, on the Distribution server, run the following command in Query Analyzer:
EXEC distribution.dbo.
sp_browsereplcmds
As Figure 3 shows, SQL Server transforms the UPDATE command on the Publisher into two CALL commandsone for a deletion and one for an insertionon the Distributor.
Scenarios for Using Forced Replication
You can use forced replication to synchronize data from the Publisher to the Subscriber or from the Subscriber to the Publisher. In most cases, when data is out of sync, it's because the Publisher has more data than the Subscriberin other words, INSERT or UPDATE commands on the Publisher haven't been propagated to the Subscriber. To synchronize the data, you simply need to run on the Publisher server an UPDATE command against one of the primary key columns, replacing the column's value with an identical value. The UPDATE command will either delete and reinsert the records without duplicating them or insert missing records on the corresponding Subscriber tables. You can use a WHERE clause to restrict the amount of affected data. For example, as Figure 3 shows, you can update only records that occurred before January 3, 2004 (f3 < '01/03/2004').
Prev. page
1
[2]
3
next page