Automating Replication Validation and Setup Alerts
The mechanism you use to automate the validation process for a transactional publication is different from the mechanism you use for a merge publication. Also, in the case of a merge publication, you use a different mechanism based on whether the merge agent is configured to run continuously or only at scheduled times. For a transactional publication, you simply create a job that runs sp_validate_publication on a scheduled basis; for example, every night at 1:00 a.m. For a merge publication in which you configure the merge agent to run continuously, you set the ValidateInterval parameter to determine how often, in minutes, the process should validate the subscriber (the default is 60 minutes). For example, to validate the subscriber every 24 hours, set validate to 1 or 2 and ValidateInterval to 1440 minutes.
For a merge publication in which the merge agent is configured to run on a scheduled basis, such as every 5 minutes, you need to create a job that will change the validate parameter to 1 or 2 at the time when you want the validation to occur. For example, you can create a job that runs every midnight and uses the sp_change_agent_parameter procedure to change the validate parameter to 1 or 2. Then you can create another job to set the validate parameter back to 0 after the validation is complete. You can set up the second job to be triggered by the 20574 or 20575 alert, which fires when the validation is complete.
To automate a response to the validation check, you need to enable the preconfigured alert for message ID 20574 (failure) and message ID 20575 (success). You can configure these alerts through the Replication Monitor in Enterprise Manager. The alerts can take an action such as notifying an operator or recording a message in a log.
Validating Bidirectional Transactional Replication
Validating bidirectional transactional replication scenarios, including scenarios that use immediate-updating subscribers, is more complicated than validating ordinary transactional publications because users or processes are potentially updating the publisher and subscriber at the same time. To get an accurate validation check, you need to stop all users from making modifications directly against the subscriber while the validation check is performed. Users can still make modifications against the publisher and those modifications will be replicated to the subscriber while the validation is performed. You can use the 20574 and 20575 alerts to notify you when the validation check is complete.
Execution Times and Blocking
The validation process takes out a shared lock while it calculates the rowcount and checksum for a given table, but the calculation doesn't take long to run even on a fairly large table. I measured the execution time to run sp_table_validation on tables of different sizes. I performed the tests on a system with two Intel 333MHz Pentium II processors, 500MB of RAM, and one 9GB SCSI hard drive. Table 1 shows the test results. Note that it took only 30 seconds to calculate a rowcount and checksum against a 5-million-row (250MB) table in which the data was not preloaded into the data cache.
Limitations to Replication Validation
Here are some limitations to keep in mind when you use replication validation.
- The validation process excludes text and image columns when calculating checksum values. You can still run the validation on tables with text and image columns, but the checksum will be calculated only for the data in nontext and nonimage columns.
- So that the validation process can calculate a valid checksum, the table needs to have the identical structure at the publisher and the subscriberthe same columns in the same order, the same data types and lengths, and the same NULL/NOT NULL property.
- You can't use checksum validation with vertically filtered articles because the subscriber has only a subset of the publisher's columns, which results in different checksum values.
- Floating-point values can cause checksum differences if you use character-mode bulk copy program (bcp) to synchronize the subscriber because small, unavoidable differences in precision occur in the conversion between a floating-point number and the number's character-string representation. You can use the native-mode bcp to avoid discrepancies, or you can use a numeric or decimal data type instead of floating-point values. You use character-mode bcp whenever the publication has heterogeneous subscribers.
- To ensure that checksum values are accurate, you need to create the table at the publisher with one CREATE TABLE command. The reason is that the internal structure of the table might be slightly different depending on whether a user created the table with one CREATE TABLE statement or a CREATE TABLE command and a series of ALTER TABLE statements. The algorithm the validation process uses to calculate the checksum is sensitive to these internal differences, so it might return different checksum values for the publisher and subscriber even when the data is identical. Because the automatic synchronization process always uses one CREATE TABLE statement to create the table on the subscriber, the simplest solution is to also create the table at the publisher with one CREATE TABLE statement. (If you want to verify that the internal table structures at the publisher and subscriber are identical, you can compare the offset column values for each table in the syscolumns system table.)
Replicating Success
Most serious SQL Server replication problems occur when the subscriber gets out of sync with the publisher. The users might see inaccurate data, or the replication process might fail because the data is out of sync. For example, the distribution agent can fail, causing a duplicate key error. By becoming familiar with SQL Server 7.0's inline replication validation capability, you can detect synchronization problems before they wreak havoc in a production environment.
End of Article
Prev. page
1
[2]
next page -->