• subscribe
July 19, 2005 12:00 AM

Merge Replication in the Real World

Create a high-availability and disaster-recovery solution that’s easy to configure and support
SQL Server Pro
InstantDoc ID #46828

As a senior consultant with the Financial Services Group of Microsoft Consulting Services, I help clients in the financial industry efficiently use Microsoft database technologies. I recently met with a client who wanted a high-availability and disaster-recovery solution. The client company's DBA had no practical experience supporting SQL Server high-availability solutions (including replication), so the client wanted a solution that would be simple to configure and support. I'll give you some background about this case and show you the steps I used to create a solution that doesn't require extensive scripting skills. If you're looking for an easy-to-use solution for your organization's high-availability and disaster-recovery needs, taking a look at this real-world example can help you decide whether merge replication is a good choice for you.

Client Requirements
The client I worked with had two data-processing centers located miles apart and linked by a reliable T2 network connection line between the sites. The client needed a flexible high availability solution that would allow the addition of a third site in the future with minimum effort. Each existing site hosted a SQL Server 2000 database on a Windows 2000 server. Both Windows servers had two processors and 2GB of RAM. The client's database was about 25GB and growing slowly (200MB to 500MB per month). The database had a relatively low load (10 to 50 transactions per minute). The client's application is in use mainly from 7 a.m. to 9 p.m., 5 days a week, and the database needed be available at both sites at all times during business hours.

The client wanted to do data processing at one site and reporting at the other site. In case of disaster at one data center, the client wanted a failover system so that the remaining data center could quickly combine these functions without any reconfiguration. The client had considered splitting data-load operations between data centers but knew that data operators might try to update or delete the same record simultaneously on both sites. To resolve such possible conflicts between the two sites, the client wanted to merge changes that users performed on different sites in different columns, if possible. For example, if a user at SiteA changes column1 and another user at SiteB modifies column2, the client wants to merge both changes in the resulting table. Some tables have one or more text columns. Several tables have IDENTITY columns.

So, from these requirements, the client and I determined that the solution they needed should provide continuous data transfer, site autonomy (so that operations at one site will only minimally affect operations at the other site), processing of changes at both sites, and resolution of data conflicts.

Why Choose Merge Replication?
SQL Server 2000 offers several options that keep databases synchronized close to realtime, including two-phase commit, transactional replication with immediate updating subscribers, snapshot replication with immediate updating subscribers, transactional replication (including bidirectional replication), snapshot replication, transactional replication with queued updating subscribers, and merge replication. Based on the client's requirements, I recommended using SQL Server 2000 merge replication for several reasons. First, merge replication has a robust, built-in conflict-resolution mechanism that can be easily configured using Enterprise Manager. In simple cases, no scripting is required. Merge replication provides an out-the-box mechanism for resolving conflicts at the column level and a high transactional consistency of operations in situations like the one my client described, in which conflicts are rare and only two sites are currently involved in data processing. In addition, merge replication allows replication of text data, which was important to my client because the company had more than 10 database tables that contained text columns. Merge replication includes continuously-running Merge Agents that provide data transmission with acceptable latency. And using merge replication makes adding another server to the existing topology easy.

The other alternatives have their own benefits, but I couldn't recommend using immediate updating and queued updating subscriptions because those options can't update text data. Not all types of transactional replication can merge simultaneous changes in different columns on different sites. Snapshot replication and DTS simply don't work in an environment in which data modifications happen on both sides simultaneously. And two-phase commit is a rather theoretical solution because even the loss of network connectivity for a short time during a transaction would effectively shut down the application. Once I'd decided on the high-availability implementation to use for my client, I prepared the database for merge replication.

Prepare for Merge Replication
Before configuring merge replication, you must first carefully analyze your application and address specific merge-replication requirements. Because this brief article can't provide all the planning and preparation details, I recommend that you read the "Planning for Merge Replication" section in SQL Server 2000 Books Online (BOL) for details. Here are some of the things I had to do for my client's solution.

Merge replication replicates text columns only if they've been updated explicitly by an UPDATE statement, which causes a trigger to fire and update metadata, ensuring that the transaction gets propagated to Subscribers. WRITETEXT and UPDATETEXT operations don't propagate the change to other sites. To solve this problem, I modified a few stored procedures by adding a dummy UPDATE statement after the WRITETEXT or UPDATETEXT operations within the same transaction. BOL gives an example of how to do this type of modification.

To avoid recurring conflicts during merge replication, I configured all foreign key constraints and user-defined triggers (UDTs) with the NOT FOR REPLICATION option. To perform this configuration, select a table from Enterprise Manager, right-click, and select Design Table. Click Manage Relationships, then clear the Enforce relationship for replication check box on the Relationships tab of the Properties screen, as Figure 1 shows; repeat this step for each foreign key in the drop-down list. Next, click Close, then click Save. Click Yes when the confirmation window appears. Close the Design Table window, and repeat these steps for all tables that have foreign keys.

To change the NOT FOR REPLICATION option for the existing trigger, select a table in Enterprise Manager. Right-click, select All Tasks, then select Manage Triggers. From the drop-down list, select a user-defined trigger. As Figure 2 shows, type NOT FOR REPLICATION on the line preceding AS in the text box. Click Apply, then repeat these steps for other triggers in the same table. Finally click OK. You'll need to change the NOT FOR REPLICATION option for all tables that have triggers.

All IDENTITY columns must have the NOT FOR REPLICATION option. (This option is configured automatically when you set up merge replication.) IDENTITY values need to be partitioned by site. When you configure merge replication, to ensure that the identity values being assigned are within those allowed for the site range, SQL Server automatically creates CHECK constraints in each table that has an IDENTITY column. For my client's solution, I carefully planned the range for the IDENTITY column for each affected table. I recommend that you assign a high range of values for the site so that it is practically impossible to reach the limit. Because of some problems with this functionality, don't rely on the automatic identity range handling set by SQL Server. (For an explanation of these identity-range problems, see the Microsoft article "BUG: Identity Range Not Adjusted on Publisher When Merge Agent Runs Continuously" at http://support.microsoft.com/?id=304706.)

I decided to use a default mechanism for conflict resolution (in which the Publisher always wins). This is the simplest variant and, in this case with only two servers and rare conflicts, choosing a default variant was appropriate. Implementing a datetime-dependent conflict resolvers would require modifications to the table structure of many tables. Generally, SQL Server lets you choose from several predefined types of conflict resolvers, or you can write your own conflict resolver. In most cases, I'm sure you'll be satisfied with some type of predefined conflict resolver.

By design, merge replication adds a uniqueidentifier column to each replicated table. The total size of the row for each replicated table, including the one I added, should not exceed 6000 bytes. I worked with the client to change the structure of a few tables that had a larger row size.

SQL Server requires that all tables connected by foreign-key relationships be published together, in one publication. Before configuring merge replication, I created a database diagram for ClientDB in Enterprise Manager and identified all such cases so that I could see the relationships overall. Then, I prepared a spreadsheet with a list of my planned publications and articles to make configuration easier.

Design of Merge Replication
Figure 3 shows the overall design of merge replication for my client. I chose SQLServer1 at Site 1 as Publisher and Distributor. I specified SQLServer2 at Site2 as Subscriber. ClientDB is the database to be replicated. The Snapshot and Merge Agents implement merge replication. The Snapshot Agent prepares snapshot files that contain the schema and data of published tables, stores the files in a snapshot folder, then inserts synchronization jobs into the publication database. The Snapshot Agent also creates replication-specific stored procedures, triggers, and system tables. The Merge Agent merges incremental data changes that occur at the Publisher or Subscribers after the initial snapshot is created, and it reconciles conflicts according to defined rules.

The role of the Distributor is limited in merge replication, so it's common to implement the Distributor locally (on the same server as the Publisher), which is what I did in this solution. The distribution database on the Distributor stores only history and miscellaneous information (e.g., errors) from both servers about merge replication.



ARTICLE TOOLS

Comments
    There are no comments to display. Be the first one!
You must log on before posting a comment.

Are you a new visitor? Register Here