DOWNLOAD THE CODE:
Download the Code 44074.zip

After the World Trade Center attacks of September 11, 2001, many organizations reconsidered how they manage and protect crucial business data. Having a remote data center with a second set of databases and application servers has become a common practice. Businesses have been willing to accept the cost of supporting additional personnel, data centers, and changes in application design because they want to be able to switch operations to a different location quickly in the event of a disaster. For example, after the September 11 attacks, a large bank where I recently worked as a consultant with Microsoft Consulting Services began requiring that all crucial applications be able to function from either of two locations hundreds of miles apart with no more than 2 hours downtime.

These tighter business requirements are a challenge to database architects. Most organizations' databases constantly grow, but maintenance and downtime windows are kept at a minimum. Databases must be highly available and ready to function in distributed environments, and data in primary (Publisher) and secondary (Subscriber) databases must always be synchronized.

Several solutions support high availability for SQL Server 2000 databases in different locations, including log shipping, third-party hardware or software solutions such as geographical clusters or EMC's SRDF remote-storage replication solution, and transactional replication. In most high-availability solutions, the Subscriber is completely or partially unavailable during the data-synchronization process. For example, in log shipping, the Subscriber database is available for reading only when new transaction logs aren't being applied. In the expensive SRDF hardware solution, the Subscriber database is unavailable all the time; it's available only for restoring data if the Publisher database is down. Only transactional replication allows full use of the Subscriber database all the time. This solution is available in all SQL Server editions, so you can avoid extra expenditures on third-party software and still get the maximum use of the hardware you already have. That's why many companies choose transactional replication as their high-availability solution.

But transactional replication doesn't automatically solve the problem of data synchronization; it requires efficient business processes and a highly skilled DBA team to create and support the solution. Let's look at the high-availability problem the bank I worked with faced in implementing transactional replication and continuous data synchronization and see how I solved the problem by using a new method I call forced replication.

Confronting the Problem
Using replication for very large databases (VLDBs) isn't simple when a business requires the ability to shift operations to a secondary site on short notice. Database personnel must be sure that data is synchronized all the time, and they usually have only a small window of time to fix any problems that arise in a disaster situation.

Several possible scenarios in transactional replication might cause data on the Publisher and Subscriber to become out of sync, which could create problems for the business if a disaster occurred. For example, say the remote Distributor (the server that stores metadata and history data and temporarily stores replicated transactions) crashes and, for business reasons, the company can't immediately stop accepting new data on the Publisher (as is the case with our bank). The Publisher will continue to accumulate new data, and the databases on the Publisher and Subscriber will become out of sync. Or, a subscription might expire for some reason—for example, failure of a replication agent—and changes on the Publisher wouldn't be propagated to the Subscriber. Or, a user might inadvertently—or maliciously—run a DELETE command on the Subscriber and remove already propagated data. In such a case, the Subscriber would have fewer records than the Publisher. If you've used transactional replication, you've probably faced this situation at least once.

To avoid having unsynchronized data, the bank asked me to develop a mechanism of online data synchronization for the banking system that replicates two VLDBs with a combined size of approximately 500GB from a server in New York to a server in Delaware, as Figure 1 shows. Additionally, each site has reporting servers that contain identical reporting data—a subset of the data from the VLDBs. In an emergency, the bank has to be able to easily shift its data load from the New York server to the Delaware server and reverse the replication data flow between the two OLTP servers (i.e., from the Delaware server to the New York server). In the bank's original replication plan, a scheduled job runs every hour to compare the number of records in corresponding tables on the Publisher and Subscribers. Of course, the job algorithm takes delivery latency into account. If the job finds an inconsistency in the databases, the bank needs to immediately synchronize the data.

To accomplish this quick synchronization, I couldn't use conventional techniques such as applying publication snapshots, traditional backup and restore, or Data Transformation Services (DTS). Because a snapshot requires that you copy all the data, not just the part you need, preparing and applying snapshots for large databases by using SQL Server Enterprise Manager's Snapshot Agent can take hours or even days, and the bank doesn't have the luxury of time. The bank needs to synchronize only the data that has changed since the last time the scheduled job ran successfully—just a few hours of new information. Preparing a special differential snapshot of the affected tables, which have hundreds of thousands of rows, would be a waste of time and resources.

   Prev. page   [1] 2 3     next page



You must log on before posting a comment.

If you don't have a username & password, please register now.

Reader Comments

very though provoking

blueoyester2000

Article Rating 4 out of 5

It looks just like transactional replication under SQL Server 6.5.

Anonymous User

Highly useful. Great Job!

mspalding

Article Rating 5 out of 5

Great article i think.

sanjeev1810

Article Rating 4 out of 5