Although log shipping is a component of a high-availability solution, it's missing an essential feature. Log shipping doesn't have a mechanism for automatically detecting failure and initiating a failover to the secondary server. Log shipping also has a lag time during the failover when all backups are applied to the secondary server before it can be recovered and made available. Because log shipping relies on copies of the backups from the primary server, you risk losing some committed transactions if your primary server has a complete hardware failure. With failover clustering, once SQL Server is back online, clients can reconnect without any change in connection strings. Log shipping doesn't have this transparency; clients need to connect to a server with a different name than the primary server.
You access another feature of log shipping through the restore options you use when applying subsequent transaction log backups. If you use the WITH STANDBY option for restore, you can make the secondary server available for read-only activities. This secondary server is an excellent place to move reporting and other read operations from the primary server and gives you a level of scalability that you can't get from a failover cluster.
Replication
Replication, the third high-availability technology in SQL Server, is complex but not complicated. Because of its complexity, people don't use it as frequently as log shipping or failover clustering for high availability. For purposes of high availability, I restrict this discussion to transactional replication in the case of one Publisherthe primary serversending data to one or more Subscribersthe secondary servers.
Failover clustering operates at the server level, ensuring that the entire SQL Server is available. Log shipping works at a more granular level, ensuring the availability of one database within a server. Replication moves to the most granular level: It ensures availability transaction by transaction and can even provide availability for only a subset of one table in a database. Whereas log shipping works by restoring transaction log backups from the primary server to a secondary server, replication works within the transaction log to send individual transactions to a secondary server as they occur. Because of this granular function, replication can provide extremely rapid failover from a primary to a secondary server. In most stable configurations, you can keep the data on the secondary server current to within 1 or 2 seconds of the primary.
Although it can accomplish extremely low latency, replication has no scheduling mechanism that requires a maximum latency between primary and secondary server. The replication engine replicates the data and applies it to the secondary server whenever it gets there. Although the replication engine will deliver the data eventually, the uncertain delivery time causes the most problems with replication implementations.
Fortunately, the delivery time is entirely within the control of developers, DBAs, and other IT professionals. You can't replicate a transaction until it's been committed, so developers who write large transactions create latency problems in replication on two fronts. First, because these long-running transactions aren't replicated to the secondary server until they're complete, the secondary (i.e., the replication target) falls significantly behind the primary. Secondly, when a transaction is committed, packaging up the transaction, transmitting it, and executing it on the secondary server takes a long time. Additionally, transactional replication guarantees that transactions will be committed on the secondary server in exactly the same order as on the primary. Because the replication engine serializes all committed transactions, all subsequent committed transactions will queue up behind a large transaction, waiting for their turn to execute.
This might seem like a severe limitation, and it has a significant impact on the replication engine, but it doesn't affect availability too badly. Replication implementations for high availability generally employ at least three servers. The Publisher is the primary server where the applications send all transactions. The Subscriber is the secondary server, which acts as a standby for the primary and is available if the primary fails. The third server is the Distributor, whose sole function is to guarantee delivery of transactions to the secondary server. As soon as transactions are copied from the Publisher to the Distributor, the Distributor delivers them to the Subscriber. This process lets an application immediately fail over to the secondary servereven though not all transactions might have been committed on the secondarybecause the Distributor guarantees their delivery. In general, how you manage this latency in data availability will dictate how rapidly you can fail over from the primary server to the secondary. Under ideal conditions, clustering can accomplish a failover in about 15 seconds, log shipping can accomplish a failover in about 1 to 2 minutes, but replication can give you instantaneous failover.
Like log shipping, replication can maintain as many secondary servers as you want to deploy. The only limitation comes from your processing capacity, network bandwidth, and ability to manage the infrastructure. As with log shipping, you can use a secondary server to offload read operations to provide additional scalability. The secondary server remains 100 percent available to read operations even while the Distributor is applying transactions. It remains available because unlike the transaction-log restores that log shipping relies on, which can't happen while another user is accessing the databasethe replication engine issues the same transactions on the secondary server as the application on the primary issued. Also like log shipping, replication doesn't have a single point of hardware failure because it's using a physically separate server, and it provides no automated failover. As a side effect of the replication engine essentially reprocessing transactions on the secondary server, the replication engine can't propagate data corruption from the primary server to the secondary. Note that, as with every other high-availability technology, replication won't protect you from user error. This is the one availability problem that you can't handle with technology.
Sounds like replication is a great choice for high availability, but it isn't that simple. Although you have physically separate hardware components, because replication binds multiple machines together, failure of one server can cause the other server to fail. This is called a soft failure, and it usually results from the transaction log filling the available disk space and taking the database offline. Issuing a transaction-log backup doesn't solve this problem, because although such a backup removes committed transactions from the log, it doesn't remove transactions on replicated tables from the log until they arrive at the next server upstream. Therefore, replicated transactions on the Publisher can't be removed from the transaction log until they've been successfully written to the Distributor. And you can't remove transactions from the Distributor until they've been successfully written to all Subscribers. If a Subscriber fails and remains offline for a long time, the data volume on the Distributor might grow until it exceeds available disk space and causes the Distributor to go offline. This in turn prevents transactions from being removed from the transaction log on the Publisher until they've been written to the Distributor. If the transaction log fills up and can't expand, the primary server goes offline. If you're implementing replication for high availability, you need to closely monitor it and fix failures in the environment rapidly to prevent this cascading-failure scenario.
Three Ways to Available Systems
You can use SQL Server 2000's three core technologies alone or in concert to produce highly available architectures. Failover clustering protects against hardware failures while providing an automated mechanism to restart SQL Server on functioning hardware. But a failover cluster has a single point of failure in the shared drive array, can't protect from data errors, doesn't provide any scalability features, and is extremely sensitive to the duration of transactions. Log shipping protects against hardware failures and can prevent some data failures; it also provides a rudimentary mechanism to offload reporting from the primary server to a secondary server and increase scalability on the system. But log shipping is also sensitive to transaction duration and has the drawback of maintaining a separate copy of the data with no easy way to fail back from a secondary server to the primary. Replication can protect against hardware failures and SQL Server data corruption while providing a high degree of scalability in any environment. But replication can't protect against user error, is sensitive to the size and duration of transactions, and is relatively complex.
The bottom line is that no perfect solution exists. Each of these options can increase your availability and provide more data protection, and you can use them together to produce even higher degrees of availability. Your challenge is to figure out how to combine them to minimize each technology's drawbacks.
End of Article
Prev. page
1
2
[3]
next page -->