Sidebar: Measuring High Availability

RAID 5 technology introduces additional fault tolerance by allocating portions of each disk in the array to parity data. This setup enables real-time reconstruction of lost data if one disk in the array fails. The parity data reduces the usable space in the array by the equivalent of one disk. As with RAID 1, you can remove and replace the failed disk without turning off the computer, so you experience no downtime, although the array does run with some performance degradation until the failed disk is replaced and rebuilt. Because only one disk's worth of capacity in the entire array is unavailable for user data, RAID 5 is much less costly than RAID 1. The configuration in Figure 3 uses RAID 5 for the file data. In the event of one disk's failure, the client probably wouldn't notice the degradation in the disk array's performance. In our example, if you used four 72GB disks, you would have three 72GB disks' worth of usable storage, or 216GB for user data.
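
The parity idea described above can be illustrated with a minimal Python sketch. It assumes the common XOR form of parity and a hypothetical four-disk stripe holding three data blocks plus one parity block; the block contents and the helper name `xor_blocks` are invented for illustration, and real controllers also rotate parity across disks, which this sketch omits.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# One stripe on a hypothetical four-disk array: three data blocks plus parity.
data_blocks = [b"ABCD", b"EFGH", b"IJKL"]
parity = xor_blocks(data_blocks)

# Simulate losing the disk that held the second data block.
surviving = [data_blocks[0], data_blocks[2], parity]

# XORing the surviving blocks with the parity block yields the missing block.
rebuilt = xor_blocks(surviving)
print(rebuilt == data_blocks[1])
```

Because XOR is its own inverse, any single missing block in a stripe can be recomputed from the others, which is why the array tolerates one failed disk but not two.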

The configuration in Figure 3 is typical of data-center systems. My company used this template last year for most systems, and those systems experienced no downtime as a result of physical disk failures, despite the 66 disks that we needed to replace.

Table 2 summarizes the cost of this redundancy for a ProLiant DL380 system that has 200GB of user data and 72GB disks for the data. The cost of the RAID systems is particularly high because the ProLiant DL380 can't house 8 to 10 drives without an external chassis, which adds considerably to the price. Drive redundancy doesn't protect against data corruption that results from software problems, and you might still need to restore from tape for a variety of reasons. Data redundancy does, however, protect you from the need to restore from tape because of a disk failure. You need to assess your SLA to determine whether you can justify the cost.
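
As a rough illustration of where the extra drives come from, the following Python sketch counts the data disks each option needs for 200GB of user data on 72GB disks. The function name `disks_needed` is hypothetical, and the sketch assumes nominal capacities and counts data disks only; it ignores the OS disks, hot spares, and formatting overhead that a real configuration such as the one in Table 2 would include.

```python
import math

def disks_needed(user_data_gb, disk_gb, level):
    """Return the number of data-array disks for a given redundancy level."""
    data_disks = math.ceil(user_data_gb / disk_gb)
    if level == "none":
        return data_disks
    if level == "raid1":
        # Mirroring: every data disk has a full copy.
        return 2 * data_disks
    if level == "raid5":
        # One disk's worth of parity; RAID 5 needs at least three disks.
        return max(data_disks, 2) + 1
    raise ValueError(f"unknown level: {level}")

for level in ("none", "raid1", "raid5"):
    print(level, disks_needed(200, 72, level))
```

Under these assumptions, mirroring doubles the disk count, whereas RAID 5 adds a single disk regardless of the array size.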

Depending on your environment and hardware, your next weakest link might be either a network device or the servers. To create a redundant server environment suitable for a high-availability file server, you can implement a simple server cluster in Win2K Advanced Server.

Figure 4 shows a cluster that includes RAID technology for the disks. You can configure a server cluster in many ways, but the basic concept is that if one server fails, another server takes over the failed server's functions. In the case of a file server, if a failover occurs from one system to another, users can continue working on a document stored on a shared disk array, possibly noticing a short delay while their applications reconnect to the cluster. Meanwhile, you can take the failed server offline and repair it without affecting the users' operations or your SLA. When you finish repairing the server, you can rejoin it to the cluster and regain server redundancy. (Some applications aren't cluster aware, so be sure to check the cluster documentation carefully before you deploy a cluster solution.)

Table 3 summarizes the hardware cost of server redundancy for two similarly configured ProLiant DL380 servers with 200GB of file data in a shared external drive chassis. These numbers are approximations. The recommended Compaq solution replaces the shared SCSI channel with a Fibre Channel configuration, but I kept the SCSI channel to keep prices down. Also, cluster support can involve additional software and operational costs. For example, whereas you can use Win2K Server to install one file server, a cluster requires Win2K AS.

Server redundancy won't reduce the time necessary to restore data (as the partitioning options do) and won't create redundant copies of the data (as the RAID options do). This option only increases the availability of the server that publishes the disk data to the user. If your weakest link isn't the server, you might not need server clustering. For example, in our data center, fewer than 1 percent of our clients felt that the additional reliability merited the cost of clustering file servers.
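
To get a feel for how much a second server can help, the following Python sketch applies the standard formula for redundant components, in which the service is down only when every node is down at once. The 99 percent single-server figure is hypothetical, and the sketch assumes that servers fail independently and that failover is instantaneous; it also ignores shared components such as the external drive chassis, so it is an upper bound rather than a prediction.

```python
def combined_availability(node_availability, nodes):
    """Availability of a group of redundant nodes.

    Assumes independent failures and instantaneous failover:
    the service is unavailable only if all nodes are down together.
    """
    return 1 - (1 - node_availability) ** nodes

single = 0.99  # hypothetical availability of one server
print(f"one server:  {combined_availability(single, 1):.4%}")
print(f"two servers: {combined_availability(single, 2):.4%}")
```

If the shared disk array or the network is less available than a single server, the cluster's overall availability is limited by that component instead, which is the weakest-link point made above.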

Blueprint for High-Availability Web Servers
In the media, you can find many statistics about the availability of Web servers. Every time a major corporation or government entity has a Web site problem, the news makes headlines. An interesting source for availability numbers is Keynote Systems, which publishes the Keynote Government 40 Index and the Keynote Business 40 Index. The October 29, 2001, index showed the Federal Bureau of Investigation (FBI), Library of Congress, and Supreme Court Web sites ran at 99.24 percent, 99.96 percent, and 99.62 percent availability, respectively. Similarly, during the Christmas holiday season, the average availability of the top 10 shopping Web sites (e.g., Nordstrom, Neiman Marcus, Saks Fifth Avenue) was 98.5 percent. How do you achieve such levels of availability for a Web server?
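
To put those percentages in perspective, the following Python sketch converts an availability percentage into the hours of downtime it would imply if the same level were sustained over a full 365-day year. The function name `annual_downtime_hours` is hypothetical, and the Keynote figures cover specific measurement periods, so the yearly extrapolation is only an illustration.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours, ignoring leap years

def annual_downtime_hours(availability_percent):
    """Hours of downtime per year implied by an availability percentage."""
    return (100 - availability_percent) / 100 * HOURS_PER_YEAR

# Availability figures quoted in the paragraph above.
for site, percent in [
    ("FBI", 99.24),
    ("Library of Congress", 99.96),
    ("Supreme Court", 99.62),
    ("Top 10 shopping sites (average)", 98.5),
]:
    print(f"{site}: {annual_downtime_hours(percent):.1f} hours per year")
```

Seen this way, the gap between 99.24 percent and 99.96 percent is the difference between several days and a few hours of downtime per year.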

Your high-availability options for a Microsoft IIS Web server aren't terribly different from those for a file server. You can configure the system to reduce the time necessary to restore service and data after an outage, and you can reduce the frequency of outages. In addition to the techniques you use for file servers, two features of Win2K AS and IIS are available: Virtual Directories for data partitioning and Network Load Balancing (NLB) for mirrored servers.


Reader Comments

Page 24 of the print article states that RAID 5 technology introduces additional fault tolerance by allocating portions of each disk in the array to parity data.

No, RAID 5 does not provide additional fault tolerance over mirroring. It is just another, more efficient way of providing fault tolerance (mirroring means 50% efficiency, whereas the efficiency of RAID 5 exceeds 66%). It is efficient, but it does not introduce any more fault tolerance.

Murat Yildirimoglu