With this component redundancy, the servers can run applications in lockstep synchronization, executing instructions in parallel on each pair of components. When the NEC and Stratus servers detect a fault, processing continues uninterrupted on the opposite subsystem (rather than on the opposite pair of compute engine and IOP systems), and full redundancy is maintained for the remaining subsystems. At that point, the system notifies the server's systems administrator, who can, if necessary, instruct an untrained employee to swap out the failed module for a new one without powering down the server. Reintegration of the replaced part should suspend application processing for no more than 12 seconds. When the Lab tested the Stratus version of this product, the intentional hardware failures and reintegrations we performed didn't result in any noticeable disruption of application processing.
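The fault-handling sequence described above (detect a fault, keep running on the partner, alert the administrator, hot-swap, reintegrate) can be pictured with a toy model. The following Python sketch is purely illustrative: the names `Module` and `RedundantPair` and their methods are hypothetical, and the code makes no claim about how NEC's or Stratus's hardware actually implements lockstep.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Module:
    """Hypothetical stand-in for one replaceable hardware module."""
    name: str
    healthy: bool = True


@dataclass
class RedundantPair:
    """Toy model of one duplexed subsystem (an assumption, not vendor code)."""
    primary: Module
    partner: Module
    alerts: List[str] = field(default_factory=list)

    def healthy_modules(self) -> List[Module]:
        return [m for m in (self.primary, self.partner) if m.healthy]

    def fully_redundant(self) -> bool:
        return len(self.healthy_modules()) == 2

    def execute(self, instruction: Callable[[], int]) -> int:
        # Every healthy module runs the same instruction; one survivor is enough.
        active = self.healthy_modules()
        if not active:
            raise RuntimeError("both modules have failed")
        results = [instruction() for _ in active]
        return results[0]

    def report_fault(self, module: Module) -> None:
        # Mark the module failed and queue an alert for the administrator.
        module.healthy = False
        self.alerts.append(f"{module.name} failed; replace it without powering down")

    def reintegrate(self, module: Module) -> None:
        # A replaced module rejoins the pair, restoring full redundancy.
        module.healthy = True
        self.alerts.append(f"{module.name} reintegrated")


cpu = RedundantPair(Module("cpu-0"), Module("cpu-1"))
before = cpu.execute(lambda: 2 + 2)
cpu.report_fault(cpu.primary)
during = cpu.execute(lambda: 2 + 2)   # still answers while degraded
degraded = not cpu.fully_redundant()
cpu.reintegrate(cpu.primary)
print(before, during, degraded, cpu.fully_redundant())
```

The point of the sketch is only that the application's result is the same before and during the fault; the redundancy level is what changes until the module is reintegrated.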
Another interesting aspect of the NEC and Stratus products is their use of hardened device drivers for Gigabit Ethernet, SCSI, and Fibre Channel adapters; the hardened drivers let a failed adapter seamlessly fail over to its counterpart. In addition, these drivers isolate I/O exceptions from the OS's kernel, providing a stable OS environment that should reduce the number of software failures.
Since the introduction of their dual-processor servers, NEC and Stratus have also introduced new 4-way fault-tolerant servers based on a common hardware design. NEC's Express5800/ft 340Ha and Stratus's ftServer 6500 ship with 2.0GHz Xeon MP processors and 2MB of L3 cache in a 16U (28") rack-mount cabinet. The new design is also rated for five-nines hardware uptime using dual redundancy of components. For clients whose applications demand even greater uptime, both vendors offer a version of the server with three processor modules in an 18U (31.5") rack-mount cabinet for triple redundancy of that subsystem.
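To put the five-nines rating in perspective, the arithmetic can be worked out in a few lines. This Python sketch assumes that "five nines" means 99.999 percent availability measured over a 365.25-day year; the function name is hypothetical, and the figures are plain arithmetic rather than vendor measurements.

```python
MINUTES_PER_YEAR = 365.25 * 24 * 60  # assumed averaging period


def downtime_minutes_per_year(availability: float) -> float:
    """Return the yearly downtime, in minutes, implied by an availability fraction."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    return (1.0 - availability) * MINUTES_PER_YEAR


for label, fraction in [
    ("three nines", 0.999),
    ("four nines", 0.9999),
    ("five nines", 0.99999),
]:
    print(f"{label}: {downtime_minutes_per_year(fraction):.2f} minutes/year")
```

Under these assumptions, five nines allows a little more than 5 minutes of downtime per year, compared with nearly 9 hours for three nines.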
The four-processor server's standard I/O subsystem, with its two SCSI controllers, is similar to the two-processor model's subsystem. Like the 2-way server, this model also relies on VERITAS Volume Manager to mirror the data over two drive sets. The new server's larger storage module holds as many as 14 drives instead of just 6. For buyers who prefer a hardware RAID solution, both companies offer an additional Fibre Channel RAID storage module with dual RAID controllers and as many as 14 Fibre Channel drives.
Stratus also offers the ftServer 6500 in a two-processor configuration called the ftServer 5240, and NEC has introduced the Express5800/ft 320Lb, a dual-processor fault-tolerant system in a high-density 4U (7") rack-mount form factor. Pricing wasn't determined at press time. The Express5800/ft 320Lb, which uses NEC's core logic, has 2.4GHz Xeon DP processors with 512KB of L2 cache and is available with triple redundancy of the processor module in a 7U (12.25") rack-mount chassis. This product uses the same storage strategy as NEC's existing two-processor model with as many as six internal hard disks but merges the I/O and storage modules to save space.
Which Solution Is Right for You?
In our tests of the earlier Marathon and Stratus two-processor fault-tolerant systems, we were impressed with the way the products handled hardware faults and reintegration processes with almost no impact on applications. Configuration and setup were extremely easy with the Stratus product. We would expect the newer NEC and Stratus models to perform similarly and be just as easy to configure because they use nearly identical designs and the same technology.
But, as you might expect, component replication makes most fault-tolerant systems expensive. For example, NEC's and Stratus's two-processor 800MHz Pentium III systems (including two redundant processors) with 2GB of RAM and six 36GB drives cost almost $30,000. Stratus's new 4-way system in a typical configuration that includes a hardware RAID module and 14 drives costs about $160,000. Pricing for NEC's 4-way system in an identical configuration should be similar (pricing hadn't been announced at press time). Even if you delete the hardware RAID module and connect these 4-way systems to a Storage Area Network (SAN), prices hover around $125,000.
If you already have four servers (two identical pairs), the Endurance 6200 4.1 hardware and software kit, which sells for about $20,000, might be a cost-effective way to achieve fault tolerance. But the most attractive option might be Marathon's Endurance software offering, which requires just two identically equipped servers, provided the pricing is as low as Marathon suggests it will be.
Comparing the manufacturers' uptime claims for these products to clustering solutions is inordinately difficult because a lot depends on the reliability of your applications. Obviously, if your applications demand the performance of more than four processors, clustering is your best alternative.