Bundled solutions for NT clustering
Amdahl is one of many server companies that offer their
systems bundled with someone else's cluster software. When you purchase Amdahl hardware, you can
choose from VERITAS FirstWatch, NCR LifeKeeper, and soon Microsoft Wolfpack software--all on Windows
NT. With these choices, if you already have cluster software running on your network but on a
different operating system (such as FirstWatch or LifeKeeper on UNIX), you can choose the same setup
for NT and not have to learn and integrate a whole new system. If you're looking for fault tolerance
and high availability without sacrificing performance or easy administration, consider the Amdahl and
LifeKeeper solution, which we reviewed in the Windows NT Magazine Lab. For now, let's set price
aside and focus only on what Amdahl's enterprise cluster solution will do for you.
Technology Overview
You have to look at this solution in two parts--the hardware and the software. As with Microsoft
and Wolfpack, NCR will support LifeKeeper--and allow it to be sold--only on certified hardware, such
as NCR's WorldMark server or Amdahl's EnVista Frontline Server (FS). You can't buy just the
LifeKeeper software and set it up on any system.
The hardware can be simple or complex, depending on the performance you want and the money you
can spend. Amdahl set up the Lab with a serious contingent of machinery: two quad Pentium Pro
EnVista FSs, each with 512MB of RAM, and an LVS 4500 Ultra Wide SCSI-3 disk array with twenty 4.3GB
drives. Figure 1 shows the configuration and interconnects. Either server alone can support 1000 or
more users; together, with proper load balancing, the servers can support twice that number.
Each component in the cluster solution is fault tolerant on multiple levels: The servers have
dual power supplies (available with three modules), dual SCSI controllers, Error-Correcting Code
(ECC) memory, and hot-swap drive bays. The disk array has hot-swap drive bays, five power supplies
with battery backup, ECC cache memory, and dual RAID controllers (availability managers). With such
redundancy, Amdahl has eliminated many--but not all--points of failure: Disk drives, power supplies,
and disk controllers can still fail. One qualification: if one of the SCSI controllers fails,
you have to manually switch control of its drives to the other RAID controller in the LVS array. If
you don't, the SCSI controller failure initiates a server failover. The tradeoff is that this setup
provides better disk I/O throughput, because you use more than one SCSI card. To use multiple
network controllers for redundancy and load balancing, you will have to employ other means
(special drivers and software, such as Adaptec Duralink).
"Clustering Solutions Feature Summary," page 58, shows that LifeKeeper does almost
everything you might want. The fault-tolerance features of the EnVista servers and LVS 4500 disk
array strengthen the system so that you would have to inflict heavy damage on both systems to make
them go down. I'll point out the few exceptions in this review. LifeKeeper is fully compliant with
the Wolfpack APIs, so upgrading or interoperating with Wolfpack in the future won't be a problem.
How It Works
LifeKeeper uses hierarchies of resources to define cluster groups; an entire hierarchy is what
fails over from one system to the other. For example, a hierarchy can include a LAN Manager resource
name, a disk volume, and an IP address. A Microsoft SQL Server failover hierarchy might include the
disk volume that the data resides on, a named-pipe NetBIOS name (the SQL Server alias that appears
on the network, such as accounting), the specific database, and an IP address. The objects in each
hierarchy depend on one another; for example, the disk volume must come online before the
database can. The group can contain as many objects as needed to protect a given resource. You could
have, say, 10 disk volumes, a SQL Server object, Exchange Server, LAN Manager names, and IP
addresses all fail over at once under the accounting hierarchy.
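To make the dependency idea concrete, here is a minimal Python sketch of a failover hierarchy. It is an illustration only, not LifeKeeper's configuration format or API: the resource names are made up, and the only dependency taken from the description above is that the disk volume must come online before the database. The other dependencies are assumptions added to make the example interesting.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical "accounting" hierarchy. Each resource maps to the set of
# resources that must be online before it can start.
ACCOUNTING = {
    "disk_volume": set(),
    "ip_address": set(),
    "netbios_name": {"ip_address"},                   # assumed dependency
    "sql_database": {"disk_volume", "netbios_name"},  # disk first (per the text)
}


def bring_up_order(hierarchy):
    """Return the resources in an order that starts dependencies first."""
    return list(TopologicalSorter(hierarchy).static_order())


def take_down_order(hierarchy):
    """Return the reverse order, so dependents stop before what they rely on."""
    return list(reversed(bring_up_order(hierarchy)))


if __name__ == "__main__":
    print("bring up :", bring_up_order(ACCOUNTING))
    print("take down:", take_down_order(ACCOUNTING))
```

The point of the sketch is that the whole hierarchy moves as a unit: the node taking over starts every object in dependency order, and the node giving up control stops them in the reverse order.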
Several heartbeats travel at once between the two nodes, so that any single failure--such as
someone tripping over a LAN cable--won't trigger an unexpected failover. By default, you have
interconnects via a direct network crossover, a LAN connection, and a serial link. All these
interconnects must fail before LifeKeeper shifts services from the primary to the secondary node.
You can also dedicate a small (1MB) partition on one of the shared array drives, and the nodes can
communicate through it. Each heartbeat runs at a different priority level, and you can configure the
polling frequency to control how long the secondary node waits before assuming control over shared
resources. But the polling frequency also affects system performance, because these heartbeats are
interrupt-driven and consume processing time.
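The failover rule described above (every interconnect must go quiet before the secondary node takes over) can be sketched in a few lines of Python. This is a simplified model, not LifeKeeper's implementation: the class, the field names, and the "missed beats" threshold are all hypothetical, and the timing values are placeholders rather than product defaults.

```python
from dataclasses import dataclass


@dataclass
class Interconnect:
    """One heartbeat path between the two nodes (hypothetical model)."""
    name: str
    poll_interval_s: float  # how often a heartbeat is expected
    missed_limit: int       # missed beats tolerated before the path counts as dead
    last_seen_s: float      # timestamp of the last heartbeat received

    def is_dead(self, now_s):
        # The path is dead once the silence exceeds the allowed window.
        return (now_s - self.last_seen_s) > self.poll_interval_s * self.missed_limit


def should_fail_over(paths, now_s):
    """The secondary takes over only when every heartbeat path is dead."""
    return bool(paths) and all(p.is_dead(now_s) for p in paths)


if __name__ == "__main__":
    paths = [
        Interconnect("crossover", poll_interval_s=1.0, missed_limit=3, last_seen_s=100.0),
        Interconnect("lan", poll_interval_s=1.0, missed_limit=3, last_seen_s=90.0),
        Interconnect("serial", poll_interval_s=2.0, missed_limit=3, last_seen_s=100.0),
    ]
    # Only the LAN path has gone quiet here (the tripped-over cable case).
    print(should_fail_over(paths, now_s=101.0))
    # Much later, with no heartbeat on any path.
    print(should_fail_over(paths, now_s=110.0))
```

In this model, shortening `poll_interval_s` shrinks the window before the secondary node assumes control, which mirrors the tradeoff noted above: faster detection costs more heartbeat traffic and processing time.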