Active/standby and active/active apply at the system level, but they can
also apply to applications. For example, both nodes in an MSCS solution can
actively run and offer services; this capability makes MSCS an active/active
system-level implementation. At the application level, MSCS supports both
active/active and active/standby solutions. For example, SQL Server 6.5
Enterprise Edition and Internet Information Server (IIS) support active/active
configurations on MSCS, whereas Exchange 5.5, Enterprise Edition, runs only in
an active/standby configuration.
A problem arises when you apply any definition of clustering to current NT
availability solutions: Most of these solutions address only the
increased-availability portion of the clustering triad, the other two elements
of which are manageability and scalability. Mark Smith pointed out this
shortcoming in "Clusters for Everyone," June 1997, and it's still true
a year later. Few increased-availability products for NT offer continued access
to resources (automated failover and failback) without operator intervention
or, worse, system restart. Even fewer products have addressed the manageability
and scalability legs of the triad that mini and mainframe clusters have targeted
for over a decade. In fact, the real advances in multinode scalability have been
limited to database and Internet-related solutions. Does that shortcoming mean
increased-availability solutions are bad? Certainly not. However, this situation
means you'll probably have to take advantage of each product's strength,
work around its limitations, and use a combination of products to meet your NT
availability needs.
Availability Classifications
The keys to selecting and implementing the right high-availability solution
are identifying applications that need increased availability, defining the
outage duration and type your business can tolerate, and determining how much
your business is willing to spend for the redundancy necessary to meet your
expectations. Vendors such as Digital Equipment, HP, and NCR place the
single-host availability of NT Server running on Pentium Pro systems at the 99
percent uptime level. For systems that must operate 24 hours a day, 7 days a
week year-round, 99 percent availability translates to about 87 hours of planned
and unplanned downtime per year. Adding RAID data protection to such a system
lets it survive some level of disk failure and raises availability to 99.5
percent, or 44 hours of downtime per year.
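The downtime figures above follow from simple arithmetic on the 8,760 hours in a year of 24 x 7 operation. The short Python sketch below shows the calculation; the function name and constant are illustrative choices, not part of any vendor's tooling, and the sketch assumes a non-leap year.

```python
# Assumption: a 24 x 7 system in a non-leap year runs 24 * 365 = 8,760 hours.
HOURS_PER_YEAR = 24 * 365


def downtime_hours_per_year(availability_percent: float) -> float:
    """Return the yearly downtime, in hours, implied by an availability percentage."""
    unavailable_fraction = 1.0 - availability_percent / 100.0
    return HOURS_PER_YEAR * unavailable_fraction


if __name__ == "__main__":
    # The two single-host levels discussed above: plain NT Server, then NT Server with RAID.
    for level in (99.0, 99.5):
        hours = downtime_hours_per_year(level)
        print(f"{level} percent availability -> about {hours:.1f} hours of downtime per year")
```

The results (roughly 87.6 and 43.8 hours) round to the 87-hour and 44-hour figures quoted above.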
Fifty-two planned outages lasting 50 minutes each (44 hours distributed
over 52 weeks) are manageable for most sites. For other sites, though, even a few
minutes of planned downtime, let alone the threat of outages lasting for days,
justifies moving beyond the usual commercial availability of a single NT system.
These sites are where high-availability products (data mirroring with failover)
and fault-resilient clustering products (data-sharing solutions such as MSCS)
come into play. Such products take NT systems to 99.9 percent (8.8 hours of
downtime per year) and 99.99 percent (53 minutes of downtime per year)
availability.
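To see what these availability levels mean for a weekly maintenance window, you can spread the yearly downtime allowance evenly over 52 weeks. The Python sketch below does so; the function names are hypothetical, and the even weekly split is a simplifying assumption (real outages are rarely distributed so neatly).

```python
# Assumption: a non-leap year of 24 x 7 operation, split into 52 equal weekly windows.
MINUTES_PER_YEAR = 365 * 24 * 60
WEEKS_PER_YEAR = 52


def yearly_downtime_minutes(availability_percent: float) -> float:
    """Return the total downtime per year, in minutes, for an availability level."""
    return MINUTES_PER_YEAR * (1.0 - availability_percent / 100.0)


def weekly_outage_budget_minutes(availability_percent: float) -> float:
    """Return the downtime allowance per week if outages are spread evenly."""
    return yearly_downtime_minutes(availability_percent) / WEEKS_PER_YEAR


if __name__ == "__main__":
    for level in (99.5, 99.9, 99.99):
        per_year = yearly_downtime_minutes(level)
        per_week = weekly_outage_budget_minutes(level)
        print(f"{level} percent: {per_year:.0f} minutes per year, "
              f"about {per_week:.1f} minutes per week")
```

At 99.5 percent the weekly allowance is roughly 50 minutes; at 99.9 percent it shrinks to about 10 minutes, and at 99.99 percent to about 1 minute, which leaves little room for routine restarts.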
High-availability and clustering solutions provide system redundancy and
support some level of application restart or resource failover among member
systems. These features increase system availability by facilitating the
transfer of resource responsibilities to surviving systems. Although the
resources remain highly available, the transfer, or failover, takes time, from
seconds (for a few file shares) to minutes (5 minutes to 10 minutes for the
restart of an application such as Exchange). Some client/server applications, by
fluke or design, can survive these momentary transitions. Other applications
cannot tolerate any identifiable transfer time. For a more detailed view of
system availability, see Chapter 3 of Transaction Processing: Concepts and
Techniques, by Jim Gray and Andreas Reuter (Morgan Kaufmann Publishers,
1992).