Clustering isn't the only way to increase NT's availability

Maybe it's because I live in Colorado, but when I hear vendors talk about their Windows NT clustering and high-availability products, I think of the medicine shows that once roamed the West. Remember the ones from the classic Westerns? The shows traveled from town to town, selling snake oil and promising it would cure everyone's ills. But what really happens when you sip the magic elixir--or attempt to implement a high-availability solution in your environment?

Clustering isn't a cure-all for your NT availability woes. No amount of faith or sweat will make an incorrect solution solve a problem it wasn't designed to solve. Don't misunderstand me--I've implemented both hardware and software solutions to increase NT's availability, and those solutions work. However, the key to succeeding in your quest for continuous NT computing lies in understanding your needs and each solution's capabilities, and in prototyping to make sure a solution will scale to solve your business problem.

Software and hardware solutions that increase NT's availability are part of, not replacements for, good systems-management practices. These solutions increase system availability and reduce your business's exposure to computer downtime by providing redundancy to your computing environment in the same way RAID or multiple power supplies in a system provide redundancy. NT clusters and related solutions only let you recover from or mask failures that cause system outages--you must continue to execute proper backup and disaster-recovery strategies.

In this article, I'll give you my refined definitions of clustering terminology and review availability classifications that help categorize vendor solutions. Then, I'll identify features that can help you select and implement a data mirroring and failover solution that meets your business requirements. Along the way, I'll point out some differences between data mirroring and failover solutions and Microsoft Cluster Server (MSCS).

Defining Clustering

Although many vendors offer NT clustering products, trade publications and product data sheets sometimes apply the word cluster or clustering like a branding iron to products with varying feature sets. These products actually span a continuum, from high-availability solutions to fault-tolerance products. To categorize these offerings, I propose this definition of clustering: A cluster is a group of servers that independently execute their operating system (OS) and applications to let clients access resources that are available to all the servers in the group. If a cluster member experiences system failure, resource access is available through another cluster member and does not require operator intervention or system restart. Collectively, clustered systems provide higher availability, increased manageability, and greater scalability than each system can provide independently.

My definition isn't much different from the definitions I've heard in my discussions with Microsoft staff. Mark Wood, the first product manager for Windows NT Server, Enterprise Edition, defines a cluster as a group of independent systems linked together for higher availability, easier manageability, and greater scalability. Jim Gray, senior researcher in Microsoft's Scaleable Servers Research Group and manager of Microsoft's Bay Area Research Center, describes a cluster as "a collection of independent computers that is as easy to use as a single computer." He further describes clusters as solutions that not only provide failover capabilities but also disperse data and computation among a cluster's members. (To learn more about Jim Gray's clustering vision, see his sidebar "Commodity Cluster Requirements," June 1998.)

My definition of clustering deliberately excludes data-mirroring-with-failover solutions, which do not provide access to common storage resources or support the automatic reentry of a recovered system into a cluster. Access to common storage and the seamless addition and removal of systems in a cluster are key to the distinction I make between NT clustering solutions such as MSCS, which provide seamless interaction between systems, and high-availability products based on data mirroring, which do not. This distinction might appear minor today, but it will become increasingly important as Microsoft expands MSCS beyond its current two-node support.

While I'm defining terms, let's look at the terms active/standby and active/active. These terms apply at the system level to refer to the nodes in a cluster that actively perform work or wait in standby mode to assume the load of a failed cluster node. Active/standby means one node is working and the other is waiting. In active/active implementations, both nodes actively perform independent work.
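
To make the two terms concrete, here is a minimal Python sketch of a two-node configuration. It is an illustration only, not how MSCS or any vendor product is implemented; the `Node` class, the `fail_over` and `mode` helpers, and the workload names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical cluster node: a name, a health flag, and its workloads."""
    name: str
    up: bool = True
    workloads: list = field(default_factory=list)


def fail_over(failed: Node, survivor: Node) -> None:
    """Mark a node as failed and move all of its workloads to the survivor."""
    failed.up = False
    survivor.workloads.extend(failed.workloads)
    failed.workloads.clear()


def mode(nodes) -> str:
    """Classify a two-node configuration by how many nodes carry work."""
    busy = sum(1 for n in nodes if n.up and n.workloads)
    return "active/active" if busy == 2 else "active/standby"


# Active/standby: node B waits idle until node A fails.
a = Node("A", workloads=["sql"])
b = Node("B")
standby_mode = mode([a, b])
fail_over(a, b)

# Active/active: both nodes do independent work, and the survivor
# ends up carrying both loads after a failure.
c = Node("C", workloads=["sql"])
d = Node("D", workloads=["file-share"])
active_mode = mode([c, d])
fail_over(c, d)
```

The sketch also shows the practical trade-off: in an active/active pair, the surviving node must have enough capacity to run both workloads at once, whereas an active/standby pair keeps one node's capacity in reserve.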
