See correction to this article

Vyas admits MSCS has a problem with duplicate share names. Two shares can't have the same name after failover. If you have the same share names on each node, the failing node's share will disappear. In addition, print queues must have unique names on each node, even though they might point to the same printer. First Union has notified Microsoft of this shortcoming, but was still waiting for a solution at press time.
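
A quick audit before going into production can catch such collisions. The following is a minimal sketch, not an MSCS utility: it assumes you have already exported each node's share and print queue names into plain lists (the names below are invented for illustration) and compares them without regard to case, because Windows treats share names as case-insensitive.

```python
def find_name_collisions(node_a_names, node_b_names):
    """Return the names that appear on both nodes, ignoring case."""
    a = {name.lower() for name in node_a_names}
    b = {name.lower() for name in node_b_names}
    return sorted(a & b)


# Hypothetical share and print queue names for a two-node cluster.
node1_names = ["Reports", "Users", "HPLaser1"]
node2_names = ["reports", "Apps", "HPLaser2"]

collisions = find_name_collisions(node1_names, node2_names)
print(collisions)
```

Any name the function reports would need to be renamed on one node before you rely on failover.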

Vyas said that future plans include clustering SQL Server. First Union's database of choice is Sybase on UNIX; however, the company is developing many new applications on SQL Server.

IBM World Registry Division, Washington, D.C.
Industry: Top secret software development
Cluster use: High availability software development environment
Solutions: Computer Associates ARCserve Replication for Windows NT

IBM World Registry Division
Imagine developing applications that are so top secret that you can't back them up on tape. This scenario became reality for Mark Shoger of Keane Federal Systems. IBM World Registry Division (WRD) hired Keane to help with the company's development efforts. Keane said WRD needed a realtime backup system to handle open files and systems policies. WRD couldn't have any removable media, because it would void the company's top-secret classification requirements. Finally, WRD needed 99.99 percent availability and no data loss. To meet these requirements, the company turned to Computer Associates' ARCserve Replication realtime backup and recovery system.
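
To put the 99.99 percent figure in perspective, the short calculation below converts an availability target into a downtime budget. It is only a worked example; the function name and the choice of a 365-day year are assumptions made for illustration.

```python
def allowed_downtime_minutes(availability_percent, period_hours=365 * 24):
    """Minutes of downtime permitted over the period at the given availability."""
    unavailable_fraction = 1 - availability_percent / 100
    return unavailable_fraction * period_hours * 60


yearly = allowed_downtime_minutes(99.99)
weekly = allowed_downtime_minutes(99.99, period_hours=7 * 24)
print(f"{yearly:.1f} minutes per year, {weekly:.1f} minutes per week")
```

At 99.99 percent, the budget works out to a little under an hour of downtime per year, which leaves no room for a lengthy manual recovery.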

WRD has 500 users attached to four large NT servers that handle development, and the company runs Lotus Notes Domino for group communications. Each of the four primary servers connects to a backup server that mirrors the data on all four primaries. Figure 3, page 128, shows the WRD cluster network model.

ARCserve Replication runs on each server and monitors threshold levels such as hard disk space and network performance. The primary servers are dual Pentium II systems, each with 512MB of RAM and 4GB of storage (16GB total); the backup server has 20GB of storage.

The payoff for using this NT-based solution is simple. If a problem occurs, such as a hard disk crash, ARCserve Replication detects the problem and switches users to the backup server within a few seconds. The users are unaware that any change has taken place. Shoger can replace the failed hard disk at his leisure. He then initiates the failback procedure, which synchronizes the new disk and reroutes the users to the primary server. "I've been doing network administration for a long time and this failure and recovery process impresses me," said Shoger. "One time, a NIC failed and the system ran the whole weekend on the backup server before I noticed it." He also points out that the ARCserve Replication software was easy to install and maintain.
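
The failover and failback sequence described above can be sketched as a small state model. This is an illustration only and says nothing about how ARCserve Replication is implemented; the class, its method names, and the server names are all hypothetical. The point it captures is that failover is automatic, whereas failback waits for the administrator.

```python
class MirroredPair:
    """Toy model of a primary server and the backup that mirrors it."""

    def __init__(self, primary, backup):
        self.primary = primary
        self.backup = backup
        self.active = primary          # server currently handling users
        self.primary_in_sync = True    # False until a repaired primary is resynchronized

    def report_health(self, primary_ok):
        """Called by a monitor; switches users to the backup on failure."""
        if not primary_ok and self.active == self.primary:
            self.active = self.backup
            self.primary_in_sync = False

    def fail_back(self):
        """Administrator-initiated: resynchronize, then reroute users."""
        if self.active == self.backup:
            self.primary_in_sync = True   # stands in for copying data back
            self.active = self.primary


pair = MirroredPair("dev1", "backup1")
pair.report_health(primary_ok=False)   # e.g., a disk or NIC failure
after_failure = pair.active
pair.report_health(primary_ok=True)    # hardware repaired, but no automatic failback
still_on_backup = pair.active
pair.fail_back()
after_failback = pair.active
print(after_failure, still_on_backup, after_failback)
```

Keeping failback as a deliberate step matches Shoger's account: the system ran on the backup server all weekend, and he returned users to the primary only after he had dealt with the fault.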

Shoger recommends this system for large networks. It requires an extra system for backup and recovery, which may be cost-prohibitive for small networks. If you need this kind of protection, Shoger recommends using a backup system that has 25 percent more power than any of the systems it's protecting.

Looking ahead, WRD might implement an additional backup server at a remote site for disaster recovery. Such a configuration would help keep the company up and running, even if the primary data center blew up.

John C. Lincoln Hospital, Arizona
Industry: Healthcare
Cluster use: Increase availability of PeopleSoft (Oracle on NT), Office 97, and medical application (Cerner) for a level 1 trauma center
Solutions: Nine two-node clusters, Vinca StandbyServer for Windows NT, Compaq ProLiant servers

John C. Lincoln Hospital
Downtime is not an option for the John C. Lincoln Hospital level 1 trauma center in Arizona. To maintain its level 1 status, the hospital must be able to respond to a life-threatening emergency at all times. Vinca StandbyServer for NT software keeps the trauma center's 7000-user NT environment continuously running.

So how does this solution work? Imagine that the primary server fails and displays a blue screen. Within 30 seconds, the first set of users is working on the standby server. However, because not all applications fail over gracefully to the standby server, some users experience a general protection fault (GPF) and have to reboot. After they reboot, they automatically connect to the standby server and are up and running. To restore the primary server, you simply break the mirror, reboot the primary server, re-establish the mirror, and reboot the primary server again.

The hospital chose Vinca because of its low overhead on the primary server. "Vinca runs clean and light, and you hardly know it's there," said Mark Jablonski, former network administrator for the hospital. "Overall, Vinca is a sleep saver. If users are working at night, you can keep sleeping," said Jablonski. "Anything that keeps my beeper from going off is a friend of mine."

Jablonski recommends researching the resource overhead before you buy. "If the Primary Domain Controller goes over 50 percent CPU utilization, it's hard to log on to the PDC," he said. He recommends checking the cluster solution to ensure the CPU utilization doesn't go through the roof. You will also want to check your clustering solution against your night load when you run backups, virus scanners, and other administrative applications. "It's at night when you get beeped," said Jablonski.

Besides researching the resource overhead, you need to check the reliability of the clustering solution: Does it fail over five out of five times? "Vinca is successful 90 percent of the time. Sometimes you must restart services manually on failover," said Jablonski. Also, look for quality support. Vinca's support is helpful and knowledgeable, and for round-the-clock coverage you can purchase the company's 24 × 7 premium support.

The hospital plans to implement an active/active cluster. With the current active/standby configuration, nine of the nodes aren't active. Jablonski said the hospital is investigating active/active solutions.

Surplus Direct, Oregon
Industry: Online auction and discount computer store
Cluster use: Electronic commerce
Solutions: Resonate Central Dispatch, which includes Dispatch Manager; Microsoft Visual SourceSafe; Microsoft Internet Information Server 3.0; Microsoft Cluster Server; Microsoft SQL Server 6.5, Enterprise Edition; Tandem CS150; LANWARE NTManage; Westwind Technology Webconnect

Surplus Direct
Do you want a great deal on hardware and software? That's the promise of Surplus Direct, which acts as a clearinghouse for publishers, distributors, and retailers of overstocked, factory refurbished, or distressed inventories. Surplus Direct sells or auctions these items over the Web. To provide its customers with the best service, the company needed a solution that could run 24 × 7 and scale easily. In addition, the company dynamically generates about 90 percent to 95 percent of its Web pages from a SQL Server database. These requirements led Surplus Direct to use a combination of NT products for its clustering solution. Figure 4 shows the Surplus Direct cluster network model.

Surplus Direct uses Resonate's Dispatch Manager software, which is part of Resonate's Central Dispatch product, to monitor incoming Web traffic by open connections, CPU load, and network latency. The software balances the incoming Web traffic with its clustered schedulers. The schedulers assign the workload to one of six front-end IIS-based Web servers. These schedulers get a workout right before auction closing at 11 a.m. each day when Web traffic spikes considerably. The Web servers request data from SQL Server systems running on Tandem CS150 clustered hardware. Surplus Direct uses MSCS to provide the clustering software.
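
To make the scheduling idea concrete, here is a minimal sketch of a scheduler that picks a Web server using the three metrics mentioned above. It is not Resonate's algorithm, which the article doesn't describe; the equal weighting, the normalization limits, and the server names and figures are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class WebServer:
    name: str
    open_connections: int
    cpu_load: float      # fraction of CPU in use, 0.0 to 1.0
    latency_ms: float    # measured network latency


def load_score(server, max_connections=500, max_latency_ms=200.0):
    """Combine the three metrics into one number; lower is better.

    The equal weighting and the normalization limits are arbitrary assumptions.
    """
    return (
        server.open_connections / max_connections
        + server.cpu_load
        + server.latency_ms / max_latency_ms
    )


def pick_server(servers):
    """Send the next request to the server with the lowest combined score."""
    return min(servers, key=load_score)


pool = [
    WebServer("web1", open_connections=220, cpu_load=0.70, latency_ms=40.0),
    WebServer("web2", open_connections=90, cpu_load=0.35, latency_ms=25.0),
    WebServer("web3", open_connections=150, cpu_load=0.20, latency_ms=90.0),
]
chosen = pick_server(pool)
print(chosen.name)
```

A real dispatcher would refresh these measurements continuously, which matters most during a spike like the one before the 11 a.m. auction close.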

Surplus Direct likes using Dispatch Manager to produce bar charts and tables to graphically monitor the pool of schedulers and Web servers. In addition, the company uses LANWARE's NTManage to graphically monitor the network traffic running through its routers, hubs, and switches.

Surplus Direct also takes advantage of the system's scalability. The company can easily add front-end servers, which increase the throughput of the Web sites. Surplus Direct uses Visual SourceSafe to handle source-code version control and replicate pages and changes among all Web server nodes. The company uses Westwind Technology's Webconnect to access SQL Server from the Web servers. "I've been able to sleep better at night because there's no single point of failure," said administrator Mark Daley.

Surplus Direct looked for hardware solutions during evaluation, but couldn't find anything that supported virtual IP addresses. The company needed each user session to stay in the assigned pool. Surplus Direct also needed a software solution that let it use the fastest machines available and add as many Web servers as necessary. "Be patient in finding the right solution--stay very objective," advised Daley.
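
One reading of those two requirements, session persistence and the freedom to add servers, is sketched below. The dispatcher assigns each new session to a server in round-robin order and then keeps it there, even as the pool grows. The class and all names in it are hypothetical and do not represent Central Dispatch's interface; the sketch also leaves out virtual IP addressing entirely.

```python
class StickyDispatcher:
    """Assigns each new session to a server, then keeps it there."""

    def __init__(self, servers):
        self.servers = list(servers)
        self._assignments = {}    # session id -> server name
        self._new_sessions = 0    # count of sessions assigned so far

    def route(self, session_id):
        """Return the server for this session, assigning one on first sight."""
        if session_id not in self._assignments:
            server = self.servers[self._new_sessions % len(self.servers)]
            self._assignments[session_id] = server
            self._new_sessions += 1
        return self._assignments[session_id]

    def add_server(self, name):
        """Grow the pool; existing sessions keep their assignments."""
        self.servers.append(name)


dispatcher = StickyDispatcher(["web1", "web2"])
first = dispatcher.route("session-a")
second = dispatcher.route("session-b")
dispatcher.add_server("web3")             # scale out during a traffic spike
third = dispatcher.route("session-c")
repeat = dispatcher.route("session-a")    # returning user stays where it was
print(first, second, third, repeat)
```

Because the assignment table is consulted before any balancing decision, adding a server changes where new sessions land without disturbing users who are already in the middle of a purchase.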

Tulip Computers, Netherlands
Industry: Computer manufacturing
Cluster use: SAP materials management, production planning, sales, and distribution
Solutions: NCR's LifeKeeper two-node cluster, SAP, Oracle 7 on Windows NT

Tulip Computers
Is NT ready for the enterprise? Tulip Computers thinks so. The company produces and develops PCs for the European and Asian business-to-business market. In addition, Tulip recently purchased Commodore, which makes computers for the European consumer market. Tulip currently has 700 employees and revenue of $300 million.

Tulip uses a two-node LifeKeeper active/standby cluster to run an Oracle 7 on NT database on NCR 4300 4 × 200 hardware with 1.5GB of RAM and 80GB of hard disk space. Five different application servers running SAP's materials management, production planning, sales and distribution, warehouse, and financial controlling modules access the Oracle on NT database. Each application server runs on an NCR 4-way SMP server. Although Tulip hasn't clustered these applications, the company plans to put the SAP application servers into LifeKeeper clusters to maximize availability. Figure 5 shows Tulip's cluster network model.

John Hoogendoorn, Tulip's IS manager, recommends running a database application on an active/standby cluster configuration instead of an active/active configuration. He believes you can more easily recover and manage this environment. Hoogendoorn also recommends finding a cluster-aware backup solution. Tulip's current backup solution isn't cluster aware, so Hoogendoorn bought a separate backup system for the standby node of the cluster.

"The NCR hardware has worked so well that we haven't seen it fail over in production," said Hoogendoorn. "But we've tried to manually fail over and that worked." Hoogendoorn said the system completely failed over in 5 minutes. LifeKeeper has developed various recovery kits (or scripts) to handle specific recovery needs for applications such as SAP and Oracle. For a complete list of recovery kits, visit LifeKeeper's Web site.

[Editor's note: This article assumes familiarity with clustering. For an overview of clustering for Windows NT and a list of clustering-related terms and technologies, see Joel Sloss, "Clustering Solutions for Windows NT," June 1997.]

Contact Info
ARCserve Replication for Windows NT
Computer Associates * 800-243-9462
Web: http://www.cheyenne.com/storage

Citrix WinFrame
Citrix Systems * 954-267-3000
Web: http://www.citrix.com

Convoy Cluster Software
Valence Research * 503-531-8718
Web: http://www.valence.com

Digital Clusters for Windows NT
Digital Equipment * 800-344-4825
Web: http://www.digital.com

Microsoft Cluster Server
Microsoft * 425-882-8080
Web: http://www.microsoft.com/ntserverenterprise

Endurance 4000
Marathon Technologies * 978-266-9999 or 800-884-6425
Web: http://www.marathontechnologies.com

LifeKeeper
NCR * 937-445-5000
Web: http://www.ncr.com/product/nt

Octopus DataStar
Qualix Group * 650-572-0200 or 800-245-8649
Web: http://octopustech.com

RemoteServ/IS
Cubix * 702-888-1000 or 800-829-0550
Web: http://www.cubix.com

StandbyServer for Windows NT
Vinca * 801-223-3100 or 888-808-4266
Web: http://www.vinca.com

CORRECTIONS TO THIS ARTICLE:
"NT Clustering Solutions Are Here" incorrectly identified Sid Vyas as the CIO of First Union Capital Markets Group. His correct title is vice-president (non-UNIX servers). Wayne Ginion heads Capital Markets' technology division.





Reader Comments

I read “NT Clustering Solutions Are Here” (June) and wanted to provide some additional information. The article is outstanding and I thoroughly enjoyed reading it. The additional information regards Vinca’s StandbyServer software. In April, Vinca unveiled its next level of clustering software, Co-StandbyServer for NT. It basically works the same as StandbyServer with one important difference: Instead of establishing a primary server and standby server configuration (the standby server is an unused resource), the new package uses both servers and balances the loads between the two machines. In small companies with small budgets, this feature can be important because it’s hard to financially justify unused hardware resources. In the new product, Vinca eliminated both the need for a specific NIC (Intel 10/100 Pro) and the requirement for identical hardware configurations on the servers. I’ve been actively testing the Co-StandbyServer package since Vinca released the evaluation copy. The only hardware-related issue I’ve come across relates to RAM. Anyone who uses the product needs to double the amount of RAM in each server, so that when failover occurs, a server has enough room for all the transferred services and user sessions. Co-StandbyServer is an outstanding product and I think more people need to know that it exists.
--Greg Provo

Thanks for your letter. I was aware of the new version of Vinca’s product, but the customer I interviewed was not. That’s why I didn’t include the information in the June article. The Windows NT Magazine Lab tested Co-StandbyServer 1.02 for NT. “Clustering Software for Your Network” (July) contains the Lab’s review.
--Mark Smith


The subtitle of “NT Clustering Solutions Are Here” (June) reads, “8 real-world examples that satisfy availability and scalability needs.” The article actually presented only seven examples. The Celanese solution running Gensym G2 and Microsoft SQL Server on Marathon’s Endurance 4000 is not a cluster. The configuration diagram illustrates one logical server. Unlike clusters, Marathon’s fault-tolerant solution provides continuous application service that can ride out component failure and repair, without degradation of performance, code modifications or additions, or human intervention. The platform runs standard, unadulterated NT and application software. The optional fiber-optic link that connects the two halves of the server provides a degree of disaster tolerance. In a recent fire at Celanese, the IS UNIX systems shut down, but G2 on Marathon kept running without missing a beat.
--Craig Jon Anderson


 
 
