NT workstations provide high-performance supercomputing

Periodically, you'll hear a die-hard anti-Windows NT type claim that NT isn't scalable. Apparently, the National Center for Supercomputing Applications (NCSA) hasn't heard this message: NCSA at the University of Illinois at Urbana-Champaign built a 192-processor cluster of commercially available Intel-based HP and Compaq workstations running unmodified NT Server 4.0 Service Pack 3 (SP3). This NT cluster will join other supercomputers to support supercomputing applications for NCSA and the National Science Foundation (NSF), which also funds the National Computational Science Alliance.

NCSA is the lead institution in the Alliance, which is composed of researchers from more than 50 institutions. The Alliance already has other supercomputers, such as Silicon Graphics' CRAY Origin2000. However, as Robert Pennington, team leader of NCSA's NT Cluster Group, explained, "We saw the coming importance of NT and Intel, and we needed to give users a choice." Although most scientific applications that solve complex mathematical and scientific problems are UNIX-based, some users have come to NCSA with NT-based codes that NCSA has needed to support. The purpose of the NCSA's NT cluster is to support existing UNIX applications from other supercomputers and to demonstrate that NT and Intel technology are viable for high-performance computing. Currently, about 100 researchers and programmers run applications dealing with astrophysics, environmental hydrology, and numerical relativity on the NCSA NT cluster, but Pennington said he expects this number to grow rapidly as the system joins other supercomputers in the Alliance. "We expected to see more emphasis on NT from technical and scientific users—and this is what is beginning to happen."

Building with Basics
The NCSA cluster's key feature is that it is built from commercially available, unmodified hardware and software components. The computers in the cluster (64 HP Kayak XU PC Workstations with dual 300MHz Intel Pentium II processors and 32 Compaq Professional Workstation 6000s with dual 333MHz Intel Pentium II processors) run NT Server 4.0 with SP3. As Figure 1 shows, these 96 computers work together as one virtual server with 192 processors and 50GB of RAM. Myricom's Myrinet high-speed System Area Network (SAN) provides an 80MBps hardware connection between the computers in the cluster, and a 100Mbps Fast Ethernet connection links the cluster to the storage area.
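The processor count and the two network speeds are easy to misread, so here is a minimal sketch of the arithmetic. It assumes that "MBps" means megabytes per second, that "Mbps" means megabits per second, and that both use decimal units. The names NODES, total_nodes, total_processors, and mbytes_to_mbits are illustrative only and are not part of any NCSA or HPVM tool.

```python
# Node inventory as described in the article:
# (model, number of machines, processors per machine, clock speed in MHz)
NODES = [
    ("HP Kayak XU PC Workstation", 64, 2, 300),
    ("Compaq Professional Workstation 6000", 32, 2, 333),
]


def total_nodes(nodes):
    """Return the number of machines in the cluster."""
    return sum(count for _, count, _, _ in nodes)


def total_processors(nodes):
    """Return the number of processors across all machines."""
    return sum(count * cpus for _, count, cpus, _ in nodes)


def mbytes_to_mbits(mbytes_per_sec):
    """Convert megabytes per second to megabits per second (8 bits per byte)."""
    return mbytes_per_sec * 8


if __name__ == "__main__":
    print("machines:", total_nodes(NODES))
    print("processors:", total_processors(NODES))

    # Assumption: the Myrinet figure is in megabytes per second and the
    # Fast Ethernet figure is in megabits per second.
    myrinet_mbits = mbytes_to_mbits(80)
    fast_ethernet_mbits = 100
    print("Myrinet, in megabits per second:", myrinet_mbits)
    print("Myrinet / Fast Ethernet:", myrinet_mbits / fast_ethernet_mbits)
```

Under those unit assumptions, the link between machines inside the cluster is several times faster than the link between the cluster and the storage area.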

The only component of the NT supercluster that isn't made of off-the-shelf products is the High Performance Virtual Machines (HPVM) set. Andrew Chien, Science Applications International Corporation chair professor in the Department of Computer Science and Engineering and leader of the Concurrent Systems Architecture Group at the University of California at San Diego, developed the HPVM set. Chien designed the HPVM set of software tools to unite groups of ordinary desktop computers into one high-performance environment.

Working Out the Bugs
Although most of the network components are off-the-shelf products, building the cluster to run the applications presented some problems. Commenting on obstacles to the project's success, Pennington said, "We had a sizeable list [of problems] that we worked with Dr. Andrew Chien to resolve." (For more of Pennington's comments on the cluster project, see "An Interview with Robert Pennington.") NCSA had to determine how applications designed to run on 4, 8, or 32 processors would scale to 192 processors. Native UNIX applications required migration to the NT platform. NCSA also needed to provide users access to the cluster, ensure adequate application storage space for users, and make sure the servers had adequate processor power for the demanding applications. Finally, NCSA had concerns about upgrading 96 computers in the future.

Running Applications on the NT Cluster
Preparing applications for the NT supercluster required a combination of tweaking and migration to make them run. As Pennington expected, some applications required redesigning to provide better scalability. Migrating the applications from UNIX to NT was a challenge because the system calls that UNIX uses don't always exist in NT, and UNIX sockets and NT sockets have some differences. To resolve these problems, NCSA applied Cygnus Solutions' Cygwin toolkit, composed of GNU development tools and utilities for 32-bit Windows, to port the UNIX applications to NT.

Applications that run on a supercluster are apt to be demanding, to put it mildly. To manage that demand, NCSA used two components of Platform Computing's Load Sharing Facility (LSF) Suite integrated with the HPVM. LSF's batch capabilities make the cluster's 192 processors accessible to users. To reduce competition for bandwidth and processing cycles, LSF's scheduling capabilities assign one or two user processes per dual-processor box, thus ensuring that the applications have all the memory and processor time they need.
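The placement rule of one or two user processes per dual-processor box can be sketched in a few lines. This is a toy illustration under stated assumptions, not LSF's scheduling algorithm or API: the function assign_processes and its parameters are hypothetical, and the sketch simply fills machines in order up to a per-machine limit.

```python
def assign_processes(num_processes, num_nodes=96, per_node=2):
    """Place processes on dual-processor machines, at most per_node on each.

    Returns a dict that maps a machine index to the number of processes
    placed on it. Hypothetical helper for illustration only.
    """
    if per_node not in (1, 2):
        raise ValueError("a dual-processor box takes one or two processes")
    if num_processes < 0:
        raise ValueError("num_processes must not be negative")

    # Ceiling division: machines needed to hold every process.
    nodes_needed = -(-num_processes // per_node)
    if nodes_needed > num_nodes:
        raise ValueError("job does not fit on the cluster")

    placement = {}
    remaining = num_processes
    for node in range(nodes_needed):
        placed = min(per_node, remaining)
        placement[node] = placed
        remaining -= placed
    return placement


if __name__ == "__main__":
    # A five-process job with two processes per machine uses three machines.
    print(assign_processes(5))
    # The same job with one process per machine uses five machines.
    print(assign_processes(5, per_node=1))
```

Capping each machine at two processes means no process has to share a processor. Choosing one process per machine gives that process the machine's entire memory, at the cost of leaving the second processor idle.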
