Shutting Down the NF9008XP
Maestro monitors temperature sensors and tracks the rotational speed of the NF9008XP's numerous fans (two for the hard disk bays, two for the expansion slots, four for the processors, and one for each power supply) to keep temperatures throughout the unit within acceptable limits. When a power supply fan fails, Maestro disables that power supply. When any other fan fails or the system overheats, Maestro increases the rotational speed of the remaining fans to compensate. If increasing fan speed doesn't solve the problem, Maestro powers down the machine before excessive heat causes permanent damage.
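The policy described above can be sketched as a small decision function. This is only an illustration of the behavior as the review describes it, not NetFRAME's implementation: the function name, the `Action` values, and the `fans_already_boosted` flag (standing in for "increasing fan speed didn't solve the problem") are all assumptions.

```python
from enum import Enum


class Action(Enum):
    """Hypothetical responses, named after the behavior described in the text."""
    DISABLE_POWER_SUPPLY = "disable power supply"
    BOOST_OTHER_FANS = "boost other fans"
    POWER_DOWN = "power down"


def choose_actions(psu_fan_failed, other_fan_failed, overheating, fans_already_boosted):
    """Return the list of actions the described policy would take.

    fans_already_boosted is an assumed flag meaning the remaining fans
    were sped up earlier and the system is being checked again.
    """
    actions = []
    # A failed power supply fan takes that power supply offline.
    if psu_fan_failed:
        actions.append(Action.DISABLE_POWER_SUPPLY)
    if other_fan_failed or overheating:
        if overheating and fans_already_boosted:
            # Faster fans did not help, so shut down before heat does damage.
            actions.append(Action.POWER_DOWN)
        else:
            # First response: speed up the remaining fans to compensate.
            actions.append(Action.BOOST_OTHER_FANS)
    return actions
```

For example, `choose_actions(False, True, False, False)` yields only the fan-boost action, while a system that is still overheating after the boost is told to power down.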
When the Windows NT Magazine Lab received the test unit, a NetFRAME representative demonstrated Maestro's reaction to fan failure. He stuck a small straw into a fan blade and stopped the fan's motor. Taking this demonstration as an endorsement of such abuse, I grabbed a plastic fork and started stopping fans all over the machine.
Maestro has a screen that shows when a particular fan stops functioning properly and other fans speed up to compensate. But I had to look for this information. I expected a warning about the failure to pop up in Maestro, but no such warning appeared.
Fortunately, a small LCD screen on the front of the unit reports fan failures, and the NF9008XP's Simple Network Management Protocol (SNMP) notification feature can send alerts when the system has a problem. You can specify which functions, failures, or operations will trigger an SNMP alert. NetFRAME is developing Desktop Management Interface (DMI) support for lower-level hardware failures.
Armed with my fork, I momentarily stopped the fan on each power supply. As each fan stopped, Maestro showed that its power supply was offline. The fan resumed spinning, but Maestro didn't indicate that the power supply came back on. Stabbing the third power supply's fan caused a total system failure. I removed and reinserted all three power supplies and rebooted the machine, but I could not get it back online.
I continued my testing on a second NF9008XP. This time, I created a 45GB RAID 5 stripe set using the NT system tools. The machine configured and formatted the hard disks in roughly 4 hours. After killing one NF9008XP, I chose not to experiment with this unit's power supply fans. Still, I found that powering down the unit and trying to restart it produced the same results as my fork experiments: no power to the PCI bus.
Instead of running away and hiding, I did everything I could think of to get the second machine back online, with no results at first. Two days later, I again tried to bring the NF9008XP back to life. This time I pushed really hard on a black button on the back of the machine. The button, identified only by a circle with a horizontal line inside it, did the trick. The PCI bus blinked back to life, the system found its hard disks, and the NF9008XP booted up.
Testing the NF9008XP
I finally tested file and print services performance by comparing the NF9008XP with a brand-name control server that has performed well in recent Lab tests. The control server has four 200MHz Pentium Pro processors, 512MB of RAM, four 10/100Mbps Fast Ethernet PCI network cards, and four 2GB SCSI hard disks. For my tests, both servers were running Windows NT Server 4.0 with Service Pack 3 (SP3).
To test the two systems, I ran the Copy All Bidirectional tests in Bluecurve's Dynameasure/File Services 1.5. (For information about Dynameasure, see Carlos Bernal, "Dynameasure Enterprise 1.5," September 1997.) These tests process, in random order, 16 transactions that copy compressed data, uncompressed data, binary files, text files, and image files between the server and clients. The test files range in size from 500KB to 5MB. The test specifications called for six steps, starting with 10 motors (simulated users) at step 1 and increasing that number at each step to 100 motors at step 6. I ran the tests on the Lab's standard configuration: a set of client machines on a 100Mbps Ethernet network simulating the workload of multiple users. (For more information about the Lab's benchmarking network, see "The Lab's Test Environment," page 96.)
Graph 1 shows the two systems' throughput at each step. Throughput measures system capacity in terms of the number of bytes that all the motors copy during the measurement phase, divided by the elapsed time of the measurement phase. The NF9008XP reached its maximum throughput of 4.43MB per second (MBps) in step 2. The control server reached its maximum throughput of 4.28MBps in step 3.
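The throughput definition above is a simple ratio, which the following minimal sketch makes concrete. The function name and the sample byte counts are made up for illustration, and the sketch assumes 1MB means 1,000,000 bytes; the review does not say which convention Dynameasure uses.

```python
def throughput_mbps(bytes_copied_per_motor, elapsed_seconds):
    """Total bytes copied by all motors divided by the measurement time.

    Assumes 1MB = 1,000,000 bytes (an assumption, not stated in the review).
    """
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    total_bytes = sum(bytes_copied_per_motor)
    return total_bytes / elapsed_seconds / 1_000_000


# Hypothetical example: two motors each copy 5,000,000 bytes in 2 seconds.
example = throughput_mbps([5_000_000, 5_000_000], 2.0)
```

Because the numerator sums over every motor, adding motors raises throughput only as long as the server keeps up. Once the server saturates, the total bytes copied per second stops growing.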
Graph 2 shows the systems' average response times, which measure the average time each system takes to read a file and copy it to another disk. The NF9008XP was faster than the control server for low numbers of motors. At step 2, with 20 motors, the NF9008XP had an average response time of 2.52 seconds, and the control server had an average response time of 7.78 seconds. After step 2, the NF9008XP's performance degraded rapidly. At step 3, with 39 motors, the control server had an average response time of 9.84 seconds, and the NF9008XP had an average response time of 17.13 seconds. After step 3, both systems' performance degraded, but the NF9008XP's performance degraded much more quickly than the control server's performance.
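To put the response-time figures in proportion, the short sketch below divides the control server's time by the NF9008XP's time at steps 2 and 3. It uses only the four numbers quoted above; the dictionary layout and the function name are illustrative assumptions.

```python
# Average response times in seconds, as quoted in the text for steps 2 and 3.
response_times = {
    2: {"NF9008XP": 2.52, "control": 7.78},
    3: {"NF9008XP": 17.13, "control": 9.84},
}


def control_to_nf_ratio(step):
    """Ratio above 1 means the NF9008XP responded faster than the control server."""
    times = response_times[step]
    return times["control"] / times["NF9008XP"]


for step in sorted(response_times):
    print(f"step {step}: control/NF9008XP = {control_to_nf_ratio(step):.2f}")
```

At step 2 the ratio is a little over 3, so the NF9008XP responded roughly three times faster. At step 3 the ratio falls below 1, meaning the control server had taken the lead.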
Returning the NF9008XP
Back at NetFRAME's laboratories, the first machine came back to life. A NetFRAME engineer told me that nothing was wrong with it. I have a feeling that the problem was related to the unmarked black button.
I obviously found some less-than-desirable traits in the NF9008XP. But systems administrators aren't likely to poke their servers with eating utensils. If you need access to mission-critical information in seconds, 24 hours a day, 7 days a week, and if no more than 30 users will generate transactions simultaneously, then the NF9008XP might be a good deal for you. If you require access for more than 30 users at a time, keep shopping.