Ensure device driver reliability

Last month, I wrote about a company that switched from Windows NT to Linux because of NT Server reliability problems. To provide some balance, this month I address a major cause of reliability problems—faulty third-party device drivers.

Early users of NT constantly had to check the NT Hardware Compatibility List (HCL) to ensure that NT supported their hardware devices. They had to assume that a device would not work with NT. Today, NT is a commodity OS, and people assume that all hardware works with NT. If it doesn't, the fault is probably not NT's. In response to my May column, Mark Russinovich, author of our NT Internals column, wrote me:

NT supports thousands of hardware devices. Not only does Microsoft not write the drivers for NT hardware, but the developers writing drivers for NT often have no experience with NT drivers or internals. In addition, hardware companies are on accelerated Internet time, trying to get new devices to market before their competitors do. As a result, hardware companies put both developer training and driver testing on the fast track. Many devices therefore ship with drivers that developers haven't properly written or adequately tested. Microsoft endorses an HCL, which lists drivers that undergo certification by a testing lab, but this lab can't possibly test every driver combination. In addition, about 25 million NT systems are online, so even the most obscure bug in a vendor's driver will show up on a regular basis.

Contrast that situation with the Linux situation: Either Linux OS developers or other Linux gurus write drivers for this OS because they love learning its internals and contributing to Linux's acceptance. They have no deadlines for their drivers to be ready for shipment, and the community supports only a limited number of devices because of the limited pool of Linux hackers. Obviously, a disparity in quality will surface between the typical Linux device driver and the typical NT driver. Also, compared to NT, Linux has few device and software combinations, so latent Linux bugs have a smaller chance of surfacing.

NT's stability problems are therefore a byproduct of its widespread acceptance, not of fundamental flaws in NT. If Linux catches on to the extent NT has, Linux will certainly suffer the same trials. I grow increasingly frustrated with the media and the Linux community for ignoring common sense when they bash NT with the reliability cheap shot.
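The scale argument in the letter can be put into rough numbers. The sketch below assumes that each system independently hits a given driver bug with some small daily probability. The 25 million figure comes from the letter; the one-in-a-million probability and the smaller installed base are hypothetical values chosen only for illustration.

```python
def expected_daily_failures(installed_systems: int, daily_probability: float) -> float:
    """Expected number of systems that hit the bug on a given day."""
    return installed_systems * daily_probability


def chance_of_at_least_one(installed_systems: int, daily_probability: float) -> float:
    """Probability that at least one system hits the bug on a given day,
    assuming systems fail independently of one another."""
    return 1 - (1 - daily_probability) ** installed_systems


LARGE_BASE = 25_000_000    # installed base quoted in the letter
SMALL_BASE = 100_000       # hypothetical smaller installed base
RARE_BUG = 1e-6            # hypothetical per-system daily probability

for label, systems in (("large base", LARGE_BASE), ("small base", SMALL_BASE)):
    expected = expected_daily_failures(systems, RARE_BUG)
    chance = chance_of_at_least_one(systems, RARE_BUG)
    print(f"{label}: {expected:.1f} expected failures/day, "
          f"{chance:.1%} chance of at least one")
```

Under these assumptions, the same one-in-a-million bug is a near-certain daily event on the large installed base and an occasional one on the small base, even though the driver is equally buggy in both cases.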

Enterprises are using NT Server for mission-critical applications from messaging to e-commerce. Major e-commerce sites such as Barnes & Noble, Walt Disney, eBay, United Airlines, Delta Air Lines, Dell, Compaq, Gateway, Intel, Microsoft, JCPenney, and CarPoint use NT Server to run their Web sites. Recognizing this trend, enterprise-system vendors (IBM, Unisys, Compaq, Data General, and HP) have committed to providing 99.9 percent uptime on NT Server 4.0. Each vendor is putting its reputation on the line, provided the customer will pay for the hardware, clustering, systems management software, and services necessary to guarantee such uptime.

How much reliability are customers willing to pay for? Stratus and Marathon Technologies have taken this challenge to an even higher level, providing redundancy for all network components. By guaranteeing up to 99.99 percent uptime, these vendors are reducing unscheduled downtime from almost 9 hours per year (at 99.9 percent) to less than 1 hour per year. Going to 99.999 percent uptime cuts unscheduled downtime to about 5 minutes per year. If customers will spend the money, vendors will spend the resources necessary to provide the reliability. The right combination of hardware, software, and service can make an NT system as reliable as you're willing to pay for.
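The link between an uptime percentage and yearly downtime is simple arithmetic. Here is a minimal sketch, assuming a 365-day year of round-the-clock operation and treating every hour outside the guarantee as downtime; the function name is my own, not a vendor formula.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours; leap years ignored


def downtime_hours_per_year(uptime_percent: float) -> float:
    """Hours of downtime per year implied by an uptime percentage."""
    return HOURS_PER_YEAR * (1 - uptime_percent / 100)


for percent in (99.0, 99.9, 99.99, 99.999):
    hours = downtime_hours_per_year(percent)
    print(f"{percent}% uptime: {hours:.2f} hours "
          f"({hours * 60:.0f} minutes) of downtime per year")
```

Each additional nine cuts the allowed downtime by a factor of 10: roughly 88 hours at 99 percent, 9 hours at 99.9 percent, 53 minutes at 99.99 percent, and 5 minutes at 99.999 percent.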

So if drivers are the root of NT's reliability problems, how will Windows 2000 (Win2K) ensure driver reliability? Win2K introduces a driver validation tool, Driver Verifier. A developer will use the verifier to check that a driver, as a highly privileged component of the OS, adheres to certain rules. The largest share of driver crashes results from drivers attempting to access pageable memory when the CPU is at an elevated interrupt priority. Such bugs are usually extremely difficult to find in testing, because a crash won't result if the pageable memory that the driver accesses happens to be resident in the system's physical memory. Developers don't often use memory stress testers during driver testing, and even such testers don't necessarily force all the pageable driver code out of physical memory.

Driver Verifier will force all pageable system memory out of physical memory every time the driver being verified raises the interrupt priority. Thus, 100 percent of the time, Win2K will immediately catch an access to pageable data that violates the interrupt priority level rule. Such Driver Verifier features will prevent bad drivers from leaving the vendor's door. Microsoft might establish a testing lab that would exercise drivers via the verifier as a prerequisite to logo certification.
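A toy simulation shows why forcing pages out turns an intermittent crash into a guaranteed one. This is only a sketch of the idea, not NT or Driver Verifier code: the classes, the single "driver_data" page, and the 1 percent eviction rate are hypothetical, and the interrupt priority model is reduced to two levels whose names are borrowed from NT.

```python
import random

PASSIVE_LEVEL, DISPATCH_LEVEL = 0, 2  # simplified two-level priority model


class PageFaultAtElevatedPriority(Exception):
    """Stands in for the crash caused by an illegal pageable access."""


class ToyMemoryManager:
    def __init__(self, pageable_pages, verifier_enabled, rng):
        self.level = PASSIVE_LEVEL
        self.verifier_enabled = verifier_enabled
        self.rng = rng
        # Every pageable page starts out resident in physical memory.
        self.resident = {page: True for page in pageable_pages}

    def background_pressure(self, evict_probability):
        # Ordinary memory pressure: each page is occasionally paged out.
        for page in self.resident:
            if self.rng.random() < evict_probability:
                self.resident[page] = False

    def raise_priority(self, level):
        self.level = level
        if self.verifier_enabled and level >= DISPATCH_LEVEL:
            # Verifier behavior as described above: evict all pageable pages.
            for page in self.resident:
                self.resident[page] = False

    def lower_priority(self):
        self.level = PASSIVE_LEVEL

    def touch(self, page):
        if not self.resident[page]:
            if self.level >= DISPATCH_LEVEL:
                raise PageFaultAtElevatedPriority(page)
            self.resident[page] = True  # normal page fault, serviced safely


def buggy_driver_routine(memory):
    """Hypothetical bug: touches pageable data while priority is raised."""
    memory.raise_priority(DISPATCH_LEVEL)
    try:
        memory.touch("driver_data")
    finally:
        memory.lower_priority()


def crashes_in(runs, verifier_enabled, evict_probability=0.01, seed=0):
    rng = random.Random(seed)
    crashes = 0
    for _ in range(runs):
        memory = ToyMemoryManager(["driver_data"], verifier_enabled, rng)
        memory.background_pressure(evict_probability)
        try:
            buggy_driver_routine(memory)
        except PageFaultAtElevatedPriority:
            crashes += 1
    return crashes


print("crashes without verifier:", crashes_in(1000, verifier_enabled=False))
print("crashes with verifier:   ", crashes_in(1000, verifier_enabled=True))
```

Without the verifier, the buggy routine crashes only on the rare runs in which the page happens to be paged out, which is why the bug can survive ordinary testing. With the verifier, every run crashes, so the bug shows up the first time the code path executes.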

If hardware vendors apply Driver Verifier universally, it will profoundly affect the area that is most often the root of NT's reputation for unreliability: the device driver. We'll be checking Win2K for reliability. If Win2K can shake NT's reliability stigma, Microsoft wins. Otherwise, in the uptime arena, Microsoft will be chasing zeros instead of chasing nines.


Reader Comments

I must reply to Mark Smith’s June editorial, “Chasing 9s.” Surely he doesn’t expect me or any other Windows NT Magazine readers to accept the notion that Windows NT’s stability problems are more a by-product of widespread acceptance and faulty drivers than fundamental flaws in NT. Every month, your magazine addresses concerns associated with NT Server and its components, both Microsoft and third-party. Drivers and driver problems don’t seem to consume a lot of real estate in this or other NT-based publications that I read. Sure, drivers are a concern, but let’s keep the problem in perspective. My experience has shown me that as long as you use NT Server as a single-purpose server (e.g., as either a Web, file-and-print, proxy, messaging, or application server), everything is OK. However, if you mix and match server applications, all bets are off. The author indirectly confirms this idea with his example about major e-commerce sites (e.g., Barnes & Noble, Walt Disney, eBay, Dell) that use NT Server for their Web servers. I’m curious about what those companies use for everything else. At least your magazine recognizes that stability issues exist with NT, and you’re willing to provide information to assist us in the field (even if we have different opinions as to the cause). Keep up the good work.
--Shawn Downs

In the magazine, we try to call it like it is. If a driver is at fault, we put the blame there. If NT is at fault (and it often is), we put the blame there. In either case, we design the magazine to help you work around those problems.
--Mark Smith

Mark Smith's June editorial, "Chasing 9s," provided an impressive list of companies (e.g., Barnes & Noble, eBay, Dell) that use Windows NT Server for their Web sites. I work with a startup Internet company that began its existence with NT, Microsoft Internet Information Server (IIS) 4.0, and Active Server Pages (ASP). Recently, venture capitalists gained a share of the company and are pushing to move away from NT toward JavaBeans and UNIX. The argument is that NT will fall apart once the company starts getting thousands of Web users per day. I'm proposing that we continue with NT and rely on Microsoft Transaction Server (MTS) and COM to scale up.

Where can I get some reliable, unbiased information supporting NT as a robust, scalable Web server environment? Specifically, can I get information that describes how eBay (or other companies) implement their Web sites?
--Steve Perry

Because of the competitive nature of e-commerce sites, getting companies to open up about how they're using NT to drive their e-commerce engines is difficult. At TechEd, Microsoft demonstrated that you could build a Visual Basic (VB) application using ADO, IIS, and Microsoft SQL Server that supported 7500 simultaneous users. NSTL tested and certified the application.

The application demonstrated that not only does NT scale but so do VB and ADO. So, take 7500 users and multiply by 24 hours per day, and you have a lot of users hitting the system. The demonstration system also used Windows Load Balancing Service to add scalability and fault tolerance.

The bottom line is that NT is solid and scales well. You can find out more information about scalability at http://www.msdn.microsoft.com/vstudio/downloads/scale/default.asp.
--Ken Spencer
