Any large enterprise that plans to implement a business intelligence or data warehouse solution with 1TB or more of data might find itself looking at a significant investment for a server system that can provide the capacity and throughput that such a database environment requires. Large SAN and server solutions can certainly support large data warehouses, video streaming, or data-mining implementations. But such systems come at a high price. A system that can scan through the database at roughly 700MB per second of sequential I/O might cost anywhere from $500,000 to more than $1 million—and that's for just the hardware, not including the cost of software and maintenance.This enormous price tag led me and my colleague, Son Voba, to look for an alternative server solution.
Voba and I were impressed by the tremendous advances in software and hardware technology over the last 12 to 18 months, and we were inspired by the 2004 SATA disk throughput research of Microsoft Research Scientist Jim Gray and others in Microsoft Research. We wanted to build a prototype server that would consistently deliver more than 2GB per second sequential database reads and writes. Gray's research and testing used the Newisys 4300 server, SATA drives, and SATA disk controllers connected to 48 SATA disks with 48 SATA cables directly attached to the disks. Gray's configuration achieved a remarkable 2.5GB per second for sequential writes and slightly less than that for sequential reads.
We decided to build a white-box server (a non-branded system built from generic components) consisting of a beefy microcomputer with a 64-bit OS with massive main memory, database software, and direct-attached storage. This article describes that system—a prototype server running Windows Server 2003 x64, SQL Server 2005 x64, 32 GB of RAM, and high-capacity, direct-attached 7200 RPM SATA disks. This system, which our team has tested over the past 6 months, can compete with mega-servers often costing ten to twenty times more. Because of its low cost, such a system could become a contender for a big share of the database server solutions market.
The Prototype
In July of 2005, Son Voba, a handful of hardware vendors, and I put together our prototype server in a lab on the Microsoft Redmond Campus. The goal was to test SQL Server 2005 x64 with high-speed, high-capacity direct-attached disk storage by using SATA disk drives and serially attached SAS SCSI disk controllers, SAS expander boards, and cabling.We chose SATA drives because they have a high capacity—up to 500GB per drive today, with larger drives on the horizon. SATA drives are also several times less expensive than SCSI drives based on capacity and are comparable to SCSI drives in sequential I/O bandwidth.This bandwidth is essential for data-warehousing workloads that scan gigabytes of database storage when processing queries or moving large data sets around. I won't attempt to compare the differences in flexibility, high availability, and reliability of SAN storage solutions and direct-attached disk solutions. For excellent explanations of SATA disk drives and SAS protocols, see the materials at the SCSI Trade Association (http://www.scsita.org) and the Serial ATA International Organization (http://www.sata-io.org).
To test our prototype,we used real customer databases, more than 2TB total, from a data-warehouse solution running on SQL Server 2000. In addition, we used a disk-stress utility called SQLIO (SQLIO.exe—available for download at http://www.microsoft.com/downloads) to measure disk throughput.
We built the prototype server during July and August of 2005 using a four-processor dual-core AMD server. Four hardware companies generously loaned us the hardware. Table 1 shows the cost of the components based on market price in December 2005. Newisys (a Sanmina-MCI Company) loaned us the 4300 Server, which came with 32GB of memory and four dual-core AMD Opteron 2.2GHz CPUs. Newsys also provided 64 400GB Seagate SATA disks. LSI Logic Corporation loaned us six SAS3442X Host Bus Adapters (HBAs—disk controllers). Vitesse Semiconductor Company loaned us four SAS expander cards. And Chenbro Micom Company loaned us four 16-bay drive chassis. Figure 1 shows the configuration we created with these components.
We decided to use LSI SAS ( serial-attached SCSI) HBAs and expanders because they are emerging technology that's evolving from the traditional SCSI standard and because SAS lets you mix and match SATA and SAS disk drives connected to the same controller. We plugged the six SAS HBAs into the Newisys server by using four of the PCI-X 133Mhz slots, one of the PCI-X 100Mhz slots, and one 66MHz PCI-X slot.This configuration distributed the six HBAs across the PCI-X buses and the three PCI-X bridges on the motherboard, which Figure 1 shows. This distribution optimized data bandwidth and kept the processors quickly fed with data. We found that SQL Server 2005 could keep up with more data, so the faster the HBAs and disks, the faster DRAM could process queries or data writes.
By using the SAS disk controllers from LSI and SAS expander boards from Vitesse, we were able to use one Molex cable from each LSI HBA in each of the six PCI-X slots to connect to one of the four SAS expander boards,reducing cable clutter to six cables from the 48 that Gray used.The SAS expander let us connect 16 disk drives to one expander. Because we had six HBAs from LSI and only four Vitesse expander boards, we were able to plug the additional Molex cables from disk controllers 5 and 6 into the same Vitesse SAS expanders as disk controllers 3 and 4.
The SAS expander boards let you daisy-chain more expanders, increasing the number of disk drives you can directly attach to a disk controller. However, our initial testing showed that one disk controller can keep up with at most 12 drives when doing sequential scans. Six cards can fully exploit the bandwidth of 72 disk drives for performance but also let you double or triple the number of disks by daisy-chaining the expander boards connected to additional drives.
By zoning the Vitesse expanders, we were able to segregate the disk traffic on disk controllers 3,4,5,and 6 to eight physical disk drives each. The Vitesse SAS expander board basically lets you plug 16 physical SATA drives into the expander board by using Infiniband cables and plug one cable from the expander board into each LSI HBA in the six PCIX slots. In our configuration, each LSI HBA was capable of delivering 400MB per second or more of data throughput—double the current throughput of a typical fiber-channel disk controller used to connect to a SAN.
Once all our hardware and cables were connected, we were able to use SQLIO.exe to demonstrate roughly 2.2GB per second for sequential reads, just over 2GB per second for sequential writes, and more than 24TB of useable disk space. It's important to point out that a direct-attached disk subsystem delivering more than 2GB per second of data throughput is incredibly fast—in many cases, faster than SAN storage systems costing over forty times more than our solution. For some SAN systems, the cost per useable terabyte is close to $20,000. And as you'll recall, the total price for our prototype, including the disks and entire disk subsystem hardware, was roughly $46,100.
Even though our prototype exceeded our expectations, our throughput result was slightly less than the 2.5GB per second sequential I/O throughput that Jim Gray achieved, perhaps because we used SAS protocols. We plan future testing to push for higher throughput. But by using SAS, we were able to daisy-chain more SAS expander boards to the existing four boards we used in the prototype and reduce cable clutter from 48 cables to 6 from the HBAs to the expander boards and drive chassis.The net result is that we could have added a lot more disk drives than the 64 we used—perhaps double, up to 128 total—giving us more than 48TB of useable disk space.