SATA, SCSI, and the SQL Server Microwave
I read Douglas McDowell's Web-exclusive SQL Server Perspectives article "Are
You Ready for the 'SQL Server Microwave'?" (March 2, 2006, InstantDoc ID 49561).
Yes, Serial ATA (SATA) drives sound wonderful, but is there a downside? I think
maybe there is. Consider that the duty cycle for SATA drives is somewhere around
30 percent, versus the 100 percent for typical SCSI and Fibre Channel drives.
SATA drives were designed for nearline applications, where a request comes in
every so often. They're simply not designed to take a constant pounding, such
as they would in a large enterprise-class online transaction processing (OLTP)
or business intelligence (BI) application. In line with the lower duty cycles
comes a penalty in mean time between failures (MTBF)—SATA drives are a
lot more prone to failure than SCSI drives are.
I recently read an article that said iSCSI has caught up with SANs in capacity
and performance, ignoring the fact that SANs have recently moved to 4Gbps end-to-end.
Sometimes in our enthusiasm for a new product or technology, we forget to look
at all facets of the situation.
- Bob Binkert
You bring up some solid concerns, Bob, but I think the original market and
current market for SATA are very different—and the top hard-drive vendors,
including IBM, would disagree with limitations you are suggesting for SATA.
There is also a bit of a "disposable technology" factor that must be considered
as well, since SATA is much cheaper than SCSI. I went ahead and got Rich Johnson's
thoughts on your concerns. According to Rich,"SATA drives are helping drive
down the cost of traditional SCSI drives and SAS drives with similar capacity.
I'm testing 300GB SCSI drives attached to a prototype server almost like the
one in "Are You Ready for the 'SQL Server Microwave'?" In some testing, we're
seeing a failure rate of only 1 to 3 percent more than SCSI.We mirror everything,
and given the cost of SATA drives versus SCSI, as soon as a mirrored drive goes
bad you hot-swap another in. It takes perhaps one to two hours to remirror a
400GB SATA drive."
- Douglas McDowell
A Primary Key for Every Table?
Can Michelle Poolet answer a question that my workmates and I have been arguing
about? Some of us say that every relational database table should have a primary
key,regardless of whether it links to any other table or whether the primary
key is used for referential integrity. Others say that we don't absolutely have
to have one.We have a set of five fields that we would use as the primary key
but can't because some of those fields hold nulls. These five fields have been
set up as an index with a unique constraint and will prevent us from creating
duplicate rows. Do we need to add a column called id that is a primary key in
name only?
- John Wells
Every relational table should have a primary key. However, no relations
(SQL) database that I know of enforces that rule—it's simply a best practice.
A primary key serves more purposes that just acting as an"anchor"for referential
integrity.It's sometimes the only way to uniquely identify a row from its counterparts
in a table. It's a very efficient and very effective way of isolating rows in
a query.And in a case like the one you've described, the primary key prevents
duplicate rows from being inserted into a table.
Here's what I think is the most obvious drawback with a five-column unique identifier.When you delete a row from this table, your DELETE query must look like:
DELETE FROM my_table
WHERE colA = 'blah'
AND colB = 'blah'
AND colC IS NULL
AND colD IS NOT NULL
AND colE = 'blah'
or something similar. Wouldn't it be much easier to simply say:
DELETE FROM my_table
WHERE pkey_col = 123
The UPDATE and SELECT queries will have a similar construct whenever you need to isolate a single row from the table.
Then there's the issue of index storage and I/O. As you probably know, the index entries (the index key) will be the concatenation of the five columns. Having to traverse these long strings in the B-tree to get to the leaf level "bookmark" (if it's a nonclustered index) or to the data page (if it's a clustered index) is a hefty assignment compared to having to traverse simple 4-byte integer values (i.e., your typical identity/pkey data type). If your current index key is 40 bytes, you're able to store one-tenth the number of index keys per page with your 40-byte key, as compared with a 4-byte identifier.
Part of the answer might come down to the surrogate primary key versus the
natural primary key. From an operational perspective, the identity, or surrogate,
primary key is more efficient and effective; from a user perspective, the natural
key is more meaningful. Use the surrogate primary key internally, behind the
scenes. Use the natural key, which you've constrained with a unique index, for
your user interfaces.
- Michelle A. Poolet
End of Article