Failover clustering is one of the most popular high-availability options for
SQL Server. If you're about to implement failover clustering in your environment,
you need to devote a lot of planning and coordination to all aspects of your
cluster design. You can't just draw up the plan for your cluster on a cocktail
napkin. Fixing a bad design after the fact can translate to significant downtime,
and downtime defeats the entire purpose of implementing highly available clusters.
You'll need to work with your fellow Windows, storage, and network engineers
to properly implement your failover clusters—it's a team effort.
One of the most important aspects to get right when configuring your clustered
instances of SQL Server is the disk configuration. It isn't a simple prospect,
and I often see it done incorrectly in client environments. To address this
important topic, I have three avenues for you to take. First, this article provides
a concise primer to fuel discussions about your failover clustering implementations—both
new and existing. This article is written with SQL Server 2005 in mind, but
most (if not all) of the concepts apply to previous versions of SQL Server as
well. Second, if you need a quick failover clustering overview, see the Microsoft
white paper I co-wrote, "SQL Server 2005 Failover Clustering," at the Microsoft
Download Center (http://www.microsoft.com/downloads). Finally, if you want
to read about failover clustering in great detail, you can check out my new
book Pro SQL Server 2005 High Availability (Apress, 2007).
Supportable Cluster Solutions
Before you even install Windows on the server, it's an absolute requirement
that your entire cluster solution—down to the disk solution, host bus
adapters (HBAs), and drivers—appear in the Windows Server Catalog of
Tested Products list (http://www.windowsservercatalog.com)
as a valid cluster solution. This requirement is clearly defined in the Microsoft
articles "The Microsoft SQL Server support policy for Microsoft Clustering"
(http://support.microsoft.com/kb/327518) and "The Microsoft support policy for
server clusters, the Hardware Compatibility List, and the Windows Server Catalog"
(http://support.microsoft.com/kb/309395).
Failing to deploy a certified cluster tends to lead to downtime, and having
the wrong drivers, BIOS, or firmware can lead to other problems such as disk
corruption. Always check the Windows Server Catalog or your vendor's support
matrix for clustered solutions to see what is supported for a Microsoft cluster.
Just because a newer HBA driver is available doesn't mean that you should update
that driver on the server. Make sure that you have known good backups that are
recent and tested, should you encounter a catastrophic disk problem such as
corruption resulting from a storage engineer introducing a problem with a new
driver.
Disk Configuration Basics
Microsoft's implementation of disks in a cluster is a shared-nothing
approach. Although any node might eventually be able to own a disk resource,
only one node can own it at any given time, and it can be used only by the resources
in a single cluster group. A disk resource isn't shared, nor can it be used by
other resources outside the cluster group or the node owning it. Other vendors
have implemented a form of clustering that uses a shared disk subsystem, which
lets more than one resource access the same disk simultaneously, but it involves
a piece of code called the lock manager for managing access to the disk resource
to ensure that no conflicts occur.
As you plan your disk configuration, you must understand the difference between
logical and physical disk configuration. Unfortunately, I encounter many DBAs who
don't understand this point. For Windows to use a disk on a storage array, a Logical
Unit Number (LUN) is defined on the physical array. The LUN is a grouping of disks
achieved through some form of RAID. I won't discuss the various RAID flavors here,
but you can find information in "Know Your RAID Levels" (InstantDoc ID 9697). Some
hardware vendors implement proprietary versions of RAID on certain arrays, so
sometimes you won't have control over the RAID configuration and won't be able to
abide by typical best practices. A LUN must be low-level formatted before it's
presented to Windows. To format a LUN, you can use vendor-specific tools from the
hardware manufacturer. The formatting typically occurs at the time your disk array
is set up, so be sure to work closely with the engineers who set up the array to
ensure that the formatting occurs properly.
Once the LUN is ready to be presented to Windows, you need to make the disk
usable in Windows. This process involves formatting the disk and possibly assigning
it a drive letter, depending on how you plan to use the disk. Formatting the
disk in Windows is a completely separate process from the earlier low-level
format, which the array itself required. After you format the disk, your logical
disk will be ready to use. Disk formatting in Windows is a subject that DBAs
need to be vigilant about. Many storage or Windows engineers aren't familiar
with SQL Server, so they wind up both low-level formatting the disks at the
disk subsystem level and formatting in Windows with the default settings. Or,
they're more familiar with Oracle and assume that SQL Server is the same thing.
The default block size (the allocation unit size) in Windows is 4KB. That size
might be fine for a file system, but not for SQL Server data files. SQL Server
writes are 8KB and readahead is 64KB. It's OK to format transaction log disks
with 4KB, but only if they'll never contain a data file. I recommend formatting
the disks with 64KB. You could use a higher block size than 64KB, but you might
not realize any benefit. I always recommend playing it safe. If applicable,
perform a sector alignment of the disks in Windows before formatting them. You
could potentially see as much as a 20 percent performance gain. Some disk
subsystems don't need sector alignment, so check your vendor's recommendations.
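The alignment check itself is simple arithmetic, and a minimal Python sketch can make it concrete. This is an illustration only, not a vendor tool: the function name is hypothetical, the 64KB stripe unit is an assumed value (your array's real stripe unit comes from your storage vendor), and the 63-sector starting offset is used as an example of a commonly seen default on older Windows versions.

```python
SECTOR_BYTES = 512  # assumed sector size for this example


def is_aligned(partition_offset_bytes: int, stripe_unit_bytes: int) -> bool:
    """Return True when the partition starts on a stripe-unit boundary."""
    if stripe_unit_bytes <= 0:
        raise ValueError("stripe unit must be a positive number of bytes")
    return partition_offset_bytes % stripe_unit_bytes == 0


stripe_unit = 64 * 1024                 # assumed 64KB stripe unit
unaligned_offset = 63 * SECTOR_BYTES    # 32,256 bytes (31.5KB)
aligned_offset = 64 * 1024              # 65,536 bytes (64KB)

print(is_aligned(unaligned_offset, stripe_unit))  # False
print(is_aligned(aligned_offset, stripe_unit))    # True
```

A partition that starts at 31.5KB straddles stripe-unit boundaries, so some 64KB I/Os touch two stripe units instead of one. That extra work is where a misaligned disk can lose performance.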
When formatting and defining your disks in Windows for use on a cluster, don't
define two partitions and drive letters on a single LUN, as you see in Figure
1. As you can see, to Windows, disks I and J are two logical disks that
happen to be carved out on one LUN. However, when the disk is added to the Windows
server cluster, it's added as one disk, as you see in Figure
2. Cluster Administrator recognizes it only as one big disk that happens
to have two drive letters. You couldn't have two separate SQL Server instances
sharing the drive because it can reside in only one cluster group. You might
as well have just used one drive letter.
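If you keep your planned disk layout in a spreadsheet or script, a quick check for this mistake might look like the following Python sketch. The function name, the drive letters, and the LUN labels are all hypothetical, and the mapping is assumed to come from your own planning documents rather than from any Windows or cluster API.

```python
from collections import defaultdict


def luns_with_multiple_letters(letter_to_lun):
    """Return each LUN that carries more than one drive letter."""
    letters_by_lun = defaultdict(list)
    for letter, lun in letter_to_lun.items():
        letters_by_lun[lun].append(letter)
    return {
        lun: sorted(letters)
        for lun, letters in letters_by_lun.items()
        if len(letters) > 1
    }


# Hypothetical plan: I and J were both carved out of the same LUN.
layout = {"I": "LUN-3", "J": "LUN-3", "K": "LUN-4"}
print(luns_with_multiple_letters(layout))  # {'LUN-3': ['I', 'J']}
```

Any LUN the check reports will show up in the cluster as a single disk resource, no matter how many drive letters it has.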
The SQL Server installation process requires that you choose a cluster group
into which all clustered resources will be placed. Multiple SQL Server or SQL
Server Analysis Services (SSAS) installations can't share a cluster group, meaning
that any disks in a cluster group can be used only by a single instance of SQL
Server or SSAS. They can't be shared. If you have more than one instance that
you're planning on adding to a Windows server cluster, you'll need dedicated
disks for each.
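The dedicated-disk rule is easy to verify on paper before installation. The Python sketch below is a minimal illustration that assumes you describe your plan as a mapping from each instance to the disks it will use. The instance names, drive letters, and function name are hypothetical.

```python
def shared_disks(plan):
    """Return each disk that is claimed by more than one instance."""
    owners = {}
    for instance, disks in plan.items():
        for disk in disks:
            owners.setdefault(disk, set()).add(instance)
    return {
        disk: sorted(instances)
        for disk, instances in owners.items()
        if len(instances) > 1
    }


# Hypothetical plan: two SQL Server instances both expect to use disk L.
plan = {
    "SQLINST1": ["K", "L"],
    "SQLINST2": ["L", "M"],
    "SSAS1": ["N"],
}
print(shared_disks(plan))  # {'L': ['SQLINST1', 'SQLINST2']}
```

An empty result means every disk in the plan belongs to exactly one instance, which is what the cluster group restriction requires.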
All disks for a clustered installation must be on the shared disk subsystem.
I'm frequently asked if, in a clustered installation, a local disk can be used
for things such as the system databases (especially tempdb) or backups. The
answer is no.