Failover clustering is one of the most popular high-availability options for SQL Server. If you're about to implement failover clustering in your environment, you need to devote a lot of planning and coordination to all aspects of your cluster design. You can't just draw up the plan for your cluster on a cocktail napkin. Fixing a bad design after the fact can translate to significant downtime, and downtime defeats the entire purpose of implementing highly available clusters. You'll need to work with your fellow Windows, storage, and network engineers to properly implement your failover clusters—it's a team effort.

One of the most important aspects to get right when configuring your clustered instances of SQL Server is the disk configuration. It isn't a simple prospect, and I often see it done incorrectly in client environments. To address this important topic, I have three avenues for you to take. First, this article provides a concise primer to fuel discussions about your failover clustering implementations, both new and existing. This article is written with SQL Server 2005 in mind, but most (if not all) of the concepts apply to previous versions of SQL Server as well. Second, if you need a quick failover clustering overview, see the Microsoft white paper I co-wrote, "SQL Server 2005 Failover Clustering," at the Microsoft Download Center (http://www.microsoft.com/downloads). Finally, if you want to read about failover clustering in great detail, you can check out my new book Pro SQL Server 2005 High Availability (Apress, 2007).

Supportable Cluster Solutions
Before you even install Windows on the server, it's an absolute requirement that your entire cluster solution—down to the disk solution, host bus adapters (HBAs), and drivers—appear in the Windows Server Catalog of Tested Products list (http://www.windowsservercatalog.com) as a valid cluster solution. This requirement is clearly defined in the Microsoft articles "The Microsoft SQL Server support policy for Microsoft Clustering" (http://support.microsoft.com/kb/327518) and "The Microsoft support policy for server clusters, the Hardware Compatibility List, and the Windows Server Catalog" (http://support.microsoft.com/kb/309395).

Failing to deploy a certified cluster tends to lead to downtime, and having the wrong drivers, BIOS, or firmware can lead to other problems such as disk corruption. Always check the Windows Server Catalog or your vendor's support matrix for clustered solutions to see what is supported for a Microsoft cluster. Just because a newer HBA driver is available doesn't mean that you should update that driver on the server. Make sure that you have known good backups that are recent and tested, should you encounter a catastrophic disk problem such as corruption resulting from a storage engineer introducing a problem with a new driver.

Disk Configuration Basics
Microsoft's implementation of disks in a cluster is a shared-nothing approach. Although any node might eventually be able to own a disk resource, only one node can own it at any given time, and it can be used only by the resources in a single cluster group. A disk resource isn't shared, nor can it be used by resources outside the cluster group or the node that owns it. Other vendors have implemented a form of clustering that uses a shared disk subsystem, which lets more than one resource access the same disk simultaneously, but that approach relies on a piece of code called the lock manager to manage access to the disk resource and ensure that no conflicts occur.
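To make the ownership rule concrete, here is a minimal Python sketch of the shared-nothing idea. It is only a toy model, not how the cluster service is implemented: the class name DiskResource, its methods, and the node and group names are all hypothetical.

```python
class DiskResource:
    """Toy model of a clustered disk: one group, at most one owning node."""

    def __init__(self, name, group):
        self.name = name
        self.group = group   # the single cluster group the disk belongs to
        self.owner = None    # node that currently owns the disk, if any

    def bring_online(self, node):
        # Shared nothing: a second owner is refused outright rather than
        # arbitrated through a lock manager.
        if self.owner is not None and self.owner != node:
            raise RuntimeError(f"{self.name} is already owned by {self.owner}")
        self.owner = node

    def fail_over(self, new_node):
        # Ownership moves as a whole: the old owner lets go first.
        self.owner = None
        self.bring_online(new_node)


disk = DiskResource("Disk I:", group="SQL Group 1")   # hypothetical names
disk.bring_online("NODE1")

try:
    disk.bring_online("NODE2")
    second_owner_allowed = True
except RuntimeError:
    second_owner_allowed = False

disk.fail_over("NODE2")
print(second_owner_allowed, disk.owner)
```

In this model a second node can take the disk only through a failover, which mirrors the "one owner at any given time" rule described above.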

As you plan your disk configuration, you must understand the difference between logical and physical disk configuration. Unfortunately, I encounter many DBAs who don't understand this point. For Windows to use a disk on a storage array, a Logical Unit Number (LUN) is defined on the physical array. The LUN is a grouping of disks achieved through some form of RAID.

I won't discuss the various RAID flavors here, but you can find information in "Know Your RAID Levels" (InstantDoc ID 9697). Some hardware vendors implement proprietary versions of RAID on certain arrays, so sometimes you won't have control over the RAID configuration and won't be able to abide by typical best practices. A LUN must be low-level formatted before it's presented to Windows. To format a LUN, you can use vendor-specific tools from the hardware manufacturer. The formatting typically occurs at the time your disk array is set up, so be sure to work closely with the engineers who set up the array to ensure that the formatting occurs properly.

Once the LUN is ready to be presented to Windows, you need to make the disk usable in Windows. This process involves formatting the disk and possibly assigning it a drive letter, depending on how you plan to use the disk. Formatting the disk in Windows is a completely separate process from the earlier low-level format, which the array itself required. After you format the disk, your logical disk will be ready to use. Disk formatting in Windows is a subject that DBAs need to be vigilant about. Many storage or Windows engineers aren't familiar with SQL Server, so they wind up both low-level formatting the disks at the disk subsystem level and formatting in Windows with the default settings. Or, they're more familiar with Oracle and assume that SQL Server is the same thing.

The default block size in Windows is 4KB. That size might be fine for a file system, but not for SQL Server data files. SQL Server writes are 8KB and readahead is 64KB. It's OK to format transaction log disks with a 4KB block size if they'll never contain a data file. For disks that hold data files, I recommend formatting with a 64KB block size. You could use a block size larger than 64KB, but you might not realize any benefit, and I always recommend playing it safe. If applicable, perform a sector alignment of the disks in Windows before formatting them. You could potentially see as much as a 20 percent performance gain. Some disk subsystems don't need sector alignment, so check your vendor's recommendations.
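The arithmetic behind the block size and alignment advice can be sketched in a few lines of Python. The numbers here are assumptions for illustration only: 512-byte sectors, a 64KB alignment boundary, and a partition that starts 63 sectors into the disk. The right boundary for your array comes from your storage vendor. The function names are hypothetical, and the second function assumes the I/O starts on a block boundary.

```python
SECTOR_BYTES = 512   # assumption: 512-byte sectors
KB = 1024


def is_aligned(partition_offset_bytes, boundary_bytes):
    """True when the partition starts exactly on a multiple of the boundary."""
    return partition_offset_bytes % boundary_bytes == 0


def allocation_units_spanned(io_bytes, allocation_unit_bytes):
    """Blocks covered by one I/O that starts on a block boundary (rounded up)."""
    return -(-io_bytes // allocation_unit_bytes)


unaligned_offset = 63 * SECTOR_BYTES   # hypothetical start: 32,256 bytes
aligned_offset = 64 * KB               # hypothetical start on a 64KB boundary

print(is_aligned(unaligned_offset, 64 * KB))        # False
print(is_aligned(aligned_offset, 64 * KB))          # True
print(allocation_units_spanned(64 * KB, 4 * KB))    # 16
print(allocation_units_spanned(64 * KB, 64 * KB))   # 1
print(allocation_units_spanned(8 * KB, 4 * KB))     # 2
```

With the default 4KB block size, a single 64KB read covers 16 blocks, whereas a 64KB block size maps it to one. The sketch shows only the arithmetic and makes no claim about the performance gain on any particular array.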

When formatting and defining your disks in Windows for use on a cluster, don't define two partitions and drive letters on a single LUN, as you see in Figure 1. As you can see, to Windows, disks I and J are two logical disks that happen to be carved out on one LUN. However, when the disk is added to the Windows server cluster, it's added as one disk, as you see in Figure 2. Cluster Administrator recognizes it only as one big disk that happens to have two drive letters. You couldn't have two separate SQL Server instances sharing the drive because it can reside in only one cluster group. You might as well have just used one drive letter.

The SQL Server installation process requires that you choose a cluster group into which all clustered resources will be placed. Multiple SQL Server or SQL Server Analysis Services (SSAS) installations can't share a cluster group, meaning that any disks in a cluster group can be used only by a single instance of SQL Server or SSAS. They can't be shared. If you have more than one instance that you're planning on adding to a Windows server cluster, you'll need dedicated disks for each.

All disks for a clustered installation must be on the shared disk subsystem. I'm frequently asked if, in a clustered installation, a local disk can be used for things such as the system databases (especially tempdb) or backups. The answer is no.
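The disk rules from this section lend themselves to a simple planning check. The following Python sketch is one hypothetical way to sanity-check a proposed layout on paper before anything is built. The function validate_layout, the dictionary keys, and the instance, group, and LUN names are all invented for illustration, and it doesn't query a real cluster.

```python
from collections import defaultdict


def validate_layout(disks, instances):
    """Return a list of rule violations for a planned cluster disk layout.

    disks:     list of dicts with keys "lun", "letter", "group", "shared"
    instances: dict mapping instance name to its cluster group
    """
    problems = []

    # Rule: SQL Server or SSAS instances can't share a cluster group.
    instances_by_group = defaultdict(list)
    for instance, group in instances.items():
        instances_by_group[group].append(instance)
    for group, names in sorted(instances_by_group.items()):
        if len(names) > 1:
            problems.append(f"group {group} is shared by {sorted(names)}")

    # Rule: every disk must be on the shared disk subsystem.
    letters_by_lun = defaultdict(list)
    for disk in disks:
        letters_by_lun[disk["lun"]].append(disk["letter"])
        if not disk["shared"]:
            problems.append(f"disk {disk['letter']} is local, not shared")

    # Rule: don't carve several drive letters out of one LUN.
    for lun, letters in sorted(letters_by_lun.items()):
        if len(letters) > 1:
            problems.append(f"{lun} carries several drive letters {letters}")

    # Rule: each instance needs dedicated shared disks in its own group.
    groups_with_disks = {d["group"] for d in disks if d["shared"]}
    for instance, group in sorted(instances.items()):
        if group not in groups_with_disks:
            problems.append(f"{instance} has no shared disk in group {group}")

    return problems


bad_plan = validate_layout(
    disks=[
        {"lun": "LUN1", "letter": "I", "group": "SQL1", "shared": True},
        {"lun": "LUN1", "letter": "J", "group": "SQL1", "shared": True},
        {"lun": "LOCAL", "letter": "T", "group": "SQL1", "shared": False},
    ],
    instances={"INST1": "SQL1", "INST2": "SQL1"},
)
good_plan = validate_layout(
    disks=[
        {"lun": "LUN1", "letter": "I", "group": "SQL1", "shared": True},
        {"lun": "LUN2", "letter": "J", "group": "SQL2", "shared": True},
    ],
    instances={"INST1": "SQL1", "INST2": "SQL2"},
)
for problem in bad_plan:
    print(problem)
```

The first plan breaks three rules: two instances share one group, a local disk is in the mix, and one LUN has two drive letters. The second plan gives each instance its own group and its own LUN, so it comes back clean.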
