• subscribe
March 10, 2009 12:00 AM

Configuring Storage for Your SQL Server Environment

Tuning your storage resources can save you time and improve SQL Server performance
SQL Server Pro
InstantDoc ID #101672

Executive Summary:
Proper configuration of your storage resources is essential to getting the best performance out of your SQL Server systems. Learn how to calculate the total amount of space your database requires during your data retention period, determine if you should use RAID 5 or RAID 10, tune your cache, and tune your LUN and align the disk. Also, find out the differences between a SAN and a DAS, and see if iSCSI is right for your database environment.

A task that’s frequently skipped when setting up a new SQL Server or application is proper capacity planning. With storage costs falling every quarter, it’s easy to tell your storage administrator that you need 500GB of storage. But what have you really asked for? Will 500GB of storage last six months, three years, or 10 years? Without the answer to this question, you might not be asking for the right thing. Also, you need to consider the speed of your storage. Some storage is more expensive than other storage. Although your organization might not charge back when storage is requested, as a member of the IT staff it’s your responsibility to ensure the most effective use of the department’s resources. Let’s look at how to determine how much storage you need, how to configure your disks, and the types of storage available.

Capacity Planning
The easiest part of capacity planning is determining how much storage space your database requires. You need to figure out the total number of bytes that you will need to use over n months. Although calculating this number might appear to be a rather daunting task, when you break it down to each table it’s not as scary. First, add up the size of all the static lookup tables. The size of these tables won’t change from month-to-month, so once you have that value you can just add it onto the end. For the tables that will be growing in size every day/month/year, you need to know a few key pieces of information. You need to know the average number of rows to be added per cycle. A cycle could be a day, a month, or a year—however often the data will be coming into the system. If it’s a real-time system, such as an order entry system, ticketing system, or network security system, daily is probably the best assumption. You also need to know the average length of each row. The developers can probably help you answer this question. You’ll also want to find out what your organization’s record retention policy is. Do you plan to keep data for a month, a year, or forever? You also need static numbers; in this case, the total number of bytes per data page on the disk is 8060.

From here some basic arithmetic will provide you with the total number of bytes. For this example, we’ll assume a daily cycle of 20,000 rows, with an average row length of 187 bytes, and a corporate record retention period of three years. The first calculation that you need to do is to figure out how many rows fit into each data page. This calculation is important because SQL Server can’t split a row between two data pages. Next, take the number of bytes per page and divide that by the number of bytes per row (8060/187), which gives you 43.101 rows per data page, as shown in Table 1.

You’ll want to round this number down because SQL doesn’t page split, which gives you 43 rows per data page. Then take the number of rows per (daily) cycle and divide that by the number of rows per data page (20000/43), which gives you 465.11 as number of data pages you need for the table, as shown in Table 2.

Because you can’t have a part of a data page, round this number up to 466 data pages. Although each data page can hold only 8060 bytes, the data page itself is actually 8KB in size. Now take 466 data pages and multiply that by 8KB to get the total amount of data added to the table each day, which is 3728 KB or 3.64MB, as shown in Table 3.

If you stretch this number out to a month (30.5 days on average), you get 111.02MB per month. Now multiply that number by the retention period (36 months) to get 3996.72MB or 3.9GB over three years, as Table 4 shows.

After you’ve determined how much space each table requires, add the numbers up to get the total space needed for the database over your retention period. If your retention period includes moving data to an archive database for reporting purposes, your actual data sizes will be smaller because some data is being removed from the tables. After figuring out how much space your data requires, you then need to think about the speed and redundancy of the disks that will be hosting your data.

Configuring Your RAID
On paper, everyone says to use RAID 10 for your database. It has the fastest write response and there’s no parity calculation overhead. However, when people typically talk about storage they forget one important thing: cost. Although RAID 10 will get you the best performance, it’s an extremely expensive RAID level to use. RAID 10 takes twice as many disks to configure, and those disks aren’t free. RAID 10 should be used in some cases, but deploying it to all your databases by default is typically an extremely inefficient use of your storage resources. The majority of databases that are configured for RAID 10 storage would probably work just fine on RAID 5 storage.

It’s important to understand the difference between RAID 5 and RAID 10 before deciding which RAID level to use. RAID 5 is also called a stripe set with parity. As the data is written to the disk, a parity calculation is done on each data page. This parity calculation is written to the disk with the data page and used by the array in the event of a disk failure to let the array rebuild the data when the failed disk is replaced. Some SAN vendors will place all the parity information on a single disk, and some will spread the parity information across all disks in the RAID array. Although both techniques will give you the same level of protection, having the parity spread across all the disks in the array will increase the speed of the array because the additional disk can be used for reads and writes.



ARTICLE TOOLS

Comments
  • Denny
    3 years ago
    Jun 09, 2009

    dbadvisor,
    When I wrote up this article I used our EMC SAN as my reference. Because of the intelligent prefailing of drives in the array I do forget that there are fault tolerances differences between the two.

    daletrotman,
    This is correct, Windows 2008 does now align the disks at 128 instead of the 63 which Windows 2003 and prior have done. So technically you don't need to change the alignment as with the disk aligned at 128 will have the same performance as 64. But with the other features of diskpart that are so useful when working with a SAN such as extending disks online that it is important to mention it.

    MarsonGalvani,
    Thank you.

  • Lori
    3 years ago
    May 12, 2009

    The description of RAID 10 is not correct. You have described RAID 0+1, striping, then mirroring. RAID 1+0 (often called RAID 10) is mirroring, then striping. These are very different configurations. RAID 1+0 is far more fault tolerant than RAID 0+1.

  • dale
    3 years ago
    May 06, 2009

    Great article. I have one point though. You mentioned using the DISKPART.EXE command-line tool in Windows Server 2008 and Windows Server 2003 to align the partition. Is this not a standard feature in Windows Server 2008 and therefore not required? I make this point as we are just building a SAN in our company and do not intend to make this change. Should we still use the DISKPART.EXE tool do you think?

  • Marcos
    3 years ago
    May 05, 2009

    Very good article. Thanks for the tips.

You must log on before posting a comment.

Are you a new visitor? Register Here