• subscribe
May 23, 2001 12:00 AM

Do the Splits

SQL Server Pro
InstantDoc ID #20589
Downloads
20589.zip

This new row needs to go right after the four original large rows, but it completely fills the page, so the table no longer has room for the four small rows. When you examine the original page after the insert takes place, you'll see only the five large rows. The page header now contains a nonzero value for m_nextPage in the format file_number:page_number, meaning that SQL Server added a new page after the original. For example, 1:224 means file 1, page 224 (decimal). If you use DBCC PAGE to examine the new page, you'll see that it contains the original four small rows.

In this case, the first few rows stayed on the original page, and SQL Server linked a new page in after the original. Now use the code in Listing 2 to change the table so that the small rows are before the large rows. You get slightly different behavior if you insert a new large row between the small rows and the large rows:

INSERT INTO splitrows 
    VALUES (22, REPLICATE('x', 1600)) 
GO

Using the following command to check the page data

DBCC PAGE(pubs, 1, 215, 1, 1)
GO

shows that the original page (215) now has the original large rows with keys 25, 30, 35, and 40, but it doesn't have the new large row with key 22. The page header shows no pages following the original page (m_nextPage is 0), but a previous page exists. Using DBCC PAGE to look at the contents of the page that m_prevPage points to, you'll find the original four small rows and the new large row. In this case, SQL Server added the new page that it allocated for the split before the table's original full page.

How It Works
In both of the previous examples, you inserted a new row between the original large rows and the original small rows. But what happens if you insert a new large row between two of the small ones? For example, what happens if you use the code in Listing 1 to create the original table, then insert a large row with a clustered key value of 33? In that case, the four large rows and one small row stay on the original page; the new page has one small row, followed by the new large row, followed by the last two original small rows. For a new row with a clustered key value of 37, I found that two of the original small rows stay on the original page, and the new page contains one small row, the new large row, then the last original small row.

SQL Server decides where to link the new row into the table based on an estimate of the amount of data it will have to move, and it tries to minimize the logging overhead. So when the first few rows on a page are smaller, SQL Server moves them to the new page, which it adds before the original page. If the larger rows are at the beginning of the page, they stay on that page, and the smaller rows, which have less data to log, are the ones that move.

The algorithm for determining exactly where the split occurs isn't easily quantifiable. But it seems to work something like this: After SQL Server decides whether to add the page before or after the original page, it determines where the new row will go in the ordered sequence, then it decides where to make the split. For example, the first table had keys 5, 10, 15, and 20 as large rows and 25, 30, 35, and 40 as small rows. Then I inserted key 22 as a large row, but the page had no room for this new row. So the keys needed to be in the sequence 5, 10, 15, 20, 22, 25, 30, 35, 40. SQL Server placed the first row, 5, on the first page, and the last row, 40, on the second page. SQL Server then alternated back and forth, taking one row for the first page (in this case, the next row has key 10) and one row for the second page (key 35). When one page became full, all the remaining rows went on the other, non-full page. That explanation isn't perfect, and some cases appear to have additional variations, but it seems close to an explanation of how SQL Server determines the split point.

Double Splits
Finally, let me show you an example of a different type of split. If column b in my example table had a maximum length of 8000 characters, I could insert a new row such that no two-way split would accommodate all the rows in order. The code in Listing 3 creates a table only slightly different from the previous two, then inserts exactly the same rows. The table originally has no rows containing 8000 bytes in column b. If you then insert a row with an 8000-byte column immediately after the first large row, SQL Server can't accommodate the insert with just one split:

INSERT INTO splitrows 
    VALUES (7, REPLICATE('x', 8000)) 
GO

Using DBCC PAGE to examine the original page shows that the page now contains the last three original large rows and the four small rows. Its previous page contains only the new extra-large row, and the page before that one holds the first original large row. If possible, SQL Server tries to avoid having to allocate two new pages. But sometimes, as in this example, a double split is unavoidable because SQL Server needs to maintain the row order in the clustered index.

Typically, a page split isn't too expensive if it occurs once, but if you get hundreds of page splits a minute, you'll want to reduce this frequency in your production system, at least during peak usage times. Using the Performance Monitor counter Page Splits/sec, which you can find under the SQLServer:Access Methods object, you can monitor the number of page splits per second to see whether it increases over time or starts out high immediately after you create a clustered index. To avoid system disruption during busy times, you can reserve some space on pages by using the FILLFACTOR clause when you create a clustered index on existing data.

You can use the FILLFACTOR clause to your advantage during your least busy operational hours by periodically recreating the clustered index with a FILLFACTOR value that leaves enough room on your data pages to minimize splitting until you have time to rebuild the index. If your system has no slow times, you can use DBCC INDEXDEFRAG (SQL Server 2000 only) to reorganize an index's pages and readjust the FILLFACTOR without making the entire table unavailable. For more information about fragmentation and defragmenting, see "Keep SQL Server Up and Running," December 2000.



ARTICLE TOOLS

Comments
  • THOMAS
    7 years ago
    Jul 18, 2005

    As always Kalen supplies premium content

  • Ali baba_SQL
    8 years ago
    Aug 27, 2004

    very useful

You must log on before posting a comment.

Are you a new visitor? Register Here