Find out how often SQL Server is splitting your data pages
In "More Fill-Factor Mysteries," May 2001, I demonstrated that SQL Server performs a page split when it needs to insert a row on a full page. To determine whether a split occurred, I examined the output from the DBCC IND command, which shows how many pages are in a table. If the number of pages increases after you insert one row, you know that a split occurred.
Page splitting raises many interesting questions, such as, When a split occurs, which rows does SQL Server move to the new page and which ones stay on the original page? You might also want to know where the split occurs; does SQL Server split the page in half?
If so, does "half" mean half the number of rows or half the number of bytes? Before I show you some techniques you can use to discover the answers to these questions for yourself, let's examine the DBCC PAGE command in more detail.
Splitting the Data Page
Page splits can occur anywhere SQL Server needs to keep data in an ordered sequence, so pages can split at any level of a clustered or a nonclustered index. A data-page split is the most common type of split, and it's probably the only type that you need to be concerned with.
Data-page splits introduce most of the fragmentation that can slow down an ordered data scan. (Ordered scans occur whenever you need to access a range of data in the order of the clustered index key.) Because the leaf level of a clustered index is the data itself, SQL Server must keep the data rows in sorted order according to which table columns make up the index keys. If no clustered index exists, SQL Server places a newly inserted row on any page that has room for that row. Therefore, data pages split only when a clustered index exists on the table and only as a result of insert activity.
Although only insert activity causes splits, that activity can result from either an UPDATE statement or an INSERT statement. If an updated row can't remain on the same page after the update, SQL Server performs the update operation as a delete of the original row followed by an insert of the new version of the row. The insertion of the new row might then cause a page split.
Splitting a data page is a complicated operation. When a new row needs to go on a certain page but that page doesn't have room, SQL Server allocates a new page. SQL Server must then determine whether to link in the newly allocated page before the full page and move rows from the beginning of the full page to the new page, or to link in the new page after the full page and move pages from the end of the full page to the new one.
Let's look at what happens to a page when it splits. The code in Listing 1 creates a table containing both large and small rows. The first column is the primary key on which the code builds a clustered index. The value in this column determines a row's position on a data page. The second column is a variable-length character column. If filled to their maximum length of 1600 bytes, only five of these rows will fit on a page.
After creating the table and populating it with four large and four small rows, you can find its first page by looking at the column called first in the sysindexes table.
SELECT first FROM sysindexes WHERE id = OBJECT_ID ('splitrows')
You have to convert the address of the table's first page from a hexadecimal to a decimal value, then use DBCC PAGE to examine the page's contents. For instructions about performing this conversion, see the sidebar "Using DBCC PAGE," March 2000, page 57.
To run DBCC PAGE, you must first turn on traceflag 3604, which directs SQL Server to send all special DBCC output to the client applicationin this case, Query Analyzer. If the value in the sysindexes.first column for the splitrows table were 215 decimal, you could run the following commands to see the data on the page:
DBCC TRACEON(3604)
GO
DBCC PAGE(pubs, 1, 215, 1, 1)
GO
In the DBCC PAGE output, character columns are the easiest type to read because SQL Server displays them in human-readable format. Because the page contains more than 8000 bytes of data, I won't show you the output. But if you scan through the right side of the output, you'll see the character data in column b. The large rows take up about 100 lines of output; the small rows take up about 2 lines.
In the page header, note the two values for the previous and the next pages. Because this table contains only one page, both m_prevPage and m_nextPage show a 0, meaning that no other pages exist.
Now let's insert another large row and find out what happens to the page:
INSERT INTO splitrows
VALUES (22, REPLICATE('x', 1600))
GO
DBCC PAGE(pubs, 1, 215, 1, 1)
GO
Prev. page  
[1]
2
next page