• subscribe
February 19, 2003 12:00 AM

Text in Row Internals

Learn the ins and outs of text data
SQL Server Pro
InstantDoc ID #37635
Downloads
37635.zip

After SQL Server generates the error message that reports the invalid text pointer, it continues with the batch and commits the transaction, so the new column is part of the table. If you try to run the same transaction again, the ALTER TABLE statement generates an error because you can't add a second column of the same name. The error on ALTER TABLE terminates the batch, so the COMMIT never executes and you're left with an open transaction. By testing the value of @@trancount, you can check for an open transaction after the error is reported.

Another way you can invalidate a text pointer for a table that has text in row enabled is by using a special stored procedure, sp_invalidate_textptr, which takes a text pointer as an argument. You can verify this behavior for yourself by running Listing 1's code again, replacing the line containing ALTER TABLE with this one:

EXEC sp_invalidate_textptr @ptrval

Looking at Text Data
Besides knowing how to manipulate the LOB data and the text pointers, for database management and sizing purposes, you also need to understand where the LOB data is stored. In "Off the Record" (January 2003, InstantDoc ID 26997), I showed a simple way of verifying when SQL Server actually moved text data onto a data page after the text in row option was enabled. You use the sp_spaceused stored procedure and note when the number of pages reserved for the table changes. When the text data is on the special LOB page, SQL Server reserves more pages for a table than after you move the text data into the row. However, that technique of looking at the number of reserved pages is useful only for the simple case of a table containing just one row. In a table with multiple rows, if some LOB data is in the data rows because it's small enough and the LOB data for some rows is on its own special pages, the sp_spaceused procedure might not give any useful information because it shows only the total number of pages, not any information about what's on those pages. However, you can use the undocumented command DBCC PAGE to look at the data on the LOB pages themselves.

I won't go into complete detail about the workings of DBCC PAGE, but you can read my articles "More Fill-Factor Mysteries" (May 2001, InstantDoc ID 20242) and "Do the Splits" (June 2001, InstantDoc ID 20589) for more information. This command isn't an official part of the SQL Server documentation, but using the undocumented command is your only alternative if you want to view the storage structures within SQL Server.

For an example of how to look at the data on LOB pages, use the script that Listing 2 shows to build a table called hasText with five columns, one of which is a text data type. The code inserts one row into the table, putting a string of 250 Xs in the text column. The code then uses DBCC IND to discover which pages belong to the table. DBCC IND returns four rows, representing four pages in the table. The DBCC IND output has two rows for pages with a PageType value of 10, which give details about allocation pages; I'll ignore those. The page with PageType 1 is a data page; for my hasText table, the data page is page 284. The page with PageType 3 is the LOB data page (for my table, it's page 282). If you run this script, you'll have to replace the 284 and the 282 with your own page numbers. Also, if you don't use the Pubs database, you'll have to change the first argument to be the database ID of the database you're using.

Next, the script in Listing 2 enables the text in row option with a maximum of 500 bytes, then uses the OBJECTPROPERTY() function to verify this limit. The code enables DBCC PAGE to return output to the screen by enabling trace flag 3604, then uses DBCC PAGE with database ID 5 for the Pubs database to display both pages—the regular data page and the LOB data page.

In the DBCC PAGE output, SQL Server attempts to translate any character data on the page into readable ASCII format. You can see the contents of the row on the data page after the header that says Record Type = PRIMARY_RECORD. For now, let's ignore the cryptic byte output on the page and just look at the translation of the bytes on the far right. You can see the other character values that the code inserted into the row, but not the string of Xs. To see this string, you have to scroll down to where the LOB page's information begins, right after the header that says Record Type = BLOB_FRAGMENT. Again, just looking at the information on the far right, you can see the long string of Xs. Because the code enabled text in row after inserting the data into the table, the data doesn't automatically move into the row.

The script in Listing 2 then updates the row so that the string of Xs is now a string of Zs. The update causes the text data to move onto the data page. Running DBCC PAGE again for both the data and the LOB pages shows you that the string of Zs is on the data page. It also shows that SQL Server first updated the data on the LOB page; the Zs are on the LOB page. However, because this was the only row on the page, SQL Server deallocated the LOB page. The information is still on the page, but it no longer belongs to the hasText table. The indicator that this page isn't allocated is in the Allocation Status section of the page header. For the data page, I have this information: PFS (1:1) = 0x61 MIXED_EXT ALLOCATED. For the LOB page, I have something similar, but the word ALLOCATED is missing, which means the page isn't allocated to any table.

You can do more testing of LOB data behavior on your own. Use DBCC PAGE to see what happens when some of your text data is longer than the text in row limit and some is shorter or when you have text data that's too big to fit on one page. You can disable the text in row option and see what happens to the pages in this table.

LOB data can be useful for storing large chunks of data, and having the ability to store smaller pieces of LOB data in the table makes using these data types much more efficient than in versions earlier than SQL Server 2000. Understanding how to work with LOB data types and how they behave inside and outside of transactions can help you make the best decisions about how to store your LOB data and how to write efficient applications. And being able to examine the internals of the pages helps you understand what SQL Server does when you manipulate data of one of these special data types.



ARTICLE TOOLS

Comments
    There are no comments to display. Be the first one!
You must log on before posting a comment.

Are you a new visitor? Register Here