DOWNLOAD THE CODE:
Download the Code 26997.zip

When you're deciding the best mechanisms for storing your user data, you often need to find a way to store large data objects. These objects could be character strings representing textual content (for example, book chapters), or they could be binary data storing bitmap images, sound, or video files. SQL Server releases before 7.0 limit the size of a data object in a SQL Server table to 255 bytes. With SQL Server 7.0, the limit increases to 8000 bytes. All releases can store larger data objects by using the special data types text and image, which let you store as much as 2GB of data outside the table on a set of special pages. Let's look at how SQL Server stores these special data types and discuss some factors that can affect your decision about whether to implement these data types. Finally, we'll look at a special mechanism in SQL Server 2000 that lets you store text data in the table when it will fit.

For this article, I refer to large data objects as LOBs. As of SQL Server 7.0, the LOB types are text, ntext, and image. The text data type contains ASCII characters, ntext contains Unicode characters, and image contains binary data. Images are sometimes called BLOBs (binary large objects).

When you include a column of LOB data in a table, the sp_help output for the table shows that the LOB data takes up only 16 bytes. Those 16 bytes hold a pointer to a location inside SQL Server—but outside the table—where the LOB data is stored. If you have more than one LOB column in the table, each row will contain multiple 16-byte pointers. In SQL Server 6.5, the LOB pointer is the address of a 2KB page that's the start of a linked list of pages (if more than one page is needed). Even if you need only 1 bit more space than the 255-bit maximum size, SQL Server 6.5 allocates a whole 2KB page and accesses that page in a separate read operation any time it needs the LOB data. If you have thousands or millions of bytes of data, the wasted storage isn't the most important problem, but the efficiency of finding data is limited. Although SQL Server 6.5 allows a substring search on LOB data, it has to follow each page chain to the end to determine whether the row contains what it's looking for. Searching for and replacing data in the middle of a LOB value is also problematic because there's no easy way to get to the middle of the data.

SQL Server 7.0 overcame some of these limitations by completely changing the way it organizes LOB data. One enhancement increased the maximum length of character or binary fields to 8000 bytes. (Note that for the ntext data type, you get 4000 characters because each Unicode character requires 2 bytes for storage.) When you need more than the maximum in-row data, you can still use the LOB types. Another improvement is that instead of storing the LOB data for each row in a separate linked list of pages, each table has only one collection of special pages to hold all the LOB data for that table. A page that stores the LOB data isn't limited to holding data for one LOB column. LOB data pages can hold data from multiple rows and can even have a mix of text, ntext, and image data on one page.

Although developers who work with LOB data often think of the data as being stored in one string of bytes, the data storage is completely different. SQL Server 2000 and 7.0 organize the LOB pages in a B-tree structure so that operations starting in the middle of the string are more efficient than in SQL Server 6.5. In addition, the potential for wasting large amounts of space isn't so severe because many short data values can use space on the same page.

When deciding whether to use SQL Server LOB data types, developers often consider avoiding them completely and instead storing a character string specifying an OS file outside the database that contains the desired data. For example, if each column of your LOB data is a photograph in bitmap form, you might store the location of an external OS file in every row, rather than store the pointers to additional pages within the database. When Microsoft first introduced the LOB data types to SQL Server, accessing LOB data was inefficient. If you frequently accessed a large percentage of the LOB information, you'd usually have to use the external files to get good performance. However, SQL Server has greatly improved its access methods in the last several releases, so now the choice isn't as easy. Which method is faster depends on how big your LOB data is, how often you need to access it, and what you want to do with the data after you access it.

SQL Server manages the special LOB data pages internally just as it does all other data pages. SQL Server brings the pages into its cache when they're first accessed; if your applications frequently access the same LOB data, those pages will stay in the cache, giving you much faster subsequent access. If the LOB data pages aren't reaccessed, SQL Server writes them to the list of free pages to be replaced in the cache eventually by other pages. So if you access a small subset of your LOB data frequently, your best choice might be storing the data in the database and using text, ntext, or image columns. In some situations—such as when there are only a few such pointers, when you don't care about backing up the LOB data, or when you know you'll never be moving the database—storing your own pointers to files outside the database still might be better. However, in most cases, using LOB data types on their own special pages inside SQL Server gives you the best performance. To know which method is faster for your applications and data, you need to run your own comparison tests.

However, performance might not be your main consideration in choosing a storage method. You also need to consider ease of management. Storing the LOB data outside SQL Server gives you a large number of separate files to manage. If you need to move or copy your SQL Server installation, you must also move thousands (or more) of external files. In addition, your SQL Server backups won't contain the LOB data. Admittedly, the LOB data might not be as volatile as the other data in your database, so you might not need to back it up. However, if you don't include the LOB data in the database, a database restore after a complete system failure won't include the LOB files.

   Prev. page   [1] 2     next page



You must log on before posting a comment.

If you don't have a username & password, please register now.

Reader Comments

I needed a more decisive answer as I cant test on mass scale!

Anonymous User

Article Rating 1 out of 5

no mention of the maximum amount of inline data in a record (8000 bytes)?

Anonymous User