Manipulating binary large objects (BLOBs) and character large objects (CLOBs)
has always been difficult in SQL Server. Fortunately, SQL Server 2005 provides
three new data types for large object (LOB) storage. With these new data types,
you can easily manipulate BLOBs and CLOBs, especially when you use Common Language
Runtime (CLR) procedures and functions. With the compression algorithms in Microsoft
.NET Framework 2.0, you can create CLR functions that seamlessly compress and
decompress LOBs with minimal performance impact. Before I show you how to do
so, let me introduce you to the three new data types that make this compression
and decompression possible.
The New Kids on the Block
SQL Server 2005 provides three new data types to store and manipulate LOBs:
- varbinary(max), which you use for BLOBs
- varchar(max), which you use for CLOBs
- nvarchar(max), which you use for CLOBS with Unicode values
The new varbinary(max), nvarchar(max), and
varchar(max) data types behave much like
the traditional varbinary(n), nvarchar(n), and
varchar(n) data types, respectively. However,
the maximum storage size of the three new
data types is 2^31-1 bytes, or about 2GB.
The new data types replace the text and
image data types from previous versions of
SQL Server. SQL Server 2005 Books Online
(BOL) states that you shouldn't use the text
and image data types in new SQL Server
2005 applications and you should change
any existing text and image data types to the
new data types in legacy applications.
Unlike text and image values,
varbinary(max), nvarchar(max), and
varchar(max) values can be used for variables
and function parameters. They can also be
returned by CLR (or T-SQL) scalar-value
functions. These traits make them great
candidates for data manipulation.
CLR procedures and functions use the .NET data types of SqlBytes and SqlChars
to receive and process BLOBs and CLOBs, respectively. SqlBytes is used to pass
varbinary(max) parameters, where as SqlChars is used to pass nvarchar(max) and
varchar(max) parameters. CLR is Unicode based, so passing varchar(max) as a
parameter implies converting it to Unicode before parameters are passed. SQL
Server 2005 does all this without your intervention, but it's important that
you're aware of the differences concerning Unicode and nonUnicode strings.
Compressing LOBs
Before SQL Server 2005, you had to use extended stored procedures to compress
data, which is a difficult and risky business. SQL Server 2005's CLR integration
capabilities make compression less complicated, more secure, and more stable.
Because you can convert data back and
forth between BLOBs and CLOBs, let's
explore how to use varbinary(max) for simplicity sake. Listing 1 contains a function, written
in C#, to compress BLOBs. Compressing a
BLOB is as easy as passing in a SQLBytes
parameter, reading the BLOB's data, and
writing that data to a compression stream.
Callout A in Listing 1 shows where the compression
function receives a varbinary(max) value as a SQLBytes parameter. SQLBytes represents
a mutable type that wraps either an array or a stream. Assuming it wraps an
array, the code uses the Buffer property to access the underlying array and
retrieve the BLOB's data, as callout B shows. Note that for large BLOBs, this
method could raise an out-of-memory exception if you're running SQL Server 2005
Service Pack 1 (SP1) or earlier. Fortunately, unlike extended-procedure exceptions,
CLR exceptions don't crash SQL Server 2005 machines. This size limitation has
been resolved in SQL Server 2005 SP2.
Finally, the compression function uses the DeflateStream class in .NET Framework
2.0 to compress the BLOB, as callout C shows. The compression stream writes
to a MemoryStream object that's later used to create a new SQLBytes object that's
returned to the calling process. There is only one caveat: Microsoft's implementation
of DeflateStream requires the stream to be closed before it writes the last
compressed bytes. Flushing alone isn't enough. So, the function flushes, then
closes the compression stream.
Decompressing LOBs
Compressing data makes LOBs unreadable to other applications, so you need a
function to decompress the data. Listing 2
contains a BLOB decompression function that's written in C#. This function follows
the same principles used in the compression function, but instead of writing
data to a stream and returning a compressed block, it reads from a stream and
returns a decompressed block.
As callout A in Listing 2 shows,
the code uses a loop to read data from the DeflateStream object. This loop is
necessary because the deflator doesn't know how much data it needs to read or
how much compression was achieved. The code loops until all bytes are read,
then writes the bytes read in each pass to a MemoryStream object.
Prev. page  
[1]
2
next page