• subscribe
February 22, 2006 12:00 AM

Zip Your Data

Save space with these SQLCLR compression functions
SQL Server Pro
InstantDoc ID #49065
Downloads
49065.zip

Relational databases provide an organized repository for data and store most data types efficiently, but there are exceptions. For example, you can often greatly reduce the size of binary-type data by compressing it. And if you have large volumes of binary-type data, compression can substantially reduce your storage requirements.

SQL Server doesn't provide built-in compression facilities, but you can add compression capabilities to SQL Server 2005 by leveraging the Common Language Runtime (CLR). Let's look at how to compress data in SQL Server 2005 and how to take advantage of the new varbinary(max) data type, which replaces the often inflexible—and now deprecated—image data type. After working through this article's examples, you'll understand how to compress SQL Server data and how to leverage Windows .NET Framework code to extend the power of SQL Server 2005.

Zip It
For years, developers have had to create custom solutions to compress data before storing it in the database and decompress data after retrieving it. (For an overview of compression, see the sidebar "Compression Basics." The .NET Framework greatly simplifies these tasks. .NET Framework 1.1 provides the java.util.zip namespace, which offers methods for zipping and unzipping data.With this version of the Framework, for example, you can create wrapper classes to indirectly call java.util.zip from .NET languages such asVB.NET and C#. However, .NET Framework 2.0, which is part of SQL Server 2005 and Visual Studio (VS) 2005, features the System.IO.Compression namespace. By using the methods in this namespace, you can conveniently compress and decompress data.

The System.IO.Compression namespace, which is available to all .NET languages, offers two classes for compressing and decompressing data:DeflateStream and GZipStream. DeflateStream implements the DEFLATE algorithm (as defined in Request for Comment—RFC—1951). GZipStream implements the gzip format (as defined in RFC 1952),which compresses a single file and is also based on the DEFLATE algorithm. As the class names suggest, DeflateStream and GZip Stream work with streams of data. Streams aren't just sequences of bytes; they're objects, complete with methods for manipulating the objects.

Before looking at our sample compression programs, let's quickly walk through the key steps for creating a compression application. First, you need to add Visual Basic (VB) Imports or C# code, using statements for the System.IO and System.IO. Compression namespaces:

' VB Imports statements 
Imports System.IO _ 
   ' for Stream object 
Imports System.IO.Compression _
   ' for DeflateStream and _ 
  GZipStream 
  
// C# code 
using System.IO; 
 // for Stream object 
using System.IO.Compression; 
 // for DeflateStream and 
GZipStream 

(Note: Some code lines in text wrap to several lines because of space constraints.)

You also need to use the SqlBytes data type, a native SQL Server data type, as both a parameter and a function return value. SqlBytes is part of the new System.Data.SqlType name space, which provides classes for native data types within SQL Server 2005.According to the .NET documentation, these classes " provide a safer, faster alternative to the data types provided by the .NET [CLR]."As an added benefit, SqlType namespaces implement the INullable interface, which lets them contain an actual null value. Using native SQL Server data types also avoids type-conversion problems. Although you can use .NET common data types such as a Byte array within SQL CLR functions, the runtime will implicitly convert the .NET data types to a corresponding SqlType.Type conversions incur a slight performance penalty and open the door to possible conversion errors.For example,when we first wrote the CompressBytes function, we used Byte arrays. After we deployed the code to SQL Server 2005, we realized that the runtime was implicitly converting the Byte arrays to varbinary(8000).This unfortunate data-type conversion precluded working with documents larger than 8K.

To compress data, you need to create a managed-code user-defined function (UDF) of type SqlBytes:

Public Shared Function _ 
  CompressBytes(ByVal _ 
    UncompressedBytes As _ 
    SqlBytes) As SqlBytes 

You use both a MemoryStream object and a GZipStream object for the compression process, like this:

Dim outputStream As New _ 
  MemoryStream ' Contains the _ 
  compressed data 
Dim zipStream As Stream _
  ' The zip stream used for _ 
  compression 
zipStream = New GZipStream_ 
  (outputStream, _ 
  CompressionMode.Compress) _ 
  ' instantiate 

The function performs the compression by invoking the GZipStream object's Write method, which writes the compressed bytes into the MemoryStream object (output Stream). As you see in the following code snippet, the Write method requires the number of uncompressed bytes (we'll explain the significance of this in a moment):

zipStream.Write _ 
  (UncompressedBytes.Value, _ 
  0, CInt(UncompressedBytes. _ 
  Length)) 

With the compressed data now in output Stream, you just need to return the data from the function:

Return New SqlBytes _ 
  (outputStream.ToArray) 

Compressing data is that easy. And decompressing compressed data is almost as simple. You use the GZipStream's Read method to decompress data. As with the Write method, the Read method requires the number of compressed bytes that it will process. We found that keeping track of the number of compressed and uncompressed bytes was inconvenient when developing a generic compression and decompression wrapper class. We decided that adopting a chunk-oriented approach, in which we compress and uncompress a block of bytes at a time, was better because it abstracts the details of keeping track of the number of bytes.



ARTICLE TOOLS

Comments
    There are no comments to display. Be the first one!
You must log on before posting a comment.

Are you a new visitor? Register Here