In this series of articles, I'm discussing various classes of custom-aggregation solutions. SQL Server's built-in aggregate functions (e.g.,AVG, COUNT, MAX, MIN, SUM) are admittedly useful, but what if you need other types of aggregations that SQL Server doesn't provide? In the series' first two articles—"Custom Aggregations: The CursorTechnique" (http://www.sqlmag.com, March 2006, InstantDoc ID 49038) and "Custom Aggregations: The Pivoting Technique" (May 2006, InstantDoc ID 49675), I discussed cursor-based and pivot-based solutions.This month, I discuss solutions that utilize a new capability in SQL Server 2005: using .NET code to develop your own user-defined aggregates (UDAs).
As with the rest of the columns in this series, I'm using string concatenation as my example of a custom aggregation task. Here's a quick summary of the two sample concatenation tasks featured in the March and May columns:The first task is to use the Northwind database to return each customer and, for each, a concatenated list of order IDs expressed as a comma-separated list of values and sorted by order ID, as Table 1 shows.The second task is to aggregate distinct, concatenated employee IDs for each customer, as Table 2 shows.
Remember that we're evaluating the usefulness of each solution based on several factors: Is the solution generic? Does the solution support an unlimited number of elements in a group, and does it have any special limitations? Can you apply the aggregation to the elements in a desired order? How well does it perform? Finally, is the solution set-based or is it iterative/procedural? Let's first examine custom aggregate solutions based on UDAs, then evaluate the solution based on the aforementioned criteria, as I did with the cursor-based and pivot-based solutions.
UDA-Based Custom Aggregates
SQL Server 2005 introduces .NET integration. You can now develop various components in SQL Server, based on the Common Language Runtime (CLR). These include UDAs, user-defined types (UDTs), user-defined functions (UDFs), stored procedures, and triggers.The focus of this article is UDAs. Note that you can develop UDAs only with the CLR; SQL Server doesn't support the development of UDAs with TSQL code. However, after you deploy the UDAs in SQL Server, you can use them just like any other built-in aggregate function in T-SQL.
You can use any development environment you want to develop UDA code—even Notepad. However, using Visual Studio 2005 Professional Edition or later is more convenient because it's a rich development environment that lets you debug your code. This product also lets you easily deploy your code in SQL Server. In this article, I assume that you'll use Visual Studio 2005 Professional Edition or later. In doing so, I'll point out the steps that differ in the development and deployment process if you're using an edition of Visual Studio 2005 other than the Professional Edition (e.g., Standard).
Listing 1 contains the definition of the StringConcat function in C# code. In Web Listing 1 (http://www.sqlmag.com, InstantDoc ID 49983), you'll find the definition of the function in Visual Basic code.The code in both functions is similar; the only difference is the language of choice.As I describe the functions' logic, you can examine either of the listings, based on your language of preference.
First, I'll provide a brief description of the code, and second, I'll provide instructions for how to deploy and use the function in SQL Server.The code starts with declarations of namespaces (i.e., with the using statements). The purpose of these declarations is merely to allow for shorter naming of elements later in the code. Following the namespaces declaration, you'll find attribute definitions. Attributes help Visual Studio in the deployment process and help SQL Server optimize your code.
The Serializable attribute, which is mandatory for UDAs, means that the class field values will be serialized (persisted); thus, SQL Server will be able to access intermediate results. If you're using exclusively .NET value types, you can let SQL Server implement its own serialization (i.e., native serialization). However, if you use reference types (e.g., the string type, as in our example), you'll have to define your own serialization (i.e., user-defined serialization). To do so, you implement the IBinarySerialize interface, providing your own definitions of the interface's Read and Write methods, as I'll describe later. The SqlUserDefinedAggregate attribute defines various UDA properties, including the following:
- Format.UserDefined—the function uses user-defined serializatiom
- IsInvariantToDuplicates—the false value means that duplicates matter
- IsInvariantToNulls—the true value means the code doesn't care about NULLs
- IsInvariantToOrder—the false value means that order matters; SQL Server 2005 currently doesn't let you control the order of the computation and therefore ignores this attribute
- IsNullIfEmpty—the false value tells the code not to yield NULL if the input is an empty set of values
- MaxByteSize—the 8000 value sets the maximum size in bytes to 8000
Following the attribute definitions is the header of the structure (or class), which defines the UDA. This sample code names the structure StringConcat and specifies that the structure implement the IBinarySerialize interface, for the aforementioned reasons.
After the structure's header is its body. It starts with a definition of the sb variable (i.e., the StringBuilder object). This variable will contain the intermediate value of the concatenated string.The rest of the code in the structure's body implements different methods, including the following:
- Init—This method initializes the computation and is invoked once per group that the query processor aggregates. In our case, the Init method initializes the variable sb with a new StringBuilder object, which in turn is initialized with an empty string.
- Accumulate—The query processor uses this method to accumulate the aggregate values.The method is invoked once per value in the group that's being aggregated. In the case of this code, the method ignores (skips) NULL values, and appends known values to sb.
- Merge—SQL Server might decide to perform multiple partial aggregations; each partial aggregation will get a separate subset of the input set of values from the group.This method merges another instance of the aggregate value with the current aggregate value.You use it to merge multiple partial computations of an aggregation. In this example, the Merge method simply appends the input string to the current string.
- Terminate—This method completes the computation and returns the result of the aggregation. In this example, it returns a new SqlString object, which is initialized with the final concatenated string (sb.ToString()).
- Read and Write—Remember that you need to define these methods because our function uses user-defined serialization and must therefore implement the IBinarySerialize interface.The implementation of the methods is simple in this example.The Read method returns a StringBuilder object that is initialized with the ReadString method (applied to the input value r).The Write method truncates the string to 8000 bytes (4000 characters) in case it requires truncation. Unfortunately, UDAs don't support strings larger than 8000 bytes.
The first four of the preceding methods—Init, Accumulate, Merge, and Terminate—are mandatory for all UDAs. The last two—Read and Write—are required when using user-defined serialization, as in the case of our example.
Prev. page  
[1]
2
next page