• subscribe
May 23, 2006 12:00 AM

Custom Aggregations: User-Defined Aggregates

Evaluating a new capability in SQL Server 2005
SQL Server Pro
InstantDoc ID #49983
Downloads
49983.zip

Putting the UDA to Use
So far, I've briefly described the code for defining the UDA. Next, I provide you with the instructions for deploying and using the function. First, run the following code in SQL Server to create a test database calledTestUDA, in which you'll later deploy the function:

USE master; 
GO 
IF DB_ID('TestUDA') IS NOT NULL 
  DROP DATABASE TestUDA; 
GO 
CREATE DATABASE TestUDA; 
GO 
USE TestUDA; 

You implement the next steps through Visual Studio 2005.

  1. Assuming you're using Visual Studio 2005 Professional Edition or later, create a new C# (or Visual Basic) project, using the Database, SQL Server Project template. Note that if you're using a version of Visual Studio other than the Professional Edition (e.g., Standard), you'll need to use the Class Library project template.
  2. In the New Project dialog box, name the project ConcatString, specify C:\CLR as the target location (or another folder of your choice), and confirm.
  3. If you used the SQL Server Project template, the Add Database Reference dialog box will appear. Create a new database reference (by clicking Add New Reference) to the TestUDA database, and choose it. If a dialog box asking whether you want to enable SQL/CLR debugging appears, click No. (I won't be discussing debugging in this article.)
  4. Add an Aggregate item (by clicking Project, Add Aggregate), and name it StringConcat.cs if you used C# code or StringConcat.vb if you used Visual Basic code. If you used the Class Library project template, simply rename the file Class1.cs (or Class1.vb) to StringConcat.cs (or StringConcat.vb).
  5. Overwrite the new file's code with the code from Listing 1 or Web Listing 1 (depending on your language of choice).
  6. Deploy the solution to SQL Server by choosing Build, Deploy ConcatString. If you used the Class Library project template, you'll need to first build the solution (through the Build, Build StringConcat menu item), then use the appropriate TSQL code (i.e., the CREATE ASSEMBLY and CREATE AGGREGATE commands) to deploy it manually in SQL Server.

The deployment process creates an assembly in a .dll file on disk, loads the intermediate language (IL) code from the file to the Test-UDA database, and registers the StringConcat function in the database.The .dll file is no longer necessary because the IL code is now an integral part of the database. At this point, the function is ready for use. Now, you can switch to SQL Server and implement the solution.

Remember that you use a UDA just as you would use any built-in aggregate function, so I don't really have much to show you.You can simply start using it.As a simple example, the following query returns a concatenated list of customer IDs for each employee, using a semicolon as a separator:

USE Northwind; 
SELECT EmployeeID, 
   TestUDA.dbo.StringConcat 
   (CustomerID + N';') AS ArrCusts 
FROM dbo.Orders 
GROUP BY EmployeeID; 

Table 3 shows the code's output.The code uses the fully qualified function name Test-UDA.dbo.StringConcat because the function was created in TestUDA; the database context of this code is Northwind.

Note that if an employee has handled more than one order for a customer, the customer ID will appear more than once in the employee's result string. If you want to concatenate only distinct customer IDs for each employee, use StringConcat( DISTINCT CustomerID + N';'), as you would with built-in aggregate functions. Also, remember that UDAs don't let you control the order of computation; therefore, you can't guarantee the order of concatenation.

Going back to the two tasks that I presented earlier:The first task is to concatenate all order IDs per customer, separated by commas, and present the output that Table 1 shows. Because OrderID is an integer, you'll need to convert it to a character string.Also, you don't want a leading or terminating comma, so you can add a comma in front of every OrderID, and use the STUFF function to get rid of the first comma, as follows:

SELECT CustomerID, 
   STUFF( 
TestUDA.dbo.StringConcat( 
   N',' + CAST(OrderID AS 
      NVARCHAR(10))), 
1, 1, N') AS ArrOrders 
FROM dbo.Orders 
GROUP BY CustomerID; 

The second task is to concatenate the distinct employee IDs per customer, separated by commas, and produce the output that Table 2 shows.You use a similar solution to the previous one, with the addition of the DISTINCT keyword in the aggregate function to take only distinct employee IDs into consideration:

SELECT CustomerID, 
  STUFF( 
  TestUDA.dbo.StringConcat(
   DISTINCT N',' + CAST(EmployeeID 
   AS NVARCHAR(10))),
  1, 1, N') AS ArrEmps 
FROM dbo.Orders 
GROUP BY CustomerID; 

Evaluating the UDA-Based Solution
Finally, to evaluate the UDA-based solution, let's go over the factors that I'm using to evaluate all custom aggregate solutions. One advantage of this solution is that it's generic. You can treat it as a template and use similar code when you need to implement other custom aggregates.The solution is fast because it doesn't involve a lot of I/O; the source data is scanned only once. And even though the UDA code is developed with a .NET language, you use it like any other aggregate function in a simple, set-based manner.

A disadvantage of this solution is that UDAs are limited to returning character strings of up to 8000 bytes long. Consequentially,-your solution is limited in terms of the number of elements that can be concatenated. Still, 8000 bytes might be sufficient in many cases, and it's not as limited as we saw with the pivoting techniques that I discussed last month. Another disadvantage is that you can't apply the aggregation to the elements in a desired order. Even though UDAs have an attribute called IsInvariantToOrder, this attribute is currently ignored.Also, implementing UDAs requires that you gain a new skill-set: developing in .NET.

Join me next month for the conclusion of this four-part series. I'll discuss specialized solutions, the fourth and final class of solutions to custom aggregates. Meanwhile, I'd like to thank SQL Server MVP Dejan Sarka for his contributions to this article. The CLR code herein is based on Dejan's code, with some revisions.



ARTICLE TOOLS

Comments
    There are no comments to display. Be the first one!
You must log on before posting a comment.

Are you a new visitor? Register Here