DOWNLOAD THE CODE:
Download the Code 96784.zip

 Executive Summary:

There are many ways to concatenate row values in Microsoft SQL Server databases. With the hope of finding the most efficient row-concatenation solution, a SQL Server Magazine reader tested two common approaches: using a self-reference variable within a SELECT clause and using FOR XML. He found that although these solutions produce the same result, the time it takes to get those results varies significantly.

Let's say you have a table with a column named color_name and you want to store all the row values as a list of comma-delimited values. For example, if the column contains the row values of Yellow, Blue, and Red, the final result would look like: @cv = {Yellow,Blue,Red}

There are many ways to concatenate row values to get results like this. Although the various solutions produce the same result, the time it takes to get those results can vary significantly. With the hope of finding the most efficient row-concatenation solution, I tested two common approaches: using a self-reference variable within a SELECT clause and using FOR XML.

I began by running the SampleData.sql script in Listing 1 to create a table named T1 and populate it with 50,000 rows. (If you run this code so that you can test the solutions on your system, note that SampleData.sql takes a few minutes to run.)

Next, I turned on the Discard results after execution option in SQL Server Management Studio (SSMS) so that the time it took to generate the output wouldn't be taken into consideration. After enabling that option, I cleared the data cache and ran Solution1.sql, which Listing 2 shows, to concatenate rows by using a self-reference variable within a SELECT clause. It took 10 seconds for this code to run on my system.

It's worth noting that some people prefer to use the following two system-defined functions to concatenate rows:

DECLARE @cv varchar(MAX)
SELECT @cv =
COALESCE(@cv + ',', ') +
ISNULL(col1,'<NULL>') FROM T1
SELECT @cv

This code took even longer to run on my system (16 seconds) because of the functions' overhead.

I then tested Solution2.sql, which Listing 3 shows, to concatenate rows by using FOR XML. It took less than 1 second for this code to run on my system, which is 10 times faster than the first solution.

The first solution is not only slower but also leads to undefined results. This is due to the kind of assignment used within the SELECT clause. As Microsoft's Eugene Zabokritski notes, "The results of an assignment in the SELECT list... depend on what plan the optimizer chooses to execute the query, so technically speaking, the results are undefined." (For more information about undefined results, go to http://connect.microsoft.com/SQLServer/feedback/ViewFeed back.aspx?FeedbackID=126129.)

When you have a small number of row values, it probably doesn't matter which row-concatenation solution you choose. However, when you're working with tables that contain a large amount of data, consider using FOR XML to concatenate row values.

End of Article




You must log on before posting a comment.

If you don't have a username & password, please register now.

Reader Comments

I have used the FOR XML PATH concatenation algorithm quite a bit, because I find it very flexible as a concat aggregate function with full control of ORDER BY, etc. However one very large caveat is that FOR XML uses the built-in XML parser and will not handle characters which need to be entitized such as apostrophe (') and ampersand (&). Therefore if using to concatenate string data, some kind of REPLACE function is needed to handle the standard XML entities. Still faster than the self-reference variable which doesn't work well with GROUP BY and can give surprising results with ORDER BY, so I still use it, but the extra code is ugly.

If string concatenation is a genuine need, I would strongly suggest using a CLR user-defined aggregation. Performance is comparable to the FOR XML PATH option with full control over GROUP BY and ORDER BY. I haven't figured out how to pass in an optional separator parameter though if you want a semicolon-delimited list instead of comma-delimited. That is very easy to do with the FOR XML PATH solution.

I would also recommend giving credit to Michael Rys for developing the FOR XML PATH algorithm originally (and perhaps Itzik Ben-Gan for popularizing it both in his T-SQL Black Belt column and his Inside SQL Server 2005 programming books.

TimOverlund

Article Rating 3 out of 5