• subscribe
January 25, 2011 08:57 AM

Unpivoting Data

Three techniques based on UNPIVOT, cross join, and APPLY
SQL Server Pro
InstantDoc ID #129017
Downloads
Ben-Gan SQL1155.zip

Pivoting and unpivoting data in SQL Server are very common tasks involving data rotation. In the case of pivoting, you need to rotate data from a state of rows to a state of columns. In the case of unpivoting, you need to rotate data from a state of columns to a state of rows. Last month I covered pivoting techniques; this month I focus on unpivoting techniques.

Sample Data and Desired Results

The task of unpivoting data involves working with input data that appears in pivoted form; the purpose of unpivoting the data is to spread, or split, multiple column values from the same source row into multiple target rows. For each set of columns that you’re unpivoting, you typically want to generate two result columns: one to hold the source column names (call it the names column), and one to hold the source column values (call it the values column). If this explanation doesn’t seem to make much sense for now, that’s understandable—the process of unpivoting is best understood through an example, which is coming shortly. First I’ll describe the sample data, then the desired output, and then I’ll cover three techniques to achieve the desired results.

For sample data you’ll use a database called InsideTSQL2008. You can download the source code to create the sample data. After running the script to create the database InsideTSQL2008, run the code in Listing 1 to create and populate the table PvtSample, which you’ll use as the input data for the unpivoting tasks described in the article.

Table 1 contains the output of the query at the end of Listing 1, showing the contents of the PvtSample table. As you can see, the table has a row for each shipper, with the shipper ID, a set of three columns with total freight values for each shipping city in Spain (frtBarcelona, frtMadrid, frtSevilla), and a set of three columns with total quantities for each shipping city in Spain (qtyBarcelona, qtyMadrid, qtySevilla).

In this article I discuss unpivoting a single set of columns, as well as unpivoting multiple sets of columns. To unpivot a single set of columns, you need to focus on the source columns shipperid, frtBarcelona, frtMadrid, and frtSevilla and ignore all the qty columns (qtyBarcelona, qtyMadrid, and qtySevilla). The task is to unpivot the data such that each source row will be unpivotted into three target rows—one for each shipping city. Each target row should hold the shipper ID, shipping city, and freight. Inapplicable cases shouldn’t be returned in the result; that is, cases in which the source value was NULL, as with shipper 3 in Barcelona. Table 2 contains the desired output for the first unpivoting task.

The second task involves unpivoting multiple sets of columns. Like in the first unpivoting task, you need to unpivot each source row into three target rows—one for each shipping city. But in the second unpivoting task, you should produce two target value columns—one for freights (from the source columns frtBarcelona, frtMadrid, and frtSevilla), and another for quantities (from the source columns qtyBarcelona, qtyMadrid, and qtySevilla). Table 3 contains the desired output for the second unpivoting task.

UNPIVOT Operator

SQL Server 2005 introduced the UNPIVOT operator to address unpivoting needs. This operator was implemented as a table operator, much like the join table operator was. This means that you use the UNPIVOT operator in the FROM clause of your query. It operates on the table specified to the left of the UNPIVOT keyword and returns a table result that you have to assign with an alias. The general syntax of a query using the UNPIVOT operator is

 

SELECT <select_list>
FROM <input_table> UNPIVOT(<unpivot_specification>) AS <alias>;

 

Just like with other table operators, the input to the UNPIVOT operator can be the result of preceding table operators, and the output of the UNPIVOT operator can be used as input by a subsequent table operator or another logical query processing phase (e.g., the WHERE filtering phase).

Some database administrators and developers have a hard time getting used to the syntax of the UNPIVOT operator. I’ll try to explain its syntax such that it will be straightforward and easy to remember, and I’ll do so using the first pivoting task described earlier (unpivoting freights for each shipping city). If you think about it, unpivoting involves rotating a set of source columns into two target columns—those that I referred to earlier as the values column and the names column. So in every unpivoting problem you need to identify three things:

1.     Set of source columns to be unpivotted (in our case frtBarcelona, frtMadrid, and frtSevilla)

2.     Name you want to assign to the target values column (freight in our case)

3.     Name you want to assign to the target names column (shipcity in our case)

 

After you identify these three elements, it’s just a matter of sticking them in the right places as part of the UNPIVOT operator’s specification. The specification of the UNPIVOT operator is

UNPIVOT( <values_column> FOR <names_column> IN (<columns_for_unpivoting>) ) AS <alias>

So in our case, to address the first task you’d use the following specification:

UNPIVOT( freight FOR shipcity IN (frtBarcelona, frtMadrid, frtSevilla) ) AS U

The values of the source columns will be stored in the values column (freight), and the names of the source columns will be stored in the names column (shipcity) as character strings. Listing 2 contains the complete solution query to address the first task, producing the desired output shown earlier in Table 2.

Because the names of the source columns that hold the freights for the different shipping cities all start with the prefix frt followed by the city name, the code applies string manipulation in the SELECT list to the target names column to remove this prefix and keep only the city names.

All in all, the syntax of the UNPIVOT operator is very short and elegant, as long as you have fairly basic needs—that is, assuming that you need to unpivot only one set of source columns, all unpivotted columns have exactly the same type (including size), and you want to remove all rows with NULLs in the values column from the output. Also, when unpivoting one set of source columns, SQL Server processes the UNPIVOT operator quite efficiently, scanning the input table only once.

But things get trickier when the unpivoting task goes beyond the basic needs. For example, if the source columns to be unpivotted aren’t exactly the same type—even if they differ only in size, precision, scale, etc.—your code will fail. The workaround is to define a table expression in which you first cast all source column types to a common type, then provide that table expression as input to the UNPIVOT operator.

You’ll face another obstacle if you don’t want to remove rows with NULLs in the values column. The UNPIVOT operator doesn’t really leave this as an option but instead makes the removal of the rows with the NULLs as a mandatory phase. The workaround if you want to keep those rows is a bit awkward—you need to define a table expression in which you use the COALESCE function to substitute all NULLs with a non-NULL value that can’t originally appear in the data, and provide that table expression as input to the UNPIVOT operator. Then in the outer query you need to use the NULLIF function to substitute the non-NULL values back with NULLs.

You also need to straggle a bit in case you need to unpivot multiple sets of columns, as is the case with the second task. As a reminder, you need to unpivot both freights and quantities for each shipping city. The two sets of columns you need to unpivot are {frtBarcelona, frtMadrid, frtSevilla} for freights and {qtyBarcelona, qtyMadrid, qtySevilla} for quantities. The UNPIVOT operator doesn’t support such a scenario directly. A workaround is to write two UNPIVOT queries, each defining a different table expression, then have an outer query join the two, as Listing 3 shows, producing the output in Table 3.

If you’re wondering why the outer query uses a full outer join and the COALESCE function, this has to do with the possibility of having NULLs in the value columns, as well as the fact that each of the individual UNPIVOT queries will eliminate those rows with NULLs.

As you can see, in the case of multiple sets of source columns to be unpivotted, the query gets complicated. Also, if you examine the execution plan for the code in Listing 3, you’ll see that the unpivoting work is pretty much repeated. The input data is scanned twice, and there’s also the cost of the join. This plan is quite inefficient.

Other techniques exist for achieving unpivoting without the use of the UNPIVOT operator and without repeating the work. You’ll probably find them more convenient to work with, especially when the task is beyond the basics. First I’ll present a technique that uses a cross join, then a technique that uses the APPLY operator.

 



ARTICLE TOOLS

Comments
  • wirama
    1 year ago
    Feb 23, 2011

    for sql 2000 or below, only CROSS JOIN is the solution.
    but for sql 2005 or above, APPLY operator performs better than UNPIVOT given there are multiple sets of names/values columns. It really enlightens my TSQL knowledge toolbox.

    I only can say this is really awesome article! Keep up a good work, Itzik.

You must log on before posting a comment.

Are you a new visitor? Register Here