Pivoting data is a technique that rotates data from a state of rows to a state of columns, possibly aggregating multiple source values into the same target row and column intersection. Some practical reasons to pivot data include formatting data for reporting purposes, calculating custom aggregations, and solving relational division problems. We’ve covered pivoting in the past (see the Learning Path for SQL Server Magazine articles about pivoting)—but based on the number of pivoting questions I continue to receive, I think a refresher is in order. Even experienced T-SQL developers have a hard time getting used to the syntax of the native PIVOT and UNPIVOT operators that were introduced in SQL Server 2005; many developers resort to using more traditional syntax even though it tends to be much more verbose.
In this article I discuss standard traditional techniques for pivoting, as well as nonstandard techniques that rely on the newer native PIVOT operator. I also explain the advantages and disadvantages of the PIVOT operator. Finally, I discuss common challenges associated with pivot queries and their solutions.
For sample data I use a database called InsideTSQL2008. You can download the source code to create the sample data from www.InsideTSQL.com/books/source_code/InsideTSQL2008.zip.
Pivoting Phases and a Standard Solution
My first example to demonstrate pivoting involves shipping information from the Sales.Orders table for orders shipped in Spain. The following query returns the source data that you’ll rely on:
USE InsideTSQL2008;
SELECT shipperid, shipcity, freight
FROM Sales.Orders
WHERE shipcountry = N'Spain'
ORDER BY shipperid, shipcity;
Table 1 shows this query’s output.
Suppose that you get a task to return a row for each shipper ID, a column for each shipping city in Spain, and the total freight values for each shipper and city intersection. Table 2 shows the desired result. This task represents a classic pivoting need and is therefore a good example to demonstrate the concept and solutions.
If you want to break a pivoting task into multiple conceptual phases, you can identify three main phases: grouping, spreading, and aggregating. Figure 1 illustrates these phases.
From a user perspective, pivoting involves three main elements: the element that you need on rows (shipperid in our case), the element that you need on columns (shipcity in our case), and the element that you need in the data section, or the intersection of rows and columns (sum of freight in our case).
The first phase illustrated in Figure 1 is the grouping phase. You need to arrange one row in the result for each distinct “on rows” element. In SQL terms, this is called grouping; the element that you need to group by (the on rows element) is shipperid. In the standard traditional solution to pivoting, the grouping phase is implemented simply with a GROUP BY clause, like so:
GROUP BY shipperid
The second phase in pivoting is the spreading phase. You need to spread the values that will later be aggregated (freight in our case) to target columns based on the distinct values that appear in the “on cols” element (shipcity in our case). The distinct ship city values that appear in our sample data for the ship country Spain are Barcelona, Madrid, and Sevilla. In the traditional solution to pivoting, you can achieve the spreading logic using CASE expressions, like so:
CASE WHEN shipcity = N'Barcelona' THEN freight END AS Barcelona,
CASE WHEN shipcity = N'Madrid' THEN freight END AS Madrid,
CASE WHEN shipcity = N'Sevilla' THEN freight END AS Sevilla
Note that when an ELSE clause isn’t specified in a CASE expression, an ELSE NULL is implied. Because that’s exactly the behavior we need, you can simply omit this part, as I did. Also note that in order to use a solution that’s based on a static query, you have to know ahead of time what the distinct values are in the spreading element, or at least what the maximum number of distinct values is. Later in the article I cover a solution that relies on string concatenation techniques and dynamic SQL to cope with situations in which this information isn’t available.
The third and last phase in pivoting is the aggregating phase. In this phase you aggregate the data values in each target pivoted column. In our example, the aggregate function is SUM and the aggregation element is freight. In the traditional solution to pivoting, you simply apply the aggregate function to the results of the CASE expressions, like so:
SUM(CASE WHEN shipcity = N'Barcelona' THEN freight END) AS Barcelona,
SUM(CASE WHEN shipcity = N'Madrid' THEN freight END) AS Madrid,
SUM(CASE WHEN shipcity = N'Sevilla' THEN freight END) AS Sevilla
Here’s the complete solution implementing all three phases:
SELECT shipperid,
SUM(CASE WHEN shipcity = N'Barcelona' THEN freight END) AS Barcelona,
SUM(CASE WHEN shipcity = N'Madrid' THEN freight END) AS Madrid,
SUM(CASE WHEN shipcity = N'Sevilla' THEN freight END) AS Sevilla
FROM Sales.Orders
WHERE shipcountry = N'Spain'
GROUP BY shipperid;
You can consider this solution as a template. For any pivoting task you get, identify the three elements involved and put them in the right places in this template. To recap, in our example those elements are:
- Grouping: shipperid
- Spreading: shipcity IN (Barcelona, Madrid, Sevilla)
- Aggregation: SUM(freight)