Using the JOIN condition D.orderyear = M.orderyear to join the derived table D with the Matrix table (aliased as M) gets the desired results. Each order in table D gets one matching row in the matrix table that represents the same order year as the one in table D. You group the result by custid as you did in the previous query and you again write an expression for each result column, but the expressions are much shorter this time. For example, to calculate the total quantity for order year 2002, you use the expression SUM(qty*y2002). Out of all the base rows that belong to a particular customer, only the rows that have the order year value 2002 return quantities; all others return NULLs-thus you get the correct total quantity for order year 2002. Here's the complete solution query:
SELECT custid,
SUM(qty*y2002) AS [2002],
SUM(qty*y2003) AS [2003],
SUM(qty*y2004) AS [2004]
FROM (SELECT custid, YEAR(orderdate) AS orderyear, qty FROM dbo.Orders) AS D
JOIN dbo.Matrix AS M ON D.orderyear = M.orderyear
GROUP BY custid
These techniques let you pivot data, but they aren't easy or obvious. Once you're familiar with the new PIVOT operator in SQL Server 2005, you'll find it's much simpler and more intuitive.
Pivoting Data in SQL Server 2005
Before you start writing the solution for pivoting in SQL Server 2005, run the code in Listing 1 in your SLQ Server 2005 server to create and populate the Orders table. The following SQL Server 2005 query uses the PIVOT operator to return the pivoted yearly order quantities for each customer:
SELECT *
FROM (SELECT custid, YEAR(orderdate) AS orderyear, qty FROM dbo.Orders) AS D
PIVOT(SUM(qty) FOR orderyear IN([2002],\[2003],\[2004])) AS P
The query's FROM clause contains the derived table D and the PIVOT operator, followed by parentheses in which you specify arguments for the PIVOT operation. You specify columns from D as arguments to indicate which source column holds the target column names and which column contains the values you want to aggregate. The PIVOT operator includes three parts: an aggregate operator, the source column, and a list of order year values. First, you specify the aggregate you want to calculate-in this case, SUM(qty). Second, the FOR clause indicates the source column name (orderyear) that contains the order years that will appear as result columns. And finally, the IN clause contains a list of values ([2002],\[2003],\[2004])-the order years that you want to turn into result columns.
An astute reader will notice that the custid column that appeared in the SQL Server 2000 GROUP BY clause is missing in the 2005 query. The PIVOT operator uses the columns that appear in the derived table D but not in the operator's arguments to figure out which column is the grouping column. In this case, the aggregate operator refers to qty and the FOR clause refers to orderyear. The only column that isn't specified in any of PIVOT's arguments is custid, so PIVOT uses custid as the grouping column. Honestly, I find the implicit grouping confusing and think it would have been clearer if the PIVOT operator required you to specify the grouping columns.
More to Come
Pivoting techniques rotate data from rows to columns and give you valuable flexibility when you analyze data. The new PIVOT operator in SQL Server 2005 uses a native T-SQL construct to let you pivot data using simpler and shorter code. Now that you're familiar with the basics of pivoting, you're ready to dive into the more advanced techniques that I'll discuss next month-dynamic pivoting and string concatenation.