Data integrity, data consistency, and avoiding anomalies are the most important considerations when you store data in your database. However, when you query data, you usually think about your application's users and the information that you provide them. You often want to let them look at the data in a different format than the raw form in which it's stored. When you query data for analysis purposes, you typically want to produce either a small result set that fits on a screen or an intermediate result set that can be more easily manipulated by another query than your source data can. For these purposes, you might want to pivot your data. When you pivot data, you rotate rows to columns or columns to rows. That is, you can rotate data from multiple rows to multiple columns in a single row or unpivot data from multiple columns of a single row into result rows. You can use the pivot technique to rotate data in columns and rows for analysis purposes. Many analysis tools provide pivot capabilities, such as Microsoft Excel's PivotTable, Microsoft Access's Transform command, and the Pivot Table ActiveX Control. However, if you want to pivot or unpivot data in SQL Server 2000, you have to write complex T-SQL expressions. SQL Server 2005 provides native T-SQL PIVOT and UNPIVOT operators that simplify these operations. In this article, I discuss some techniques for pivoting data in SQL Server 2000 and introduce you to SQL Server 2005's PIVOT operator.
Pivoting Data in SQL Server 2000
To demonstrate pivoting techniques, let's use the Orders table that Listing 1 creates and populates. Say you want to analyze order quantities by customer and order year. You can group the data according to custid and YEAR(orderdate), then return the SUM(qty) value, but the total quantity for each combination of customer and order year will appear in a separate result row. For example, customer A ordered in 2002, 2003, and 2004. Hence, customer A has three rows in the result set-one row for each year. Assume that for each customer, you want one result row that has one column for each order year-2002, 2003, and 2004-and total quantities for each year in separate columns, as Figure 1 shows. Writing a query that produces these results in SQL Server 2000 requires some T-SQL acrobatics. But I can show you some techniques that will solve the problem. The first solution uses an aggregate query that contains CASE expressions:
SELECT custid,
SUM(CASE WHEN orderyear = 2002 THEN qty END) AS [2002],
SUM(CASE WHEN orderyear = 2003 THEN qty END) AS [2003],
SUM(CASE WHEN orderyear = 2004 THEN qty END) AS [2004]
FROM (SELECT custid, YEAR(orderdate) AS orderyear, qty FROM dbo.Orders) AS D
GROUP BY custid
This query defines the derived table D and extracts from the Orders table only the columns of interest for the pivot operation (custid, orderyear, and qty). To return one row for each customer, the query groups the data by custid. The SELECT list performs the magic of extracting quantities from multiple source rows and creating multiple result columns within a single row. The query contains a CASE expression for each order year. This example involves three order years: 2002, 2003, and 2004. The CASE expressions return the order quantity only if the base row contains the order year that the expression contains. Otherwise, the CASE expression returns NULL because you didn't specify an ELSE clause. You can explicitly specify ELSE NULL if you want, but because you want to get a NULL when the CASE doesn't find a match, you might prefer to use the shorter syntax. Then the SUM() function summarizes all the order quantities and ignores the NULLs. In the query, the first SUM(CASE) expression summarizes the order quantities for order year 2002, the second expression summarizes for 2003, and the third expression summarizes for 2004. So you get all the yearly order quantities for a particular customer in one result row.
This example uses only 3 years, so the query is short. A larger number of years will make the query much longer. However, you can shorten the query by applying a different pivoting technique-create a matrix table that contains a row and a column for each order year. A matrix table is a helper table that contains a row for each source attribute that you're about to rotate and a column for each target attribute that you're about to generate. In this example, we're rotating source year rows into target year columns so you have three rows (for the years 2002, 2003, and 2004) and three columns (for the same years). You get nine row and column intersections in a three-by-three table. The matrix table will help you isolate the column values that you want to aggregate into the different result columns. Listing 2 creates and populates the Matrix table with three rows for the order years 2002, 2003, and 2004. The Matrix table has an orderyear column and a column that uses the naming format yorderyear for each order year that you want to handle. In our example, the table contains the columns orderyear, y2002, y2003, and y2004. The table has a row for each year-three rows in this case-and stores the order year in the orderyear column, a 1 in the column whose value is the same year as its name, and NULLs in all other columns. For example, the row for order year 2002 has 2002 in the orderyear column, a 1 in the column y2002, NULL in the column y2003 and NULL in the column y2004. Issuing a SELECT * statement against the Matrix table produces the results that Figure 2 shows.
Prev. page  
[1]
2
next page