SQL Server 2005, formerly code-named Yukon, introduces a wealth of new T-SQL features and enhancements, which I briefly highlighted in my November 2003 article, "What's New in Yukon T-SQL," InstantDoc ID 40206. But in my opinion, common table expressions (CTEs) are by far the most important and exciting new SQL Server 2005 T-SQL feature. CTEs come in two forms: non-recursive and recursive. In this article, I delve into the details of CTEs. I first discuss non-recursive CTEs, then recursive CTEs in single-parent (tree) scenarios. After you've read the article, see the sidebar "Practice What You've Learned," page 26, to test your CTE skills.
Non-Recursive CTEs
A CTE is a named table expression that Microsoft implemented in SQL Server 2005 according to the ANSI SQL-99 standard. In its non-recursive form, a CTE is an alternative to derived tables, views, and inline user-defined functions (UDFs). A derived table is a named table expression that exists for the duration of a query. If you need to refer to a derived table multiple times in an outer query, you must define multiple derived tables by using the same query for each. This requirement can lead to lengthy code that's difficult to read and maintain and makes a derived table less advantageous than a view, as I discuss shortly. One advantage that derived tables have over views is that a derived table can refer to variables you defined in the same batch. Views are also named table expressions that persist in the database until you explicitly drop them. Unlike with derived tables, a query can refer to a view name multiple times; however, the view can't refer to variables. Inline UDFs have the same characteristics as views except that they can refer to the function's input arguments. CTEs have the best features of derived tables, views, and inline UDFsthey are named table expressions that exist only for the duration of the query, that an outer query can refer to multiple times, and that can refer to variables defined in the calling batch.
Let's jump right into an example that demonstrates all these advantages. Run Listing 1's code in SQL Server 2005's new AdventureWorks sample database to return current and previous yearly total sales values from the SalesOrderHeader and SalesOrderDetail tables. Listing 1's code first defines a variable called @CustID and sets its value to 3. The code then defines the CTE, using the WITH clause followed by the CTE name (YearlyCustOrdersCTE) and an optional result column list in parentheses (OrderYear, TotalValue). After the AS clause, the CTE's body contains a query that returns the order year and total sales value. The outer query following the CTE's body refers to the CTE name twicefor current and previous yearly salesby left-joining the instance of the CTE representing current yearly sales (Cur) to another instance representing previous yearly sales (Prev).
Sometimes in SQL Server 2005, the purpose of the WITH clause is ambiguous because you can use it for other purposes (e.g., to specify a table hint). So, you should suffix the statement preceding the CTE's WITH clause with a semicolon to specify where the preceding statement ends. In T-SQL, semicolons are required only as a suffix to a statement preceding a CTE's WITH clause. However, ANSI supports the use of a semicolon to specify where any statement ends, so suffixing all statements with a semicolon is good programming practice.
You can use one WITH statement to define multiple CTEs. Each CTE's query expression can refer to any CTE you defined before it in the same WITH statement. The outer query can also refer to all CTEs you defined. As an example of defining multiple CTEs, run the code in Listing 2 to return, for each order in the SalesOrderHeader table, the first and last dates of the order month.
Listing 2's first CTE (OrdersCTE) returns the sales order ID and order date from the SalesOrderHeader table. The second CTE (OrdersBOMCTE) calculates the beginning of the order month by referring to the first CTE. The third CTE (OrdersBOMEOMCTE) calculates the end of the order month by referring to the second CTE. The outer query simply returns all columns from the third CTE.
You can even modify data through CTEs. Before I show you an example of how to modify data, run the code that Listing 3 shows to create the SalesSummary table and populate it with sample data. The table contains current yearly sales quantities and values in the Qty and Value columns, respectively, but all new rows contain NULLs in the previous year's PrevQty and PrevValue columns.
Suppose you want to update the PrevQty and PrevValue columns by querying the SalesSummary table. If you want to stick to ANSI-compliant solutions, you have to use two subqueries, one for each column, which requires scanning the table twice. If you don't care about ANSI compliance, you can use T-SQL-specific syntax that allows joins in an UPDATE statement. With CTEs, you can have the best of both worldsthe ability to scan the table once and remain ANSI compliant. Run the code that Listing 4 shows to update the SalesSummary table by using a CTE called SalesCTE. This CTE performs a left outer join to match current yearly sales with previous year's sales. The outer UPDATE query modifies the previous-year columns with data from the current-year columns.
Prev. page  
[1]
2
3
next page