Examples with Table Expressions
Now, I'll show a couple examples with table expressions. I'll use common table
expressions (CTEs), but you'll get the same plans if you instead use derived
tables or views. You'll work with the Orders table from the earlier examples.
The request is to calculate total quantity values per year and to compare each
year's total quantity with the previous year's.
Web Listing 7 shows a solution
using a CTE, and Web Figure 3
shows its execution plan. (Note that the plan will look a bit different if you
run the code on a multiCPU machine with parallelism enabled.) Observe in the
plan that the Orders table is scanned twice—once for the instance of
the CTE called Cur, and once for the instance of the CTE called Prv. As I mentioned
last month, the definition of the table expression is expanded in each reference.
After expansion, SQL Server ends up with a self-join between two instances of
the Orders table. The Orders table in our example has 1,000,000 rows; obviously,
the effect in cases of bigger tables is more dramatic.
As for the performance of this query, when the cache is cold (i.e., no pages
in cache after running DBCC DROPCLEANBUFFERS), all the data from the Orders
table will be physically read once, and logical reads will be doubled. In my
testing, the 1/0 cost amounts to 57,270 logical reads and 21,528 physical reads.
When the cache was hot (i.e., all pages in cache, ran the query twice and measured
the second run), I got 57,270 logical reads and 0 physical reads. Of course,
in production, you might face situations in which some of the pages are cached
and some aren't, so physical reads will vary. There are an additional 15 reads
against an internal work table that the plan uses (the internal work table is
represented by the spool operator in the plan). The subtree cost of the query
is about 58. In my test system, the query ran for 10 seconds against cold cache
and 2 seconds against hot cache.
At this point, there's potential for performance improvement by using a temporary
table or a table variable. You can eliminate one of the full scans of the Orders
table (clustered index scan). In this case, a table variable is more appropriate
because the resultset with the aggregates that needs to be materialized is tiny—only
four rows. Having distribution statistics doesn't play an important role. By
using a table variable, you minimize potential for recompilations.
Web Listing 8 has a solution
that uses a table variable. Web Figure
4 shows the execution plans I got for querying the Orders table to populate
the table variable, as well as querying the table variable to match current/previous
years.
The advantage is that SQL Server scans the Orders table only once; then, it
scans the tiny table variable twice for the join. When I ran the solution on
my test machine, I got the following performance statistics: 28,635 logical
reads; 21,528 physical reads against cold cache and 0 physical reads against
hot cache; a total subtree cost of 29; 9 seconds run time against cold cache,
and 1 second against hot cache. The performance advantage isn't dramatic when
running the code against cold cache—most of the run time is involved
with the physical scan of the data, which happens once in both cases. Still,
there is some advantage to the solution based on the table variable. With hot
cache, the advantage becomes much more apparent because the number of logical
reads (as well as the run time) is reduced by 50 percent.
In my final example, you need to access aggregated data only once. Suppose
you need to return orders with the highest quantity value per hour. Web
Listing 9 has the solution that relies on a CTE, and Web
Listing 10 has the solution that relies on a temporary table. Because you
need to access the aggregated data only once, there's no special benefit to
using a temporary table or a table variable. I compared the performance of both
solutions and found them to be similar. With the CTE solution, SQL Server stores
the aggregated data in a hash table. With the temporary table solution, I stored
the aggregated data in a temporary table explicitly. In short, very similar
activity takes place in both cases, hence the similar performance. The solution
based on a temporary table does have several disadvantages: It's a bit more
complicated, and because it involves data definition language (DDL) and data
changes associated with the temporary table, it might trigger recompilations
(of the hosting procedure). So, in this case, I'd probably use the CTE-based
solution.
The Value of Examples
This article's tangible examples help demonstrate a decision-making process
regarding the choices of temporary object types. There's much to consider, and
there are important—sometimes subtle—differences between the object
types. By understanding the characteristics of the different object types, along
with performance tests and analysis, you can make educated and informative decisions.