Column Cardinality
Another problem in which I found indexed views useful is calculating column cardinalitythe number of distinct values in a column. Calculating the cardinality of each column in a table can be costly because SQL Server needs to rearrange each column's values so that it can easily count the number of distinct values. Let's look at some examples that use a table with three data columns and 1 million rows. Run the script that Listing 5 shows to create a table called T1 in tempdb that has columns called c1, c2, and c3, which contain 50,000, 100,000, and 200,000 distinct values, respectively.
With no suitable indexes, the following query, which calculates the cardinality of each data column, would give very poor performance:
SET STATISTICS TIME ON
SET STATISTICS IO ON
SELECT COUNT(DISTINCT c1),
COUNT(DISTINCT c2), COUNT(DISTINCT c3) FROM T1
Look at this query's execution plan in Query Analyzer and notice that SQL Server scans T1's clustered index three times and hashes the values in each column to count the number of distinct values. As a result, the query takes a long time to finish and incurs a lot of I/O overhead. On my laptop, the query ran for 23 seconds and incurred 9294 logical reads.
The obvious first step toward better performance is to index each column:
CREATE NONCLUSTERED INDEX idx_nc_c1 ON T1(c1)
CREATE NONCLUSTERED INDEX idx_nc_c2 ON T1(c2)
CREATE NONCLUSTERED INDEX idx_nc_c3 ON T1(c3)
This way, SQL Server can scan each index separately in an ordered fashion and count the distinct occurrences. Now the query runs for 5 seconds, incurring 4083 logical reads.
You might be satisfied with this improvement, but with tables that have more columns and more rows, you might still need improvement. One way to get better query performance (albeit at the cost of modification performance) is to write INSERT, UPDATE, and DELETE triggers that maintain a distinct count value for each column in a separate, denormalized table. Or, you can handle the problem without using triggers by using indexed views instead of the regular nonclustered indexes.
For each column, you can create an indexed view that contains only the column of interest and the COUNT_BIG(*) function, which is required for indexing views that have aggregations. For example, create the following indexed view for c1:
CREATE VIEW V_T1_c1 WITH SCHEMABINDING
AS
SELECT c1, COUNT_BIG(*)
AS cnt FROM dbo.T1 GROUP BY c1
GO
CREATE UNIQUE CLUSTERED INDEX
idx_uc_c1 ON V_T1_c1(c1)
Then, run the script that Listing 6 shows to create the indexed views for c2 and c3. Now, rerun the query, measuring duration and I/O and examining the execution plan. The optimizer is smart enough to understand that, to calculate the distinct count of a column, it can simply count the number of rows inside the indexed view that uses the column in its GROUP BY clause. The query now takes less than a second to run and incurs fewer than 1000 logical reads.
Different Points of View
Until the way you write a T-SQL query doesn't matter, you should test several different solutions for the same problem to gain better performance. Indexed views are a great tuning technique because they store results of queries on disk; especially when you're aggregating data, they can save you plenty in I/O costs. They can improve retrieval performance considerably in some situations, but keep in mind that they can degrade modification performance, too.