Executive Summary:
Microsoft SQL Server 2005 can create statistics on unindexed table columns--called column statistics-which along with index statistics can help the SQL Server query optimizer determine the most efficient query plans. To enable column statistics, you need to make sure the auto_create_statistics database option is set to ON. Column statistics can help the query optimizer choose a better plan in specific cases, such as for queries involving multiple tables or tables that already have indexes.
|
Using statistics on unindexed columns can help
determine the best query plans
SQL Server 2005 generates and automatically
maintains statistics for
every index you create, a feature
that can help you in your query performance-tuning
efforts. (See “Making the Most of Automatic Statistics
Updating,” October 2007, InstantDoc ID 96767 for more
information about automatic updating of index statistics.)
In addition to providing statistics for indexes, SQL Server
can create statistics on unindexed table columns—called
column statistics. Along with index statistics, column
statistics help the SQL Server optimizer create efficient
query plans. By understanding how and when SQL Server
creates column statistics and how they’re used, you can
recognize situations when you could use them to improve
the optimizer’s performance.
Working with Column Statistics
By default, SQL Server automatically creates statistics
every time an unindexed column is referenced in a query’s
WHERE or ON clause. This default applies only when the
auto_create_statistics database option is set to ON, which
is the default setting in the model database and thus in all
new databases you create.
The examples I’ll employ in my discussion of column
statistics use copies of the Person.Contact and Human
Resources.Employee tables from the AdventureWorks
database. To follow along, first make sure that auto_create_statistics is set to ON in the AdventureWorks
database, by running the following statement:
ALTER DATABASE AdventureWorks
SET auto_create_statistics ON;
Next, run the code in Listing 1, which creates the two
tables in the database. You can see all statistics that exist
on a specified table by using the sp_helpstats system procedure.
By default, sp_helpstats shows only the column
statistics, but when you add a second parameter, 'ALL',
the procedure will return both column statistics and index
statistics.
To see how sp_helpstats works, run the following code
to view the statistics on the dbo.Contact table before you
run any queries against it:
EXEC sp_helpstats
‘dbo.Contact’, ‘ALL’;
(Some code in this article wraps to multiple lines because
of space constraints in print.) You should see the message
This object does not have any statistics or indexes. However,
if you query the dbo.Contact table, then examine the
statistics again, as the following code shows, you should
see different results.
SELECT * FROM dbo.Contact
WHERE FirstName like ‘K%’;
GO
EXEC sp_helpstats ‘dbo.Contact’;
Your output should look similar to that in Table 1.
The results (without the 'ALL' parameter) show you
the automatically created statistics on the FirstName
column, but they’d also return any statistics you created
manually by using the CREATE
STATISTICS command. The
automatically created statistics
have a system-generated name
that always starts with the eight
characters “_WA_Sys_.” These eight characters are followed by a hexadecimal value indicating
the ordinal position of the column in the table (the
fourth column, in this case) and a hex value representing the
table’s object_id. For the output in Table 1, the hex value
76EBA2E9 is equivalent to the decimal value 1995154153,
which the object_name function confirms is the object_id of
the dbo.Contacts table. Manually created statistics can have
any name you choose as long as it’s a legal identifier name.
SQL Server 2005 also provides a catalog view called
sys.stats that lets you list your column statistics, but to get
the basic information provided by the sp_helpstats procedure,
you’d need to join sys.stats with sys.stats_columns
and sys.columns and include special processing for the
cases where manually created statistics included multiple
columns. If you also want to return information on index
statistics, you must also include the sys.indexes view.
For the most part, SQL Server manages column statistics
the same as index statistics, regardless of whether
column statistics are automatically or manually created.
If the auto_update_statistics database option is set to ON,
column statistics will be automatically updated as your
data changes. The point at which SQL Server will update
the statistics is based on the same recompilation threshold
that index statistics use—namely, when 20 percent of the
rows in the table are modified. The UPDATE STATISTICS
command will update all statistics on a table, both
index and column statistics. Manually created statistics
can be on multiple columns, but unlike index statistics,
there’s no column or byte limit on column statistics.
Statistics Changes in
SQL Server 2005
Microsoft completely rewrote the algorithms for SQL
Server 2005 that determine how it gathers statistics, but
you shouldn’t notice this change in your applications when
you migrate from an earlier SQL Server version, except for
perhaps better performance because the algorithms make
the statistics more accurate.
There’s one difference you might notice after upgrading
to SQL Server 2005, if you’re used to examining the database
catalog and index ID values in sysindexes. In SQL
Server 2000, the sysindexes.indid column is a tinyint, and
values 2 through 250 are used for both index ID and statistics
ID values. A clustered index always has a value in the
indid column of 1. So although the documentation says
you can have 249 nonclustered indexes on a single table,
if you have many statistics on a table, in SQL Server 2000
you can run out of possible indid values before you create
anywhere near 249 nonclustered indexes.
Continue to page 2
Prev. page  
[1]
2
next page