Interpreting the Statistics
Index statistics for a single column consist of one histogram, one density value, and one all_density value. Multicolumn index statistics for one set of columns in a composite index consist of one histogram for the first column in the index, one density value for the first column, and all_density values for each prefix combination of columns (including the first column alone). This means that if the index is on columns a, b, and c, SQL Server will compute an all_density value for column a, an all_density value for the combination of columns a and b, and an all_density value for the combination of columns a, b, and c.
The availability of density information for all columns helps SQL Server decide how useful the index is for joins. For example, suppose that an index consists of three key fields. The density on the first column might be 0.5, which isn't too useful. But as you look at more columns in the key, the number of rows that are pointed to is fewer than (or in the worst case, the same as) the number of rows that are pointed to by just the first column, so the density value goes down. If you're looking at both the first and second columns, the density might be 0.25, which is somewhat better. And if you examine all three columns, the density might be 0.03, which is highly selective. (Referring to the density of only the second column doesn't make sense because the lead column density is always necessary.)
Here's an example of how to interpret multicolumn index statistics. Suppose you have a nonclustered composite index (idx3) on the state and au_lname columns and an existing clustered index on the au_id column in the Authors table of the Pubs database. You can use the DBCC SHOW_STATISTICS command to display the statistics information for this index:
DBCC SHOW_STATISTICS(authors, idx3)
Table 1 contains the results. This output shows that the last update of the statistics for this index was on June 20, 1999. The output also shows that the table currently has 23 rows. State has eight distinct values, but two of them occur in multiple steps, so only six of them are nonfrequent (NF). In this data set, all the NF key values occur only once in the data, so the NF count is the same as the unique NF count, and the computed density value is 6/6/23, or about .0434.
The all_density value for the state column is 1/8, or 0.125. For the combination of state and au_lname, only one duplicate value occurs, so there are 22 unique values, which results in an all_density value of 1/22, or about .0454. An all_density value also exists for the combination of state, au_lname, and au_id. Because this table has a clustered index on au_id, that key appears along with every nonclustered index key. You can use that key to determine density information. The addition of au_id makes the three-valued key unique. In this case, all_density is 1/23, or about .0434; this value is the same as the density for the first column, but that's a coincidence.
SQL Server 7.0 can also keep track of statistics on columns with no indexes. Knowing the density (or the likelihood of a particular value occurring) can help the optimizer determine an optimum processing strategy, even if SQL Server can't use an index to locate the values.
You can use the CREATE STATISTICS command to create statistics on nonindexed columns. Also, you can execute the sp_createstats stored procedure, which creates single-column statistics for all eligible columns for all user tables in the current database. Eligible columns include all columns except computed columns; columns of the ntext, text, or image data types; columns that already have statistics; and columns that are the first column of an index.
Updating the Statistics
If your database option auto update statistics is set to true, SQL Server automatically updates the statistics whenever the query optimizer determines that the index statistics are out of date. Statistics are considered out of date when enough data-modification operations occur to minimize the statistics' usefulness. The number of operations is tied to the size of the table. Typically, the equation is something like
500 + 0.2 * (number of rows in the table)
This equation means that the table must have at least 500 modification operations before updates to the statistics occur during query optimization. For large tables, this threshold can be much higher.
You can manually force updating statistics in two ways. Run the UPDATE STATISTICS command on a table, one specific index, or column statistics. Or you can execute the sp_updatestats stored procedure, which runs UPDATE STATISTICS against all user-defined tables in the current database.
Automatic updates to index and column statistics occur by default because every database has two database options (auto create statistics and auto update statistics) set to TRUE upon creation of the database. You can use the sp_dboption stored procedure to disable either option. However, I don't recommend that you set this option to FALSE unless thorough testing shows that you don't need the automatic updates or your performance overhead is more than you can afford. This database option overrides all individual settings in tables. In other words, if the database option auto update statistics is FALSE, no automatic statistics updating takes place.
You can disable automatic statistics updating for a specific table in two ways. First, you can use the sp_autostats stored procedure to unset (or set) a flag for a table to specify whether to automatically update the statistics for that table. Also, you can use this procedure with only the table name to find out whether a table is set for automatic updates of index statistics. Second, you can execute the UPDATE STATISTICS command with the WITH NORECOMPUTE option, which prevents automatic recomputing of statistics. Running the UPDATE STATISTICS command again without the WITH NORECOMPUTE option enables automatic updates.
To avoid long-term maintenance of unused statistics, SQL Server deletes old statistics automatically created on nonindexed columns. The StatVersion column of the Sysindexes table shows how many times the statistics have been updated. After SQL Server updates the column statistics more than a certain number of times (SQL Server sets this threshold), it drops the statistics rather than update them again. (Statistics that you create are not subject to this process.) If SQL Server needs the statistics in the future, it can recreate them; there is no substantial cost difference between creating statistics and updating them.
The names of system-generated statistics are distinctive, such as _WA_Sys_au_fname_07020F21. You can use the sp_helpindex stored procedure to list the names of column statistics and table indexes. For example, running
sp_helpindex authors
will produce the output shown in Table 2. This example includes two indexes and two sets of statistics. The name of explicitly created statistics is s1. The name of system-generated statistics is _WA_Sys_au_fname_07020F21.
Knowledge Is Power
For the most part, you should leave the statistics-updating mechanisms at their default settings and let SQL Server take care of all statistics' updates. Your SQL Server system can perform well if you never run DBCC SHOW_STATISTICS or even think about statistics maintenance. However, understanding how SQL Server generates and stores statistics and what SQL Server uses them for can give you greater confidence in the decisions the SQL Server optimizer makes. Knowing how SQL Server works on the inside can also help you manage your data and applications and design better queries.
End of Article
Prev. page
1
[2]
next page -->