• subscribe
September 18, 2003 12:00 AM

Inside Optimization

Find out what makes the optimizer tick
SQL Server Pro
InstantDoc ID #39822

Can You Take a Hint?
I don't intend to fully explain all the query hints you can use when writing SQL Server queries. Not all hints affect the plan that the query optimizer chooses; quite a few query hints control the locking mechanisms that SQL Server uses when executing a query, so they're irrelevant to this discussion. However, I want to make sure you understand a couple of important details about hints.

First, a hint isn't really a hint as we know it. In English, a hint is a gentle suggestion, but in SQL Server terms, a hint is more like an order. Unless the order is physically or logically impossible, SQL Server will obey any hint you give it.

Also, bear in mind that a hint that improves your query performance today might make it worse tomorrow. Once you use a hint to tell SQL Server how to process your query, you lose all benefit of the incredibly sophisticated query optimizer. A change in your data distribution because of updates or batch loads might mean that the hinted query plan is no longer optimal, but the optimizer can't decide to disregard the hint and come up with a better plan.

Running the Numbers
One of the most important things the query optimizer needs for decision-making is accurate, up-to-date statistical information about your data values and their distribution. Although the way that SQL Server tracks statistics has changed over the years, the idea is still the same. Statistics give the optimizer a way to make an educated guess about an index's usefulness.

For example, you're probably familiar with the basic structure of a nonclustered index, in which the leaf level has a pointer (also called a bookmark) for every data row in the table. Suppose you have a nonclustered index on a field called lastname. Every lastname value in the table, including duplicates, is in the leaf level of the index, and a pointer indicates where to find that row in the table. The last names in the index's leaf level are in order, so all the names that start with Mc are near each other. However, the corresponding table rows might be on separate data pages, possibly spread out over dozens of pages. A query to find all those surnames might look like this:

SELECT lastname, firstname, address, phone
FROM PeopleTable
WHERE lastname LIKE 'Mc%'

In general, a nonclustered index is useful only if you need to access a very few rows through the index. If you need to access a lot of rows, scanning the whole table might be more efficient because SQL Server wouldn't have to follow all the bookmark pointers. How many rows are "a lot" depends on many factors, including the size of the table, the percentage of rows to be accessed, and the number of rows that fit on a page. I've found that if a query needs to access less than 1 percent of a table's rows, SQL Server can effectively use a nonclustered index that helps locate those rows. Of course, that's just a ballpark figure; every query and index will have their own cutoff point. (For more details about the use of indexes, see my July 2001 column, "Are You in Tune?" InstantDoc ID 21038.)

How can SQL Server know, before your query has executed, whether the table contains a lot of names that start with Mc or only a few? The answer is that it can get an estimate by examining the index statistics. As I mentioned, the format of statistics changed completely in SQL Server 7.0 and again in SQL Server 2000. For details about how SQL Server 2000 stores statistics, see the Microsoft white paper "Statistics Used by the Query Optimizer in Microsoft SQL Server 2000" at http://msdn.microsoft.com/library /default.asp?url=/library/techart/statquery .htm. My October 2001 column, "Statistically Speaking," InstantDoc ID 22075, contains additional information about how SQL Server uses index statistics.

For DBAs, one of the most welcome additions that Microsoft made in SQL Server 7.0 was automatic statistics updates. Before SQL Server 7.0, a DBA had to remember to update the statistics on indexes every time data volumes and distribution changed. For example, if last week you had half a dozen customers in Arkansas and you ran a query looking for customers in that state, a nonclustered index on state might have been useful. This week, you acquired two Arkansas-based companies whose customer lists contain thousands of names. Now, using the index that was appropriate last week might give horrible performance. The optimizer in SQL Server 6.5 wouldn't know about the new Arkansas customers unless you manually updated the statistics. Worse, you had no way to update all the statistics in a database. You could use the UPDATE STATISTICS command on only one table at a time, and you'd have to write your own procedure to access every table in the database if you wanted to update all database statistics as part of regular maintenance.

SQL Server 2000 and 7.0 have a database setting called auto update statistics that's set to TRUE by default. Every time an internally defined percentage of values in an index changes, the optimizer updates statistics before trying to determine the optimal plan. In addition, SQL Server 2000 and 7.0 have a stored procedure called sp_updatestats that updates the statistics on every index in the current database.

A related feature Microsoft added in SQL Server 7.0 lets you exclude individual indexes from automatic statistics updates. You can use the WITH NORECOMPUTE option with the UPDATE STATISTICS command to override the database setting of auto update statistics for individual indexes. These options give you almost total control over index-statistics updates.

Put the Optimizer to Work
The optimizer is now so modular and so easy to enhance that Microsoft frequently adds query-processing techniques in service packs without mentioning them in the README file. For most people, that's not a problem. If SQL Server starts performing better after an update to a new service pack, most people won't complain. However, for someone like me who writes and speaks about the optimizer and the various query-processing techniques, it means a recheck of all my demos and sample files after every service-pack upgrade to make sure that my query plans are what I expect them to be. I don't always remember to do this, and I've been taken by surprise in front of an audience more than once. Next month, I'll tell you about some optimizer changes that have caused me embarrassment. And I'll show you some examples of queries that take advantage of these and other improvements in SQL Server.



ARTICLE TOOLS

Comments
    There are no comments to display. Be the first one!
You must log on before posting a comment.

Are you a new visitor? Register Here