The query optimizer is one of the most complex pieces of code in the SQL Server database engine. The optimizer's job is to generate a query plan, which is a set of steps that SQL Server will take to carry out a user query. The optimizer makes decisions such as which indexes to use for each table, which join method to use and in which order to process the joined tables, whether to build an internal worktable to hold intermediate results, whether to perform an internal sort, and whether to run the query on multiple processors.

In the next few months, I'll tell you about many of the changes to SQL Server's query optimizer that Microsoft introduced in the last two releases. This month, I look at the optimizer's background and describe ways the optimizer has changed. In upcoming articles, I'll show you examples of queries that are optimized differently in SQL Server 2000 and 7.0 than in previous releases and that perform better because of the changes.

The Old Days
Before SQL Server 7.0, the optimizer had very few choices to make. It evaluated each possibly useful index and either chose one of them or chose to perform a table scan. SQL Server could process a JOIN operation in only one way, so the optimizer needed to decide only the order in which to join the tables. SQL Server could process GROUP BY and DISTINCT queries in only one way, and a query couldn't run on multiple processors.

The optimizer in SQL Server 6.5 and earlier releases was so straightforward that once you understood the way it worked and the way SQL Server could use indexes, you could predict fairly accurately what the query plan would (or should) be, even before running the query. If the optimizer didn't produce the plan you thought it should, you could enable a couple of trace flags to see the steps that the optimizer took to determine the plan and all the potential plans the optimizer considered. In addition, the optimizer had so few choices to make that if it didn't make the best choice, Microsoft considered it a bug. In effect, you had an implied guarantee that the optimizer's query plan was always the best plan.

For SQL Server 7.0, Microsoft completely rewrote the optimizer and added many query-processing techniques. The updated optimizer was completely modular so that as Microsoft developers added new processing techniques to the product, they could easily amend the optimizer to consider each new technique in any query plan.

The query optimizer in SQL Server 7.0 and later releases is orders of magnitude more complex than its predecessors. SQL Server can use so many different processing strategies that, for many queries, it's almost impossible to predict which plan the optimizer will choose. These releases also have no equivalent to the trace flags that showed you the plans the optimizer was considering, probably because the optimizer goes through hundreds of thousands of steps to choose each plan, so the output of such trace flags would be too complex to use.

No Guarantees
As I mentioned, before SQL Server 7.0, you had a kind of guarantee that the optimizer would find the best plan to process your query as long as the optimizer had all the information it needed. If you, as a programmer, could use a query hint and make the query perform better, it meant that the optimizer hadn't come up with the best plan, and Microsoft usually considered this to be a bug. SQL Server 2000 and 7.0 carry no such guarantee, either implicit or explicit. In these releases, the optimizer's goal is to find a "good-enough" plan. In most cases, this plan will be the best—but not always.

You can think of the current optimizer's processes for coming up with a query plan as cyclical, with each iteration of the cycle trying a more complex plan. For example, the first iteration might evaluate the cost of using only each table's clustered index and a simple nested-loop join. If that approach doesn't produce a good plan, the optimizer might evaluate nonclustered indexes, then different join methods, then try using multiple indexes.

As the optimizer evaluates increasingly complex plans, it calculates the cost-benefit ratio of continuing to search for a better plan than any it has found so far. For example, suppose the optimizer spends 1ms finding a plan that it estimates will take 30ms to execute, then determines that it might find a plan that would run in 25ms—but finding it would take 2 more seconds. SQL Server will usually decide that the cost of continued searching isn't worth the benefit. So, it uses the plan that takes 30ms, which the optimizer decided was good enough.

If you think that plan isn't good enough, you could try modifying your query with hints. You might find that by using hints to force SQL Server to use particular indexes and join methods, you can reduce the execution time to 25ms. The fact that you can find a faster plan is no longer considered an optimizer bug; it's a choice. It was SQL Server's choice to stop the optimizer from continuing to search for a better plan. And you have a choice, too—whether to use the hint in your code for that 5ms performance gain.

   Prev. page   [1] 2     next page
 
 

ADS BY GOOGLE