Search Arguments
The easiest way to decrease the I/O your query requires is to decrease the number of rows SQL Server must examine. You do this by including selective criteria in the WHERE clause of the query. Called search arguments, these criteria help the query optimizer by giving it clues as to which method of data access is probably the fastest. Search arguments have the form ColumnName operator [ColumnName or constant], where operator is =, <, >, >=, or <=. Search arguments joined by the Boolean operator AND are also valid. The phrase BETWEEN...AND is also valid because you can restate it in terms of >= and <= operators. Here are some examples of search arguments:

LastName = 'Lewis'
LastName >= 'Le' AND LastName <= 'LF'
OrderDate BETWEEN '01/01/99' AND '01/31/99'

Notice that none of the negation operators are listed. To process a query including the clause WHERE (ProductID <> 2), SQL Server looks at each record to determine whether the ProductID equals two. Even an index on ProductID won't help much unless the rows with ProductID <> 2 constitute a very small percentage of the table. In almost all cases, SQL Server will satisfy this query with a table scan, not an index.

In terms of optimizing queries, the LIKE operator can be as inefficient as NOT. If you have a clause such as WHERE LastName LIKE '%Mc%, SQL Server performs a pattern search on the entire column. An index won't help, so you can expect the query optimizer to choose a table scan. The one exception happens when the clause looks like WHERE LastName LIKE 'Le%'. The difference is that this search is logically equivalent to WHERE LastName >= 'Le' AND LastName < 'LF', which is a search argument.

In general terms, search arguments help your query by helping the query optimizer determine how selective an index will be in handling that part of the query. Clauses that use the = operator and the < and > operators are search arguments because they limit the search for the rows in the result set. The = operator limits the search to one row, and the < and > operators limit the search to a range of rows.

A clause's selectivity reflects how well the search argument limits the search and is equal to the percentage you get from dividing the number of rows returned by the number of rows in the table. (This definition is slightly simplified, but it makes this discussion easier.) A low percentage means a clause has high selectivity; a high percentage means a clause has low selectivity. Because the AND operator is associative (i.e., a AND b is the same as b AND a), the query optimizer can choose the most selective clause to drive the query processing because those kinds of clauses tend to limit the I/Os the query performs.

For example, consider the query

SELECT * FROM pubs..authors WHERE au_id = 
   '213-46-8915' AND state = 'CA'

Both clauses in the WHERE clause are search arguments. But the state column probably won't have unique values, whereas the au_id column will because it's the table's primary key. Without knowing anything else, you can reasonably guess that au_id = '213-46-8915' has high selectivity and state = 'CA' has medium to low selectivity. Of course, if the state column had only one row with the value of 'CA', both clauses would be equally selective.

The query optimizer decides how selective a search argument is by looking at the corresponding index's statistics. The statistics give an approximate idea of how many rows will match given criteria. Therefore, if the query optimizer knows how many rows are in the table and how many rows each part of the WHERE clause will probably return, deciding which index to use is a relatively simple task. (For more information on SQL Server 7.0's use of statistics, see Kalen Delaney, "SQL Server Statistics: A Useful Query Optimizer Tool," August 1999.) In the previous query, if both au_id and state have an index, SQL Server uses the index on au_id. If au_id has no index but state does, the index on state is the logical choice because it's more selective than a table scan. Without an index on either column, a table scan is the only way to find the rows that match.

(For more information about how the query optimizer works, Microsoft SQL Server 6.5 Unleashed (Sams, 1998) describes in detail several common scenarios. Working through each example takes a while, but you'll write better queries when you know how the query optimizer works. Inside SQL Server 6.5 also has good optimizer information.)

Indexing: More Art than Science
You can't write efficient queries without knowing about the indexes on the table. Without good indexes, even the simplest-looking queries can slow your system tremendously. Your only defense is to know the composition of the data in your tables and to think of your indexes as an integral part of your queries.

Indexes that work well during testing and development might be completely wrong for a production system because the data composition differs from your original assumptions. I've seen systems work well for one customer but perform poorly for another because the data was clustered in the tables in a way that prevented the query optimizer from making good use of indexes. If you get complaints about your application's performance, be aware of this situation and realize that one set of indexes might not fit all your customers.

Now let's look at some broad guidelines for choosing the right kind of indexes and the columns to index. First, because you can have only one clustered index per table, that type of index needs to support the greatest possible number of queries. A clustered index is most useful for range queries because the index's leaf level contains the data sorted in index order. You'll see the most benefit from a clustered index when a query has a WHERE clause that contains >, <, or BETWEEN...AND operators or GROUP BY clauses that list the columns in the same order as the columns in the index. Although it might not help in searching for rows, a clustered index can improve the performance of ORDER BY clauses if the index and the ORDER BY clause have the same columns in the same order.

Because the intermediate level of a clustered index is so small, it works fine when you're searching for unique values. Nonclustered indexes, however, work best for point queries, which find small numbers of rows. WHERE clauses with = operators are prime candidates for nonclustered indexes on the appropriate columns. This index type also works well for the MIN and MAX aggregate functions because finding the first and last entries in a range of values is easy if you look at the index's leaf level. Finally, nonclustered indexes dramatically improve the COUNT function's performance because scanning the index's leaf level is much faster than performing a table scan.

Where Do I Go From Here?
Use the ISQL/window to see how SQL Server reacts when you put various indexes on a table. Enterprise Manager can show you the selectivity of a table's indexes, and SQL Trace can create a script of all the queries sent to the server. By adjusting the indexes, replaying the script, and noting the changes in processing time, you can get a good idea of which indexes are best for your production environment. Just be aware of how many I/O operations your queries use and remember that anything that decreases that number will have a positive effect on overall performance.

End of Article

Prev. page     1 [2]     next page -->



You must log on before posting a comment.

If you don't have a username & password, please register now.

 
 

ADS BY GOOGLE