We all want our SQL Server applications to run as fast as possible. And query tuningmaking sure SQL Server can process each query your application executes as quickly as possibleis an area in which you can make a significant impact on overall application performance. For the past 14 months, I've devoted this column to the topic of query tuning for optimum performance, paying particular attention to choosing the best indexes for the types of queries your applications are performing. This month, I conclude the series by summarizing some important points to keep in mind when you're designing the best indexes for your queries.
The most important piece of advice I can give you to start with is that you can't design indexes in a vacuum. You need to know the kinds of queries your users will be running. Without that information, you have no way to know which indexes might be most helpful. To determine the kinds of queries your users will run, you need to ask the following questions about each of your tables:
- Which columns in the table will most likely be used as search arguments (that is, which columns will appear in conditions in WHERE clauses)?
- Will queries access a large percentage of rows in the table or only a few rows (or one)?
- Which columns will SQL Server need to return to the client application most often?
- Will queries use GROUP BY or TOP to summarize the data?
- Will SQL Server need to return the results to the client application in a particular order?
- Will the table be used for SELECT queries, data-modification queries, or both?
- How many rows will be updated at a time?
- Which columns will be updated frequently?
- Which columns will be used in joining tables?
Determining the answer to question 1 will help you find the answer to question 2. When you know which columns will be used as search arguments, you can execute a simple GROUP BY query with a COUNT(*) aggregate on any column that's used as a search argument to determine the distribution of each possible data value. For example, if a query containing a WHERE clause that references the customerID column will frequently access your orders table, you could run the following query to check the distribution of customerID values:
SELECT customerID, orders = COUNT(*),
percent_of_total = COUNT(*) *
100.0/ (SELECT COUNT(*) FROM orders)
FROM Northwind.dbo.orders
GROUP BY customerID
ORDER BY 3 DESC
The above query returns 89 rowsone for each customerID that appears in the orders table. The second column contains the number of times that customer appears in the table (the number of orders placed), and the third column contains the percentage of the total orders that the second column number equals. Because I included an ORDER BY DESC clause for the third column, the query returns the customers who have the largest percentage of the total orders first.
Figure 1 shows the first row this query returns. This result means that the most active customer has 31 orders, or 3.7 percent of the total number of orders. No customer has an excessive number of orders (e.g., 500), so this experiment tells you that no query searching for a single customerID value in the WHERE clause will return a large percentage of rows. And as you know, you can consider nonclustered indexes for columns used in search arguments in queries that access only a small percentage of the table.
Big on Indexes
Knowing whether a query will return a large percentage of rows helps you determine how and when a nonclustered index will be useful. I discussed one special case of nonclustered indexesthe creation of covering indexesin my August 2001 ("Time for a Tune-Up," InstantDoc ID 21480) and September 2001 ("The Big Cover-Up," InstantDoc ID 21729) query-tuning columns. Knowing the answer to question 3 is important in determining whether a covering index is possible. Unlike noncovering indexes, which need to include only the columns in the query's WHERE, ORDER BY, or GROUP BY clauses, a covering index must include all the columns in a query, including columns to be returned in the SELECT list.
If you have crucial queries that return a large percentage of rows in a table (perhaps more than 5 or 10 percent) and include too many columns to be good candidates for covering indexes, consider putting a clustered index on the columns in the query's WHERE clause. One type of query that might fall into this category is a query based on a range of data values. For a query that requests all information about all customers whose ZIP codes are between 80000 and 89000, you could put a clustered index on the ZIP code column. You must take great care in choosing the clustered index for a table, though, because you only get one such index. (You can have up to 249 nonclustered indexes on a table, depending on the number of non-index statistics you have.)
Clustered indexes are useful for queries that return a large number of rows or a range of values, but they're also useful for the queries you identified in questions 4 and 5, which perform grouping or sorting operations. If rows need to be returned to the client in a particular order or grouped and summarized based on the values in a particular column, a clustered index on the ORDER BY or GROUP BY column could boost performance. If the GROUP BY query returns data based on only one or two columns, you could add a nonclustered index on those columns, but if many columns in the table need to be summarized, a clustered index might be a better choice.
If the answers to questions 1 through 5 give you multiple possible choices for a clustered index, you can look at some other factors to help you decide how to index. For example, instead of building a clustered index on a column that determines the sort order, you can return the data unsorted and have the application perform the sort.
Prev. page  
[1]
2
next page