DOWNLOAD THE CODE:
Download the Code 44807.zip

As I've mentioned, I get many article ideas from questions that people ask me directly or that come up on the public newsgroups. When the same question comes from several directions at once, I start thinking that maybe it's time to write about that question. My idea for this month's column started when a student in my class asked me a question after I described the structure of nonclustered and clustered indexes. As I mentioned in my November 2004 column, "Get Into Index Structures" (InstantDoc ID 43939), when a table has a clustered index, any nonclustered index will use the clustered index key as a pointer to keep track of where in the table to find the row corresponding to a particular nonclustered index key. For example, if your clustered index is on Social Security number and you have a nonclustered index on last name, the leaf level of the nonclustered index will contain every last-name value and the corresponding Social Security number. SQL Server treats the Social Security number as if it were another key column of the nonclustered index.

The student asked me about the performance implications of explicitly declaring the clustered key as part of the nonclustered index. Specifically, he was asking if there was any reason not to define a nonclustered index on both last name and Social Security number in a scenario like the example above. SQL Server won't duplicate columns if they're included in both the clustered key columns and the nonclustered key columns, so I said that if the table had a clustered index on Social Security number, then a nonclustered index on last name and a nonclustered index on last name and Social Security number would be identical.

Less than 2 days after the class, the same question came up on the private MVP newsgroups. One MVP mentioned that Peoplesoft applications tend to explicitly name the clustered index columns in their nonclustered index definitions. He mentioned that this method has the benefit of explicitly naming all the columns included in the index without you having to check or remember exactly which columns the clustered index contains. However, this doesn't really affect storage resources or query performance.

I thought about this topic some more and realized that explicitly listing the clustered index columns in a nonclustered index could actually affect performance. This could happen if you listed the columns in a different order than the one that SQL Server assumes by default. I kept thinking about this subject and eventually decided to put it to the test. If explicitly listing the clustered key columns in a nonclustered index could give a performance benefit, I wanted to find an example of it.

To see this behavior for yourself, run Listing 1's script, which creates a copy of the Order Details table in the Northwind database, then copies the table into itself nine times, for a total of 1,103,360 rows. It then copies that big table into one called BigDetails, adding an identity column.

First, I wanted to make sure that there's no index size difference between when you explicitly include the clustered index columns in a nonclustered index definition and when SQL Server uses the clustered keys automatically. To check this, I created a clustered index on the table:

CREATE CLUSTERED INDEX Order_index
ON BigDetails(OrderNum, Discount)

Next, I created a nonclustered index on two other columns:

CREATE INDEX OrderID_Price
ON BigDetails(OrderID, UnitPrice)

I used the undocumented sp_MSindexspace procedure to check the size of this index:

EXEC sp_MSindexspace BigDetails, OrderID_Price

The reported size was 28,504KB. I then recreated the nonclustered index and included the clustered index columns:

CREATE INDEX OrderID_Price
ON BigDetails(OrderID, UnitPrice, OrderNum, Discount)
WITH DROP_EXISTING

Checking the size of this index, I found it to be the same 28,504KB. This nonclustered index has all the columns in the same order as they would be in the original nonclustered index, which has just two defined keys. The index starts with OrderID and UnitPrice and then includes the clustered keys OrderNum and Discount.

Now I wanted to find a query that would perform better if I changed the order of the keys in the nonclustered index, compared to if I didn't specify the clustered index columns. It took me so long to find such a query that I started to think maybe I was wrong. I wondered whether SQL Server overrode the order I specified for the keys when I included the clustered keys in the nonclustered index definition. To test this hypothesis, I ran a query that would take advantage of the order I used in the index creation. I first created the nonclustered index with the columns in a different order:

CREATE INDEX OrderID_Price
ON BigDetails(OrderID, Discount, UnitPrice, OrderNum)
WITH DROP_EXISTING

The following query is a covered query, which means everything the query needs is in the leaf level of the nonclustered index, so SQL Server doesn't have to go to the data pages to process this query. In the leaf level of this index, I assumed that that the keys were exactly in the order I specified when I created the index.

SELECT  OrderID, UnitPrice, OrderNum, Discount
FROM BigDetails
ORDER BY OrderID, Discount, UnitPrice, OrderNum

When I looked at the query plan for this query, I saw that SQL Server was scanning the nonclustered index and not performing any sort operation. That meant it was taking advantage of the sorted order of the data in the leaf level of the nonclustered index. As a final proof, I rewrote the query to order the data in the same order it would be in with only OrderID and UnitPrice in the nonclustered index (and OrderNum and Discount defined in the clustered index):

SELECT OrderID, UnitPrice, OrderNum, Discount
FROM BigDetails
ORDER BY OrderID, UnitPrice, OrderNum, Discount

The plan for this query included a sort operator, showing that SQL Server was storing the data in a different order when I explicitly listed the columns in the nonclustered index definition than when I didn't.

   Prev. page   [1] 2     next page



You must log on before posting a comment.

If you don't have a username & password, please register now.

Reader Comments

great article

araewillis

Article Rating 5 out of 5

Excellent article! I only wish I would have gotten to it sooner. :)

Milftronic

Article Rating 5 out of 5

 
 

ADS BY GOOGLE