Executive Summary:
I often see too many indexes on a given table that turn out to be redundant or less useful than they might seem to be at first. Use this trick to identify duplicate or redundant indexes on the tables in your databases. |
I often see too many indexes on a given table that turn
out to be redundant or less useful than they might
seem to be at first. If you have more indexes than you
really need, those indexes can be potentially harmful
in terms of performance during Inserts, Updates,
and Deletes, and they can increase the possibility of
deadlocks.
I want to explore some of those reasons and show
you how to identify the candidates for removal. To do
so, I’ll use the DupeIndex script, which will generate a small report when I run
it against my copy of the Adventureworks database. To
highlight areas relevant to this discussion, I’ve included
a few additional indexes in the database. For a little
background about how to look for missing indexes,
how to use a SQL Server DMV to examine the statistics
of your existing indexes, and how to set a handle on
which ones are in use, see the Learning Path.
Is It That Simple?
Take a look at Table 1, which shows the second portion
of a report that outlines potential duplicate indexes.
The report simply observes all indexes for each table
and determines whether any have leading columns that
are the same. Of course, matching leading columns
don’t always mean that a certain index is duplicate or
useless. Often, a compound index can have the same
leading column as another index but be more selective.
Or, the index might be a covering
index—that is, an index that includes all
the columns necessary to satisfy both
the WHERE clause and the SELECT
list, and negates the need to access the
table itself.
Now, let’s examine a couple situations
in which the indexes might be
redundant. If you look at the indexes
for the Vendor table, you can see that
the VendorID column is the Primary
Key (PK) and has a Clustered Index.
That tells you that the index will never
return more than one row if you include
the VendorID in the WHERE clause
with an equality expression; the fact that it’s clustered means that there’ll never be an additional
lookup to satisfy the columns in the SELECT list.
Therefore, it’s highly unlikely that you’d ever need the
index on VendorID and AccountNumber because the
engine can filter via the AccountNumber after it finds
the row via the VendorID.
An exception is possible if the combination of
VendorID and AccountNumber has been declared as
a Unique Constraint. In that case, the combination is
serving to enforce uniqueness even though it might never
be used in a lookup. Even if this index was useful in our
case, we certainly don’t need two of them. As you can see
in Table 1, there are two identical non-clustered indexes
on VendorID and AccountNumber—clearly unnecessary.
Truly duplicate indexes provide no advantages whatsoever
and will only add to the overhead associated with
changes to the table and index maintenance. Another
redundancy—this time with only a single column in the
index—is on the BillOfMaterials table and the Unit-
MeasureCode column. Again, these two indexes are true
duplicates and only one of them might be necessary.
What about the indexes on the CultureID column
of the Culture table? One is a clustered index and the
other is non-clustered. The clustered index also happens
to be the Primary Key constraint. Many people
believe that a situation such as this is warranted, that
you need an actual index that’s searchable in addition
to the constraint. This myth is long-propagated. Constraints
such as Primary Key and Unique create an index behind the scenes to enforce the constraints and
are usable in searches, just like any other index.
Creating an index on the same expression as the Primary
Key constraint is the most common concern I see
related to duplicate indexes. The second most common
is creating a nonclustered index on the same expression
as the clustered index. If you have a clustered index
already defined on a particular column, there’s no need
for a non-clustered index on the same column.
Put It in
Reverse
Table 2 is a sample of
what you might see in the
Reverse Indexes section
of the report. In a reverse
index, the columns of two
indexes on a table are the
same but in reverse order
in the index expression. In this case, there are two columns—VendorID and
AccountNumber—on the Vendor table. If all the
queries that referenced either of these columns in the
WHERE clause also included the other, these would
be true reverse duplicate indexes. If both columns are
specified, the order in the WHERE clause or index
expression isn’t a factor because the engine is smart
enough to arrange the lookup to match the index
expression.
However, if you specified only one of the columns, it
would need to be the first column in the index expression
in order to do a seek. In this particular example, it might
be best to create two single-column indexes on VendorID
and AccountNumber instead of compound indexes. Or,
if you frequently query on two columns together and
sometimes on just VendorID, you can have a single index
on VendorID, AccountNumber to satisfy both queries.
You’re the Judge
Remember that this simple report can’t replace a
proper understanding of your data model and table
usage. It can merely outline potential candidates for
removal; you must be the final judge and apply good
reasoning before taking action.
End of Article