DOWNLOAD THE CODE:
Download the Code 21704.zip

Editor's Note: Send your SQL Server questions and comments to SQL Server MVP Brian Moran at savvy@sqlmag.com.

Deleting Duplicate Records

I have a table containing a set of columns that are supposed to be unique but that have duplicate values, which Table 1 shows. I'd like to end up with a table that looks like this:

ID First Last
106 Robert Aide
107 David Koidl
108 Hongbo Li

How can I delete the duplicate records?

You can usually solve a problem in several ways with T-SQL, and the problem of deleting duplicate records is no exception. Listing 1 shows one solution that you might find helpful. This reasonably complex T-SQL query uses a correlated subquery. Correlated subqueries are a little bit like nested loops. Physical execution plans might be different from the following logical description of the query, but you can interpret this query in the following way:

  • For each row of the dups table, run an existence check.
  • Select the FirstName, LastName, and NameId columns from the dups table in the outer query.
  • The existence check is true if the correlated subquery (which is the inner query) returns any rows--in other words, if the row that the outer query is processing is duplicated in the table.
  • Delete the current row from dups if the existence check was true.

When the code checks the first row in Table 1, which has NameId 106, the EXISTS check returns false because the inner query doesn't return any rows. Yes, rows match based on FirstName and LastName, but none of the matching rows have a NameId less than 106. So, the code doesn't delete this row. Now, assume the current row of the outer query is the row with NameId 123. The EXISTS check returns true for this row because the inner query returns a row that matches FirstName and LastName and also has a NameId that is less than 123. Which row causes this match? The row with NameId 106 has the same FirstName and LastName as the row with NameId 123. And DupsInner.NameId 106 (which is the alias I use for the table in the inner query in Listing 1) is less than the NameId value of 123 that the code passed in from the outer query.

Correlated subqueries and existence checks can be confusing to the T-SQL novice. But they can be powerful, and learning to use them is worth the effort.

End of Article




Post Your Comments Here

You must be a registered user or online subscriber to comment on this article. Please log on before posting a comment. Are you a new visitor? Register now

Reader Comments

This query is the best way in terms of performance for a large database query over internet?

Javier

Your Comments (required):i receive duplicate mails in my outlook when i access my mailbox other locations of company.all locations are connected by leased line.i am not getting where the problem is ..??? fonly few of my users face this problem..where could be the problem? server or outlook

sopan

good

Anonymous User

Article Rating 3 out of 5

good

Anonymous User

Article Rating 5 out of 5

What would be the best if you do not have a field with a unique number like the ID field.

Anonymous User

Article Rating 5 out of 5

you could get all distinct records to a temporary table, delete everything in the main table, then copy the distinct records back:

CREATE TABLE Dups (FirstName varchar(10), LastName varchar(10)) GO

INSERT INTO Dups VALUES ( 'Robert', 'Aide') INSERT INTO Dups VALUES ( 'David', 'Koidl') INSERT INTO Dups VALUES ( 'Hongbo', 'Li') INSERT INTO Dups VALUES ( 'Robert', 'Aide') INSERT INTO Dups VALUES ( 'David', 'Koidl') INSERT INTO Dups VALUES ( 'Hongbo', 'Li') INSERT INTO Dups VALUES ( 'Robert', 'Aide') INSERT INTO Dups VALUES ('David', 'Koidl') INSERT INTO Dups VALUES ('Hongbo', 'Li') INSERT INTO Dups VALUES ('Robert', 'Aide') INSERT INTO Dups VALUES ('David', 'Koidl') INSERT INTO Dups VALUES ('Hongbo', 'Li') GO

--CREATE A TEMPORARY TABLE THAT HAS THE SAME SCHEMA AS MAIN TABLE create table #dupstemp (FirstName varchar(10), LastName varchar(10)) --NOW GET DISTINCT VALUE FROM MAIN TABLE INSERT INTO #dupstemp select distinct * from dups --DELETE EVERYTHING FROM MAIN TABLE DELETE FROM dups WHERE EXISTS ( SELECT * FROM dups DupsInner WHERE DupsInner.FirstName = Dups.FirstName AND DupsInner.LastName = Dups.LastName ) GO

--NOW COPY DISTINCT VALUES BACK IN INSERT INTO dups select * from #dupstemp

--VOILA select * from dups

Anonymous User

Article Rating 4 out of 5

tttttttt

Anonymous User

Article Rating 5 out of 5

 
 

ADS BY GOOGLE