December 19, 2001 07:19 PM

Identifying Duplicate .jpg Files

Rating: (0)
SQL Server Magazine
InstantDoc ID #23234

I've created a table in SQL Server 2000 that stores .jpg files in an image column. Unfortunately, the table includes many duplicate images. Can I compare the image value in one row with the image contents of another row to determine whether the contents are identical?

I'm sure you've tried using the equal to (=) operator to compare the columns through a JOIN clause. Unfortunately, you can't use image data types with the = operator. However, a simple workaround might help you deal with this limitation.

Add a Comment

In case of self join that List2 shows, E1.Employee_Id<>E2.Employee_Id should be added.

GARY1/3/2003 5:56:39 PM


I find the best, quickest and easiest way to find duplicates is not by using JOINS (which doesn't work if you have NULLS) but by using GROUP BY's.

Just GROUP BY the fields you need to check for duplicates (in this case, the SUBSTRINGS() of the image) and where you have a count(*) of >1, you have a duplicate record! And this handles NULLs just fine, which JOINS do not.

Not only that, but you know exactly how many duplicates you have because the COUNT(*) tells you!

The SQL statement is much shorter, quicker, easier to read and debug.

- Jeff

Jeff12/24/2002 7:14:55 AM


Don't you think that the function CHECKSUM might help instead of using several substring() calls ?

Stephane Baudet 12/24/2002 2:33:15 AM


You must log on before posting a comment.

Are you a new visitor? Register Here
GOOGLE LINKS
SPONSORED LINKS
FEATURED LINKS