Deleting duplicate rows from a MySQL table – except one!


Here’s the problem: we have a customer’s database and inasmuch as we’ve tried to eliminate duplicates, we still have them. Multiple Customer Relationship Managers have gone ahead to create customer records without first checking if they exist and in so doing we now have instances where a single customer could appear five to eight times in our database. This is simply not good enough.

We laid this problem at the doorstep of a Tech Intern at the office (while we focused on more important stuff, yeah right) but she couldn’t figure it out. That’s where I came into the picture.

Here’s the Solution

So we had a table of this form:

What we want to do here is to delete all duplicates and leave just one record.

Here’s the SQL statement to do it:

This query, when executed on the dummy table we have above would delete row 1 and row 4 (Google), leaving row 6. It would delete row 2, leaving row 8 (Yahoo) and it would delete row 5 leaving row 7 (MSN).

In essence, the last condition (highlighted) instructs MySQL to delete the records with the lower ids hence, keeping the record with the highest id value. You could change it to  AND s1.`id` > s2.`id`  to keep the record with the lowest id value instead.

Of course, you want to back up your database table before trying this out. Here’s a quick way I discovered some months back: