Finding Duplicates

Provided by: Michelle Swann, Executive Vice President, FMS Professional Solutions Group

Sometimes, users need to determine if duplicate data exists in the database.

You can use a GROUP BY and COUNT to see how many duplicates exist in each group. However, users want to see the primary key identifier of the duplicate rows so that you can display the data or delete the duplicates.

You cannot add the primary key to the query with the GROUP BY because this causes all counts to equal one since the primary key, by definition, is unique.

To get a query that shows duplicates along with the primary key, you can use an "IN" clause in the WHERE clause.

Using Northwind:

SELECT Employees.LastName, Employees.FirstName, Employees.Address, Employees.EmployeeID, Employees.Title
FROM Employees
WHERE (
    (Employees.LastName IN
      (
      SELECT [LastName] 
      FROM [Employees] As Tmp
      GROUP BY [LastName], [FirstName], [Address] 
      HAVING Count(*)>1 And [FirstName] = [Employees].[FirstName] And [Address] = [Employees].[Address]
      )
    )
  )
ORDER BY Employees.LastName, Employees.FirstName, Employees.Address

This query will return the first name, last name, address, employeeid and title of all rows where the last name, first name, and address are duplicated.

Return to Tips & Techniques Page