recent

Titulo

Finding Duplicate Records


How to find out duplicate records from a table? Read this question again?  How to check for duplicates records from a table? Ok, the first question is to find out what is a duplicate record to you or from the requester? Sometime, we are referring to the occurrence of a record/row of data more than once and sometime, we are referring just occurrence of few columns from the table more than once. Whatever the answer be, we will cover all the possible answers and write query to display the duplicate.  As a bonus, we will also talk about how we can prevent duplicate insert from happening and never have to deal with duplicate problem again.

Possible 1: We have employee table and we are going to check and display duplicate rows. We will call data duplicate if the data from all the columns from employee table occurs more than once.

Query:
SELECT   empno, 
         ename, 
         job, 
         mgr, 
         hrideate, 
         sal, 
         comm, 
         deptno, 
         COUNT(*)  -- Use all the columns from employee table
FROM     hr.employee 
GROUP BY empno, 
         ename, 
         job, 
         mgr, 
         hrideate, 
         sal, 
         comm, 
         deptno 
HAVING   COUNT(*) > 1;


The select query here selects the row that has occurred more than once. Here, the data must match from all the columns from employee table.

Possible 2: The other possibility is that the request is only concerns if two or more employees have same empno. No two employees should have same employee number, don’t you agree? Here, we are only concerns about the employee number even though the rest of the details are different. How do you find the duplicates empno here?

Query:

SELECT  empno, 
    COUNT(*)
FROM     hr.employee 
GROUP BY empno 
HAVING   COUNT(*) > 1;

Do you want duplicate records? No, you don’t, how do you resolve this? The first step is to remove duplicate records and then add Primary Key or Unique Constraint on empno. Adding constraints resolves most of the problem regarding duplicate but not all. We will discuss what constraints solve and what it does not next.

Now the employee table has PK on empno column. We can’t have duplicate employee number. Primary Key ensures no duplicate data on the column and also adds Index automatically. To verify if the PK is working/exist or not, you can use the above query or just check on empno column instead of all the columns to display duplicate. Well this isn’t the right way to check PK there are other solid and fast way which we are interested here.

What PK does not prevent is other columns. You will have unique employee number but all other information may be duplicated.  This is possible when someone is doing a data entry from an application or loading employee data from a file. This is something we can’t prevent it. Therefore, I would check to ensure PK is there and do a duplicate check on all the columns except the one with PK constraint. The query output should always be NONE. If the query displays some output either your query is wrong or you have duplicate records.


Interested in working with me? I can be reached at pbaniya04[at]gmail.com for any questions, consulting opportunities or you may drop a line to say HELLO. Thank your again for visiting my blog and looking forward to serving you more.

Have a Database-ious Day!

No comments

Powered by Blogger.