2013-02-21

Identifying duplicates and the unique records they match in Oracle

Hi, I am running the following query to identify duplicate records:

SELECT *
  FROM unique2 P
 WHERE EXISTS (SELECT 1
                 FROM unique2 C
                WHERE C.surname = P.surname
                  AND C.postcode = P.postcode
                  AND (((C.forename IS NULL OR P.forename IS NULL)
                        AND C.initials = P.initials)
                       OR C.forename = P.forename)
                  AND (C.sex = P.sex OR C.title = P.title)
                  AND (C.address1 = P.address1
                       OR C.address1 = P.address2
                       OR C.address2 = P.address1
                       OR instr(C.address1_notrim, P.address1_notrim) > 0
                       OR instr(P.address1_notrim, C.address1_notrim) > 0)
                  AND C.rowid < P.rowid);

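However, with this query I can't tell which unique record each duplicate matches. Is there a way to return both the duplicate records and the IDs of the unique records they match? (My table has a unique key.)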

Answers

select id
from promolog
where (surname, postcode, dob) in (
    -- surname/postcode/dob combinations that occur more than once
    select surname, postcode, dob
    from promolog
    group by surname, postcode, dob
    having count(*) > 1
);

Hi, thanks for your response. But I have some other rules for identifying duplicates, such as:
a. DOB AND
b. postcode AND
c. surname AND
d. address, where
   i. mailed_address1 = mailed_address1
   ii. or mailed_address1 = mailed_address2
   iii. or mailed_address2 = mailed_address1
   iv. or record 1's mailed_address1 is contained in record 2's mailed_address1
   v. or record 2's mailed_address1 is contained in record 1's mailed_address1
– subash 2013-02-21 14:22:16


@subash Just add or change whichever fields need to be compared in this query (I used the three fields from the original post: surname, postcode, dob). – tbone 2013-02-21 14:32:50


@subash: It would be better if you updated your question. What problem are you running into? – 2013-02-21 14:33:29


You can also do this with analytic functions:

select id, num_of_ids, first_id, surname, postcode, dob
from (
    select id,
     -- number of rows sharing this surname/postcode/dob
     count(*) over (partition by surname, postcode, dob) as num_of_ids,
     -- lowest id in the group: the record the others duplicate
     first_value(id)
      over (partition by surname, postcode, dob order by id) as first_id,
     surname,
     postcode,
     dob
    from promolog
)
where num_of_ids > 1;

Based on your update, I think you could do a self-join, which can be as complicated as you need, something like:

select dup.*, master.id as duplicate_of 
from promolog dup 
join promolog master 
on master.surname = dup.surname 
and master.postcode = dup.postcode 
and master.dob = dup.dob 
... and <address checks etc. > ... 
and master.rowid < dup.rowid; 

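But maybe I'm still missing something. As the name suggests, EXISTS only tests whether a matching record exists; if you want to retrieve any data from the matching record, you need to join to it at some point.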
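A minimal sketch of that idea, applied to the conditions from the question: the EXISTS subquery's predicates move into the ON clause of a self-join, so each duplicate row comes back together with the ID of the earlier record it matched. It assumes the unique key column in unique2 is called id (the question does not name it) and compares id instead of rowid; the table definition and the four sample rows are hypothetical, included only so the query can be run.

```sql
-- Hypothetical sample table: column names come from the question,
-- but "id" (the unique key) and the data are assumptions for illustration.
CREATE TABLE unique2 (
  id              NUMBER PRIMARY KEY,
  title           VARCHAR2(10),
  forename        VARCHAR2(50),
  initials        VARCHAR2(10),
  surname         VARCHAR2(50),
  sex             VARCHAR2(1),
  postcode        VARCHAR2(10),
  address1        VARCHAR2(100),
  address2        VARCHAR2(100),
  address1_notrim VARCHAR2(100)
);

INSERT INTO unique2 VALUES (1, 'Mr', 'John', 'J', 'Smith', 'M', 'AB1 2CD', '1 High St', 'Flat 2', '1 High St');
INSERT INTO unique2 VALUES (2, 'Mr', NULL, 'J', 'Smith', 'M', 'AB1 2CD', 'Flat 2', NULL, 'Flat 2');
INSERT INTO unique2 VALUES (3, 'Ms', 'Jane', 'J', 'Smith', 'F', 'AB1 2CD', '1 High St', NULL, '1 High St');
INSERT INTO unique2 VALUES (4, 'Mr', 'John', 'J', 'Smith', 'M', 'AB1 2CD', '1 High St Upper', NULL, '1 High St Upper');

-- Every duplicate (p) paired with each earlier record (c) it matches.
SELECT p.id AS duplicate_id,
       c.id AS matched_id
FROM   unique2 p
JOIN   unique2 c
  ON   c.surname  = p.surname
 AND   c.postcode = p.postcode
 -- forename rule: equal forenames, or equal initials when either forename is missing
 AND   (((c.forename IS NULL OR p.forename IS NULL) AND c.initials = p.initials)
        OR c.forename = p.forename)
 AND   (c.sex = p.sex OR c.title = p.title)
 -- address rule: exact cross-matches, or one address1 contained in the other
 AND   (c.address1 = p.address1
        OR c.address1 = p.address2
        OR c.address2 = p.address1
        OR instr(c.address1_notrim, p.address1_notrim) > 0
        OR instr(p.address1_notrim, c.address1_notrim) > 0)
 -- pair each record only with lower ids, so no self-matches or mirrored pairs
 AND   c.id < p.id
ORDER BY p.id, c.id;
```

With these sample rows, records 2 and 4 are each paired with record 1, while record 3 has no match (its forename differs from record 1's, and neither sex nor title agrees with record 2's). If a duplicate can match several earlier records and you only want one of them, wrap the query and take MIN(matched_id) grouped by duplicate_id.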
