2010-01-29 134 views
4

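Delete duplicate entries via SQL?

Is it possible in SQL to delete (only one of) the duplicate entries for a combination of columns (here: city, zip)? So, if I have this SQL: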

INSERT INTO foo (id, city, zip) VALUES (1, 'New York', '00000');
INSERT INTO foo (id, city, zip) VALUES (2, 'New York', '00000');

Can I delete the first one afterwards with an SQL statement? My approach does not work for this:

DELETE FROM foo (id, city, zip) 
     WHERE id IN 
      (SELECT id FROM foo GROUP BY id HAVING (COUNT(zip) > 1)) 
+2

Delete only one, or leave only one? This matters as soon as you have 3 matching items. – Lucero 2010-01-29 11:43:48

+0

Only one. – codevour 2010-01-29 11:52:18

Answers

6

Adapted from this article. Both solutions are generic and should work on any reasonable SQL implementation.

Remove the duplicates in place:

DELETE T1 
FROM foo T1, foo T2 
WHERE (T1.city = T2.city AND T1.zip = T2.zip) -- Duplicate rows 
    AND T1.id > T2.id;       -- Delete the one with higher id 

Simple, and it should work fine for small tables or tables with few duplicates.
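Some engines do not accept the multi-table DELETE form. The same rule (delete every row that has a twin with a lower id) can also be written with a correlated EXISTS. Below is a minimal sketch that runs against an in-memory SQLite database through Python's sqlite3 module. The sample rows and the helper name dedupe_keep_lowest_id are illustrative assumptions, not part of the original answer.

```python
import sqlite3

def dedupe_keep_lowest_id(conn):
    """Delete every row that has a (city, zip) twin with a lower id."""
    conn.execute(
        """
        DELETE FROM foo
        WHERE EXISTS (
            SELECT 1
            FROM foo AS t2
            WHERE t2.city = foo.city
              AND t2.zip = foo.zip
              AND t2.id < foo.id   -- a twin with a lower id exists
        )
        """
    )
    return conn.execute("SELECT id, city, zip FROM foo ORDER BY id").fetchall()

# Hypothetical sample data: one duplicated pair and one unique row.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foo (id INTEGER PRIMARY KEY, city TEXT, zip TEXT)")
conn.executemany(
    "INSERT INTO foo (id, city, zip) VALUES (?, ?, ?)",
    [(1, "New York", "00000"), (2, "New York", "00000"), (3, "Boston", "02108")],
)
remaining = dedupe_keep_lowest_id(conn)
print(remaining)
```

As with the self-join, the row with the lowest id in each (city, zip) group survives.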

Copy the distinct records to another table:

CREATE TABLE foo_temp LIKE foo; 
INSERT INTO foo_temp (city, zip) SELECT DISTINCT city, zip FROM foo; 
TRUNCATE TABLE foo; 

If you are lucky enough to have an auto-generated id (for example a sequence or an auto-increment column), simply:

INSERT INTO foo SELECT * FROM foo_temp; 
DROP TABLE foo_temp; 

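A bit more complicated, but very efficient for very large tables with many duplicates. For these, creating an index on (city, zip) will greatly improve query performance.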
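The copy-and-reload approach can be tried out with the sketch below, again using an in-memory SQLite database from Python. It is an adaptation rather than the answer's exact statements: SQLite has neither CREATE TABLE ... LIKE nor TRUNCATE TABLE, so CREATE TEMP TABLE ... AS SELECT and DELETE stand in for them. The index name idx_foo_city_zip and the sample rows are assumptions.

```python
import sqlite3

def dedupe_via_temp_table(conn):
    """Rebuild foo from its distinct (city, zip) pairs; ids are regenerated."""
    # An index on (city, zip) may help the DISTINCT scan on large tables.
    conn.execute("CREATE INDEX IF NOT EXISTS idx_foo_city_zip ON foo (city, zip)")
    conn.execute("CREATE TEMP TABLE foo_temp AS SELECT DISTINCT city, zip FROM foo")
    conn.execute("DELETE FROM foo")  # stands in for TRUNCATE TABLE
    # id is omitted so that the INTEGER PRIMARY KEY assigns fresh values.
    conn.execute("INSERT INTO foo (city, zip) SELECT city, zip FROM foo_temp")
    conn.execute("DROP TABLE foo_temp")
    conn.commit()
    return conn.execute("SELECT id, city, zip FROM foo ORDER BY id").fetchall()

# Hypothetical sample data: three copies of one pair plus one unique row.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foo (id INTEGER PRIMARY KEY, city TEXT, zip TEXT)")
conn.executemany(
    "INSERT INTO foo (city, zip) VALUES (?, ?)",
    [("New York", "00000")] * 3 + [("Boston", "02108")],
)
rebuilt = dedupe_via_temp_table(conn)
print(rebuilt)
```

Note that the surviving rows receive new ids, so this variant only fits when nothing else references the old id values.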

+1

"In progress" - I'll have to remember to do that too when editing in the future ;) – Lucero 2010-01-29 11:47:27

+0

Yes. I posted the general idea first, to keep others from wasting their time racing in with the same idea. – 2010-01-29 11:59:20

1

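Since different dialects have different features, it is not clear which SQL is supported in your case. What comes to mind is to use a rank over zip in an inner query instead of HAVING, and to include only the rows with rank > 1.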

+0

SQL98 would be best – codevour 2010-01-29 11:53:43

2

SQL Server 2005 and later:

WITH q AS 
     (
     SELECT *, 
       ROW_NUMBER() OVER (PARTITION BY city, zip ORDER BY id) AS rn, 
       COUNT(*) OVER (PARTITION BY city, zip) AS cnt 
     FROM mytable 
     ) 
DELETE 
FROM q 
WHERE rn = 1 
     AND cnt > 1 

deletes the first row of each group that has duplicates,

WITH q AS 
     (
     SELECT *, ROW_NUMBER() OVER (PARTITION BY city, zip ORDER BY id) AS rn 
     FROM mytable 
     ) 
DELETE 
FROM q 
WHERE rn = 2 

deletes the first duplicate,

WITH q AS 
     (
     SELECT *, ROW_NUMBER() OVER (PARTITION BY city, zip ORDER BY id) AS rn 
     FROM mytable 
     ) 
DELETE 
FROM q 
WHERE rn > 1 

deletes all duplicates.
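The difference between the three variants only shows up once a group has three or more matching rows. The sketch below replays them on SQLite from Python, assuming SQLite 3.25 or later for window functions. SQLite cannot delete through a CTE directly, so the delete goes through id IN (...) instead. The helper name ids_left_after and the sample rows are illustrative assumptions.

```python
import sqlite3

# Hypothetical sample data: ids 1-3 share one (city, zip), id 4 is unique.
ROWS = [
    (1, "New York", "00000"),
    (2, "New York", "00000"),
    (3, "New York", "00000"),
    (4, "Boston", "02108"),
]

def ids_left_after(predicate):
    """Apply one delete variant to a fresh copy of the sample data."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY, city TEXT, zip TEXT)")
    conn.executemany("INSERT INTO mytable VALUES (?, ?, ?)", ROWS)
    # predicate is a trusted constant from this script, never user input.
    conn.execute(
        f"""
        WITH q AS (
            SELECT id,
                   ROW_NUMBER() OVER (PARTITION BY city, zip ORDER BY id) AS rn,
                   COUNT(*) OVER (PARTITION BY city, zip) AS cnt
            FROM mytable
        )
        DELETE FROM mytable
        WHERE id IN (SELECT id FROM q WHERE {predicate})
        """
    )
    return [row[0] for row in conn.execute("SELECT id FROM mytable ORDER BY id")]

first_row = ids_left_after("rn = 1 AND cnt > 1")  # drops id 1
first_duplicate = ids_left_after("rn = 2")        # drops id 2
all_duplicates = ids_left_after("rn > 1")         # drops ids 2 and 3
print(first_row, first_duplicate, all_duplicates)
```

Only the last variant leaves exactly one row per (city, zip) group when a group has more than two members.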

+0

+1 - This is what I meant in my answer, but I'm not fluent enough to just write it down. – Lucero 2010-01-29 11:53:42

1
DELETE FROM 
    cities 
WHERE 
    id 
NOT IN 
(
    SELECT id FROM 
    (
     -- Get the maximum id of any zip/city combination 
      -- This will work with both duped and non-duped rows 
     SELECT 
      MAX(id) AS id, 
      city, 
      zip 
     FROM 
      cities 
     GROUP BY 
      city, 
      zip 
    ) ids_only 
) 
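A minimal sketch of this NOT IN / MAX(id) approach, run on an in-memory SQLite database from Python. The sample rows and the helper name keep_highest_id are assumptions. The inner query needs a column named id, so MAX(id) is aliased here.

```python
import sqlite3

def keep_highest_id(conn):
    """Keep only the row with the highest id per (city, zip) combination."""
    conn.execute(
        """
        DELETE FROM cities
        WHERE id NOT IN (
            SELECT id FROM (
                SELECT MAX(id) AS id, city, zip
                FROM cities
                GROUP BY city, zip
            ) AS ids_only
        )
        """
    )
    return [row[0] for row in conn.execute("SELECT id FROM cities ORDER BY id")]

# Hypothetical sample data: ids 1-3 are duplicates, id 4 is unique.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cities (id INTEGER PRIMARY KEY, city TEXT, zip TEXT)")
conn.executemany(
    "INSERT INTO cities VALUES (?, ?, ?)",
    [
        (1, "New York", "00000"),
        (2, "New York", "00000"),
        (3, "New York", "00000"),
        (4, "Boston", "02108"),
    ],
)
kept = keep_highest_id(conn)
print(kept)
```

Unlike the self-join answer, which keeps the lowest id, this keeps the highest id of each group. Swapping MAX for MIN reverses that.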
0

The accepted answer did not work on my Oracle database. This did:

DELETE FROM 
    mytable A 
WHERE 
    A.rowid > 
    ANY (
    SELECT 
     B.rowid 
    FROM 
     mytable B 
    WHERE 
     A.col1 = B.col1 
    AND 
     A.col2 = B.col2 
     ); 

(This also works with any other column in place of ROWID, provided its values are unique per row.)

Found here.
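The rowid idea is handy when the table has no id column at all. SQLite also exposes an implicit rowid but has no ANY operator, so the sketch below swaps in an equivalent comparison against the group's MIN(rowid). Every row matches itself in the subquery, so comparing against the minimum gives the same result as > ANY. The table contents and the helper name dedupe_by_rowid are assumptions.

```python
import sqlite3

def dedupe_by_rowid(conn):
    """Delete rows whose rowid is above the lowest rowid of their group."""
    conn.execute(
        """
        DELETE FROM mytable
        WHERE rowid > (
            SELECT MIN(b.rowid)
            FROM mytable AS b
            WHERE b.col1 = mytable.col1
              AND b.col2 = mytable.col2
        )
        """
    )
    return conn.execute("SELECT col1, col2 FROM mytable ORDER BY col1").fetchall()

# Hypothetical sample data in a table without an explicit id column.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (col1 TEXT, col2 TEXT)")
conn.executemany(
    "INSERT INTO mytable (col1, col2) VALUES (?, ?)",
    [("New York", "00000")] * 3 + [("Boston", "02108")],
)
unique_rows = dedupe_by_rowid(conn)
print(unique_rows)
```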