2010-02-01
2

I have a table like this in MySQL and need to delete duplicates:

userid  visitorid  time
1       10         2009-12-23
1       18         2009-12-06
1       18         2009-12-14
1       18         2009-12-18
1705    1678       2010-01-24
1705    1699       2010-01-24
1705    1700       2010-01-24
1712    1          2010-01-25
1712    640        2010-01-24
1712    925        2010-01-25
1712    1600       2010-01-24
1712    1630       2010-01-25
1712    1630       2010-01-24
1713    1          2010-01-24
1713    1          2010-01-23

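I want to run a query that deletes every duplicate except the most recent one. For each (userid, visitorid) pair, only the row with the latest time should remain. Any ideas?

For example, after the query the table should look like this: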

userid  visitorid  time
1       10         2009-12-23
1       18         2009-12-18
1705    1678       2010-01-24
1705    1699       2010-01-24
1705    1700       2010-01-24
1712    1          2010-01-25
1712    640        2010-01-24
1712    925        2010-01-25
1712    1600       2010-01-24
1712    1630       2010-01-25
1713    1          2010-01-24
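Judging by the expected output, "duplicates" means rows that share a (userid, visitorid) pair, and the row to keep is the one with the latest time. The sketch below checks that reading against the sample data. It uses Python's built-in sqlite3 module as a stand-in for MySQL, which is an assumption. The table name `visitors` is hypothetical, and the dates are stored as ISO-8601 text so that MAX orders them chronologically.

```python
import sqlite3

# Sample rows from the question: (userid, visitorid, time).
ROWS = [
    (1, 10, "2009-12-23"), (1, 18, "2009-12-06"), (1, 18, "2009-12-14"),
    (1, 18, "2009-12-18"), (1705, 1678, "2010-01-24"),
    (1705, 1699, "2010-01-24"), (1705, 1700, "2010-01-24"),
    (1712, 1, "2010-01-25"), (1712, 640, "2010-01-24"),
    (1712, 925, "2010-01-25"), (1712, 1600, "2010-01-24"),
    (1712, 1630, "2010-01-25"), (1712, 1630, "2010-01-24"),
    (1713, 1, "2010-01-24"), (1713, 1, "2010-01-23"),
]


def make_table():
    """Load the sample rows into a hypothetical in-memory `visitors` table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE visitors (userid INTEGER, visitorid INTEGER, time TEXT)")
    conn.executemany("INSERT INTO visitors VALUES (?, ?, ?)", ROWS)
    return conn


def rows_to_keep(conn):
    """One row per (userid, visitorid) pair, carrying its latest time."""
    return conn.execute(
        "SELECT userid, visitorid, MAX(time) FROM visitors "
        "GROUP BY userid, visitorid ORDER BY userid, visitorid"
    ).fetchall()


if __name__ == "__main__":
    for row in rows_to_keep(make_table()):
        print(row)
```

Each of the 11 tuples it returns corresponds to one line of the expected table. A DELETE therefore only has to remove the rows outside this set.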

Answers

4
Delete from YourTable VersionA 
    where VersionA.Time NOT IN 
    (select MAX(VersionB.Time) Time 
     from YourTable VersionB 
     where VersionA.UserID = VersionB.UserID 
      and VersionA.VisitorID = VersionB.VisitorID) 

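The syntax may need some adjusting, but that should do it. You may also want to select the subquery into its own table FIRST, and then run the DELETE FROM against that result set.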
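Here is a minimal sketch of that two-step approach. It assumes Python's sqlite3 as a stand-in for MySQL and uses the hypothetical names `visitors` and `keep_rows`. The rows to keep are first materialized into a temporary table, so the DELETE's subquery never reads the table being deleted from. Both SQL statements use syntax MySQL should also accept, but treat that as an untested assumption.

```python
import sqlite3

# Sample rows from the question: (userid, visitorid, time).
ROWS = [
    (1, 10, "2009-12-23"), (1, 18, "2009-12-06"), (1, 18, "2009-12-14"),
    (1, 18, "2009-12-18"), (1705, 1678, "2010-01-24"),
    (1705, 1699, "2010-01-24"), (1705, 1700, "2010-01-24"),
    (1712, 1, "2010-01-25"), (1712, 640, "2010-01-24"),
    (1712, 925, "2010-01-25"), (1712, 1600, "2010-01-24"),
    (1712, 1630, "2010-01-25"), (1712, 1630, "2010-01-24"),
    (1713, 1, "2010-01-24"), (1713, 1, "2010-01-23"),
]


def make_table():
    """Load the sample rows into a hypothetical in-memory `visitors` table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE visitors (userid INTEGER, visitorid INTEGER, time TEXT)")
    conn.executemany("INSERT INTO visitors VALUES (?, ?, ?)", ROWS)
    return conn


def delete_via_staging_table(conn):
    """Keep only the latest row per (userid, visitorid), in two statements."""
    # Step 1: materialize the (userid, visitorid, latest time) triples to keep.
    conn.execute(
        "CREATE TEMPORARY TABLE keep_rows AS "
        "SELECT userid, visitorid, MAX(time) AS time FROM visitors "
        "GROUP BY userid, visitorid"
    )
    # Step 2: delete rows with no exact match in keep_rows; the subquery
    # reads the staging table, never the table being deleted from.
    conn.execute(
        "DELETE FROM visitors WHERE NOT EXISTS ("
        " SELECT 1 FROM keep_rows k"
        " WHERE k.userid = visitors.userid"
        " AND k.visitorid = visitors.visitorid"
        " AND k.time = visitors.time)"
    )
    conn.execute("DROP TABLE keep_rows")
    conn.commit()


def demo():
    """Run the two-step delete on the sample data and return what is left."""
    conn = make_table()
    delete_via_staging_table(conn)
    return conn.execute(
        "SELECT userid, visitorid, time FROM visitors ORDER BY userid, visitorid, time"
    ).fetchall()


if __name__ == "__main__":
    for row in demo():
        print(row)
```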

+0

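#1064 - You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'VersionA where VersionA.Time NOT IN (select MAX(VersionB.Time)' at line 1 – 2010-02-01 13:02:40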

+0

Could you get the unique rows while ignoring the 'Time' field, then delete all rows other than those, keeping the one with the maximum time? – 2010-02-01 13:03:49

+0

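"Currently, you cannot delete from a table and select from the same table in a subquery." http://dev.mysql.com/doc/refman/5.0/en/delete.html - This is because the table isn't locked properly, and nobody has written the code to lock it correctly. – 2010-02-01 13:11:36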

0

Assuming your table is called Visitors:

DELETE v1.* FROM Visitors v1 
LEFT JOIN (
    SELECT userid, visitorid, MAX(time) AS time 
    FROM Visitors v2 
    GROUP BY userid, visitorid 
) v3 ON v1.userid=v3.userid AND v1.visitorid=v3.visitorid AND v1.time = v3.time 
WHERE v3.userid IS NULL; 
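The LEFT JOIN ... IS NULL pattern is an anti-join. Each row is matched against the (userid, visitorid, MAX(time)) triples, and rows that find no partner come back with NULLs on the v3 side. SQLite has no multi-table `DELETE v1.* FROM ... JOIN` syntax. So this sketch, which uses Python's sqlite3 as a stand-in for MySQL (an assumption), runs the same join as a SELECT to collect the rowids of those rows and then deletes them by rowid.

```python
import sqlite3

# Sample rows from the question: (userid, visitorid, time).
ROWS = [
    (1, 10, "2009-12-23"), (1, 18, "2009-12-06"), (1, 18, "2009-12-14"),
    (1, 18, "2009-12-18"), (1705, 1678, "2010-01-24"),
    (1705, 1699, "2010-01-24"), (1705, 1700, "2010-01-24"),
    (1712, 1, "2010-01-25"), (1712, 640, "2010-01-24"),
    (1712, 925, "2010-01-25"), (1712, 1600, "2010-01-24"),
    (1712, 1630, "2010-01-25"), (1712, 1630, "2010-01-24"),
    (1713, 1, "2010-01-24"), (1713, 1, "2010-01-23"),
]


def make_table():
    """Load the sample rows into an in-memory `visitors` table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE visitors (userid INTEGER, visitorid INTEGER, time TEXT)")
    conn.executemany("INSERT INTO visitors VALUES (?, ?, ?)", ROWS)
    return conn


def delete_anti_join(conn):
    """Delete rows that find no (userid, visitorid, MAX(time)) partner."""
    # Same LEFT JOIN as the answer: unmatched rows come back with v3.* NULL.
    doomed = conn.execute(
        "SELECT v1.rowid FROM visitors v1 "
        "LEFT JOIN (SELECT userid, visitorid, MAX(time) AS time "
        "           FROM visitors GROUP BY userid, visitorid) v3 "
        "  ON v1.userid = v3.userid AND v1.visitorid = v3.visitorid "
        " AND v1.time = v3.time "
        "WHERE v3.userid IS NULL"
    ).fetchall()
    # SQLite lacks DELETE v1.* FROM ... JOIN, so delete by rowid instead.
    conn.executemany("DELETE FROM visitors WHERE rowid = ?", doomed)
    conn.commit()
    return len(doomed)


def demo():
    """Return (number of rows deleted, rows left) for the sample data."""
    conn = make_table()
    deleted = delete_anti_join(conn)
    left = conn.execute(
        "SELECT userid, visitorid, time FROM visitors ORDER BY userid, visitorid, time"
    ).fetchall()
    return deleted, left
```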
0
DELETE mo.* 
FROM (
     SELECT userid, visitorid, MAX(time) AS mtime 
     FROM mytable 
     GROUP BY 
       userid, visitorid 
     ) mi 
JOIN mytable mo 
ON  mo.userid = mi.userid 
     AND mo.visitorid = mi.visitorid 
     AND mo.time < mi.mtime;
+0

Thanks, but this deletes all rows except one... i.e., it keeps only the latest row for each user. – 2010-02-01 13:00:15
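The behaviour described in that comment is exactly what the join condition `mo.visitorid = mo.visitorid` produces. That condition compares a column with itself, so it is true for every row. Each row is then compared against the MAX(time) of every visitor group of the same user, and only rows at the user's overall latest time survive. Matching `mo.visitorid = mi.visitorid` instead restricts the comparison to the row's own group. The sketch below reproduces both outcomes using Python's sqlite3 as a stand-in for MySQL (an assumption). Because SQLite lacks DELETE ... JOIN, it rewrites the multi-table DELETE as an equivalent EXISTS subquery. The table name `visitors` is hypothetical.

```python
import sqlite3

# Sample rows from the question: (userid, visitorid, time).
ROWS = [
    (1, 10, "2009-12-23"), (1, 18, "2009-12-06"), (1, 18, "2009-12-14"),
    (1, 18, "2009-12-18"), (1705, 1678, "2010-01-24"),
    (1705, 1699, "2010-01-24"), (1705, 1700, "2010-01-24"),
    (1712, 1, "2010-01-25"), (1712, 640, "2010-01-24"),
    (1712, 925, "2010-01-25"), (1712, 1600, "2010-01-24"),
    (1712, 1630, "2010-01-25"), (1712, 1630, "2010-01-24"),
    (1713, 1, "2010-01-24"), (1713, 1, "2010-01-23"),
]

# Self-comparison: always true, so every group of the same user matches.
TYPO = "visitors.visitorid = visitors.visitorid"
# Intended condition: compare each row only with its own group.
FIXED = "mi.visitorid = visitors.visitorid"


def make_table():
    """Load the sample rows into a hypothetical in-memory `visitors` table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE visitors (userid INTEGER, visitorid INTEGER, time TEXT)")
    conn.executemany("INSERT INTO visitors VALUES (?, ?, ?)", ROWS)
    return conn


def delete_older(conn, visitor_condition):
    """EXISTS form of DELETE mo.* FROM (...) mi JOIN mytable mo ON ..."""
    # A row is deleted if some group in mi with the same userid (and passing
    # visitor_condition) has a later MAX(time) than the row itself.
    conn.execute(
        "DELETE FROM visitors WHERE EXISTS ("
        " SELECT 1 FROM (SELECT userid, visitorid, MAX(time) AS mtime"
        " FROM visitors GROUP BY userid, visitorid) mi"
        " WHERE mi.userid = visitors.userid"
        f" AND {visitor_condition}"
        " AND visitors.time < mi.mtime)"
    )
    conn.commit()


def demo(visitor_condition):
    """Apply one variant to fresh sample data and return the rows left."""
    conn = make_table()
    delete_older(conn, visitor_condition)
    return conn.execute(
        "SELECT userid, visitorid, time FROM visitors ORDER BY userid, visitorid, time"
    ).fetchall()
```

With `TYPO`, user 1 keeps only the 2009-12-23 row, and user 1712 loses its 2010-01-24 visitors. With `FIXED`, the 11 expected rows remain.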

0

You need to work around MySQL bug #6980 with a doubly nested subquery:

DELETE FROM foo_table 
WHERE (userid, visitorid, time) IN (
    SELECT userid, visitorid, time FROM (
     SELECT userid, visitorid, time FROM 
      foo_table 
      LEFT OUTER JOIN (
       SELECT userid, visitorid, MAX(time) AS time 
       FROM foo_table 
       GROUP BY userid, visitorid 
       ) AS foo_table_keep 
        USING (userid, visitorid, time) 
     WHERE 
      foo_table_keep.time IS NULL 
     ) AS foo_table_delete 
    ); 

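Use GROUP BY to collapse the duplicates into a single row, and MAX(time) to pick the value you want to keep. If needed, use a different aggregate function instead of MAX.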

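Wrapping the subquery twice, and giving each level an alias, avoids the error:

ERROR 1093 (HY000): You can't specify target table 'foo_table' for update in FROM clause 

It also has the added advantage of making it clearer how the statement chooses what to keep.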
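For this data, it matters whether the doubly nested statement matches rows on `time` alone or on the full (userid, visitorid, time) key. The value 2010-01-24 is the latest time of several other groups. A time-only match therefore also keeps (1712, 1630, 2010-01-24), even though a later row exists for that pair. The sketch below runs both versions against the question's sample data, using Python's sqlite3 as a stand-in for MySQL (an assumption). SQLite does not raise error 1093, so the wrapping is not needed there; the sketch only checks which rows each version removes. The full-key version uses row-value `IN`, which assumes a reasonably recent SQLite build (3.15 or later).

```python
import sqlite3

# Sample rows from the question: (userid, visitorid, time).
ROWS = [
    (1, 10, "2009-12-23"), (1, 18, "2009-12-06"), (1, 18, "2009-12-14"),
    (1, 18, "2009-12-18"), (1705, 1678, "2010-01-24"),
    (1705, 1699, "2010-01-24"), (1705, 1700, "2010-01-24"),
    (1712, 1, "2010-01-25"), (1712, 640, "2010-01-24"),
    (1712, 925, "2010-01-25"), (1712, 1600, "2010-01-24"),
    (1712, 1630, "2010-01-25"), (1712, 1630, "2010-01-24"),
    (1713, 1, "2010-01-24"), (1713, 1, "2010-01-23"),
]

# Rows matched on time alone: any time that is some group's MAX survives.
TIME_ONLY = """
DELETE FROM foo_table
WHERE foo_table.time IN (
    SELECT time FROM (
        SELECT time FROM foo_table
        LEFT OUTER JOIN (
            SELECT MAX(time) AS time FROM foo_table GROUP BY userid, visitorid
        ) AS foo_table_keep USING (time)
        WHERE foo_table_keep.time IS NULL
    ) AS foo_table_delete
)
"""

# Same shape, but rows are matched on the full (userid, visitorid, time) key.
FULL_KEY = """
DELETE FROM foo_table
WHERE (userid, visitorid, time) IN (
    SELECT userid, visitorid, time FROM (
        SELECT userid, visitorid, time FROM foo_table
        LEFT OUTER JOIN (
            SELECT userid, visitorid, MAX(time) AS time
            FROM foo_table GROUP BY userid, visitorid
        ) AS foo_table_keep USING (userid, visitorid, time)
        WHERE foo_table_keep.time IS NULL
    ) AS foo_table_delete
)
"""


def demo(delete_sql):
    """Run one DELETE against fresh sample data and return the rows left."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE foo_table (userid INTEGER, visitorid INTEGER, time TEXT)")
    conn.executemany("INSERT INTO foo_table VALUES (?, ?, ?)", ROWS)
    conn.execute(delete_sql)
    conn.commit()
    return conn.execute(
        "SELECT userid, visitorid, time FROM foo_table ORDER BY userid, visitorid, time"
    ).fetchall()
```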