2012-12-18

Duplicate entries with different timestamps

I have a customer table in SQL named Customer_SCD with these 3 columns: Customer_Name, Customer_ID, Customer_TimeStamp.

This table contains duplicate entries with different timestamps.

For example:

ABC, 1, 2012-12-05 11:58:20.370 

ABC, 1, 2012-12-03 12:11:09.840 

I want to remove these duplicates from the database and keep only the first (earliest) time/date.

Thanks.

What do you mean by eliminate? What have you tried? – PhearOfRayne

Once it's cleaned up, remember to add a unique constraint (or possibly several) to the table so you don't have to do this again in six months. –
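To illustrate the suggestion in that comment, here is a minimal sketch using Python's built-in sqlite3 module instead of SQL Server. It assumes the constraint should cover the (Customer_ID, Customer_Name) pair; the constraint name UQ_Customer_SCD_ID_Name and the helper functions are hypothetical. Once the constraint exists, a second row for the same pair is rejected with an IntegrityError. In SQL Server the same idea would be an ALTER TABLE ... ADD CONSTRAINT ... UNIQUE (Customer_ID, Customer_Name) statement, which only succeeds after the existing duplicates have been removed.

```python
import sqlite3


def build_table_with_unique_constraint() -> sqlite3.Connection:
    """Create an in-memory Customer_SCD table with a UNIQUE constraint.

    Assumption: uniqueness is enforced on (Customer_ID, Customer_Name);
    the constraint name is hypothetical.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE Customer_SCD (
            Customer_Name      TEXT    NOT NULL,
            Customer_ID        INTEGER NOT NULL,
            Customer_TimeStamp TEXT    NOT NULL,
            CONSTRAINT UQ_Customer_SCD_ID_Name UNIQUE (Customer_ID, Customer_Name)
        )
        """
    )
    return conn


def try_insert(conn: sqlite3.Connection, name: str, cid: int, ts: str) -> bool:
    """Insert one row; return False if the UNIQUE constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back if the insert fails
            conn.execute(
                "INSERT INTO Customer_SCD VALUES (?, ?, ?)", (name, cid, ts)
            )
        return True
    except sqlite3.IntegrityError:
        return False


if __name__ == "__main__":
    conn = build_table_with_unique_constraint()
    print(try_insert(conn, "ABC", 1, "2012-12-03 12:11:09.840"))  # True
    print(try_insert(conn, "ABC", 1, "2012-12-05 11:58:20.370"))  # False
```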

Answers

This works, try it:

DELETE b
OUTPUT deleted.*
FROM Customer_SCD b
JOIN (
    -- earliest timestamp for each Customer_ID / Customer_Name pair
    SELECT MIN(a.Customer_TimeStamp) AS Customer_TimeStamp,
      a.Customer_ID,
      a.Customer_Name
    FROM Customer_SCD a
    GROUP BY a.Customer_ID, a.Customer_Name
) c ON
    c.Customer_ID = b.Customer_ID
AND c.Customer_Name = b.Customer_Name
-- delete every row that is not the earliest; rows tied on the earliest timestamp are all kept
AND c.Customer_TimeStamp <> b.Customer_TimeStamp

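The subquery determines which record comes first for each Customer_Name/Customer_ID pair, and the statement then deletes all the other duplicate records. I also added an OUTPUT clause, which returns the rows affected by the statement.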
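As an illustration of the same keep-the-earliest logic, the sketch below runs it against an in-memory SQLite database through Python's sqlite3 module. It is an adaptation, not the T-SQL above: SQLite has neither DELETE ... FROM ... JOIN nor OUTPUT, so the join becomes a correlated subquery and the number of deleted rows is returned instead. The XYZ row in the sample data is hypothetical and only shows that customers without duplicates are left alone.

```python
import sqlite3

# The two ABC rows come from the question; the XYZ row is hypothetical.
SAMPLE_ROWS = [
    ("ABC", 1, "2012-12-05 11:58:20.370"),
    ("ABC", 1, "2012-12-03 12:11:09.840"),
    ("XYZ", 2, "2012-12-04 09:00:00.000"),
]


def make_table(rows) -> sqlite3.Connection:
    """Load rows into an in-memory Customer_SCD table.

    Timestamps are stored as ISO-formatted text, so MIN() compares them
    correctly as strings.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Customer_SCD "
        "(Customer_Name TEXT, Customer_ID INTEGER, Customer_TimeStamp TEXT)"
    )
    conn.executemany("INSERT INTO Customer_SCD VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn


def keep_earliest_with_min(conn: sqlite3.Connection) -> int:
    """Delete every row whose timestamp is not the MIN for its ID/name pair.

    Returns the number of deleted rows (SQLite has no OUTPUT clause).
    """
    with conn:
        cur = conn.execute(
            """
            DELETE FROM Customer_SCD
            WHERE Customer_TimeStamp <> (
                SELECT MIN(a.Customer_TimeStamp)
                FROM Customer_SCD AS a
                WHERE a.Customer_ID = Customer_SCD.Customer_ID
                  AND a.Customer_Name = Customer_SCD.Customer_Name
            )
            """
        )
    return cur.rowcount


if __name__ == "__main__":
    conn = make_table(SAMPLE_ROWS)
    print(keep_earliest_with_min(conn))  # 1
    print(conn.execute("SELECT * FROM Customer_SCD ORDER BY Customer_ID").fetchall())
```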

You can also do this with the ranking function ROW_NUMBER:

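DELETE b
OUTPUT deleted.*
FROM Customer_SCD b
JOIN (
    -- number the rows of each Customer_ID / Customer_Name group, oldest first
    SELECT Customer_ID,
      Customer_Name,
      Customer_TimeStamp,
      ROW_NUMBER() OVER (PARTITION BY Customer_ID, Customer_Name ORDER BY Customer_TimeStamp) AS num
    FROM Customer_SCD
) c ON
    c.Customer_ID = b.Customer_ID
AND c.Customer_Name = b.Customer_Name
AND c.Customer_TimeStamp = b.Customer_TimeStamp
-- keep only row number 1 (the earliest); caveat: exact duplicates (same ID,
-- name and timestamp) all match a row with num <> 1, so every copy is deleted
AND c.num <> 1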

Compare the query cost of the two and use the cheaper one. When I checked, the first approach was more efficient (it had a better execution plan).

Here is a SQL Fiddle.
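For comparison, here is a sketch of the ROW_NUMBER variant in SQLite (window functions need SQLite 3.25 or later), again driven from Python's sqlite3 module. It is an adaptation rather than the T-SQL above: instead of joining back on the timestamp, it deletes by SQLite's internal rowid, so rows that share the same ID, name and timestamp are also reduced to a single copy. The third sample row, an exact duplicate of the earliest one, is hypothetical.

```python
import sqlite3

# The two distinct ABC rows come from the question; the exact duplicate
# of the earliest row is hypothetical.
SAMPLE_ROWS = [
    ("ABC", 1, "2012-12-05 11:58:20.370"),
    ("ABC", 1, "2012-12-03 12:11:09.840"),
    ("ABC", 1, "2012-12-03 12:11:09.840"),
]


def make_table(rows) -> sqlite3.Connection:
    """Load rows into an in-memory Customer_SCD table (ISO text timestamps)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Customer_SCD "
        "(Customer_Name TEXT, Customer_ID INTEGER, Customer_TimeStamp TEXT)"
    )
    conn.executemany("INSERT INTO Customer_SCD VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn


def keep_first_with_row_number(conn: sqlite3.Connection) -> int:
    """Number rows per ID/name pair by timestamp and delete all but number 1.

    Rows are identified by rowid, so exact duplicates cannot be confused
    with each other. Returns the number of deleted rows.
    """
    with conn:
        cur = conn.execute(
            """
            DELETE FROM Customer_SCD
            WHERE rowid IN (
                SELECT rid
                FROM (
                    SELECT rowid AS rid,
                           ROW_NUMBER() OVER (
                               PARTITION BY Customer_ID, Customer_Name
                               ORDER BY Customer_TimeStamp
                           ) AS num
                    FROM Customer_SCD
                )
                WHERE num > 1
            )
            """
        )
    return cur.rowcount


if __name__ == "__main__":
    conn = make_table(SAMPLE_ROWS)
    print(keep_first_with_row_number(conn))  # 2
    print(conn.execute("SELECT * FROM Customer_SCD").fetchall())
```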

Alternatively, the following query returns the rows you want to keep:

Select Customer_Name, Customer_ID, MIN(Customer_TimeStamp) as Customer_TimeStamp 
from Customer_SCD 
group by Customer_Name, Customer_ID 

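Store the result in a table variable, say @correctTbl (declare it with the same three columns and fill it with INSERT INTO @correctTbl followed by the query above).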

Then join against this table and delete the duplicates:

DELETE a
FROM Customer_SCD a
INNER JOIN @correctTbl b
    ON  a.Customer_Name = b.Customer_Name
    AND a.Customer_ID = b.Customer_ID
    AND a.Customer_TimeStamp != b.Customer_TimeStamp
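The table-variable approach can be sketched the same way with Python's sqlite3 module. SQLite has no table variables, so this sketch assumes a temporary table named correctTbl stands in for @correctTbl. The join is expressed as a correlated EXISTS subquery because SQLite's DELETE does not accept a join. As before, the XYZ sample row is hypothetical.

```python
import sqlite3

# The two ABC rows come from the question; the XYZ row is hypothetical.
SAMPLE_ROWS = [
    ("ABC", 1, "2012-12-05 11:58:20.370"),
    ("ABC", 1, "2012-12-03 12:11:09.840"),
    ("XYZ", 2, "2012-12-04 09:00:00.000"),
]


def make_table(rows) -> sqlite3.Connection:
    """Load rows into an in-memory Customer_SCD table (ISO text timestamps)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Customer_SCD "
        "(Customer_Name TEXT, Customer_ID INTEGER, Customer_TimeStamp TEXT)"
    )
    conn.executemany("INSERT INTO Customer_SCD VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn


def dedupe_via_keep_table(conn: sqlite3.Connection) -> int:
    """Build a temp table of rows to keep, then delete everything else.

    Returns the number of deleted rows.
    """
    with conn:
        # Stand-in for @correctTbl: the earliest timestamp per name/ID pair.
        conn.execute(
            """
            CREATE TEMP TABLE correctTbl AS
            SELECT Customer_Name, Customer_ID,
                   MIN(Customer_TimeStamp) AS Customer_TimeStamp
            FROM Customer_SCD
            GROUP BY Customer_Name, Customer_ID
            """
        )
        # Delete rows that match a kept pair but not its timestamp.
        cur = conn.execute(
            """
            DELETE FROM Customer_SCD
            WHERE EXISTS (
                SELECT 1
                FROM temp.correctTbl AS b
                WHERE b.Customer_Name = Customer_SCD.Customer_Name
                  AND b.Customer_ID = Customer_SCD.Customer_ID
                  AND b.Customer_TimeStamp <> Customer_SCD.Customer_TimeStamp
            )
            """
        )
        conn.execute("DROP TABLE temp.correctTbl")
    return cur.rowcount


if __name__ == "__main__":
    conn = make_table(SAMPLE_ROWS)
    print(dedupe_via_keep_table(conn))  # 1
```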