2012-07-10 47 views
1

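Delete older from a duplicate select

I have been working on a query to find and delete rows with duplicate column values. Currently I have this query, which returns the duplicates: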

SELECT NUIP, FECHA_REGISTRO 
FROM registros_civiles_nacimiento 
WHERE NUIP IN (
SELECT NUIP 
FROM registros_civiles_nacimiento 
GROUP BY NUIP 
HAVING (COUNT(NUIP) > 1) 
) order by NUIP 

This works and returns a table like this:

NUIP  FECHA_REGISTRO 
38120100138 1975-05-30 
38120100138 1977-08-31 
40051800275 1980-09-24 
40051800275 1999-11-29 
42110700118 1972-10-26 
42110700118 1982-04-22 
44030700535 1982-10-19 
44030700535 1993-05-05 
46072300777 1991-01-17 
46072300777 1979-03-30 

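The thing is, I need to delete the rows with duplicate column values, but only the row with the oldest date in each group. For example, for the result above, once the required query has run, this is the list of rows that must remain: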

NUIP  FECHA_REGISTRO 
38120100138 1977-08-31 
40051800275 1999-11-29 
42110700118 1982-04-22 
44030700535 1993-05-05 
46072300777 1991-01-17 

How can I do this using plain SQL?

+0

Do you want to delete the rows from the database, or just remove them from the query result? – 2012-07-10 14:24:25

+0

@GordonLinoff Delete them from the database. – 2012-07-10 14:32:07

Answers

1
--PULL YOUR SELECT OF RECS WITH DUPES INTO A TEMP TABLE 
--(OR CREATE A NEW TABLE SO THAT YOU CAN KEEP THEM AROUND FOR LATER IN CASE) 
SELECT NUIP, FECHA_REGISTRO 
INTO #NUIP 
FROM registros_civiles_nacimiento 
WHERE NUIP IN (
SELECT NUIP 
FROM registros_civiles_nacimiento 
GROUP BY NUIP 
HAVING (COUNT(NUIP) > 1) 
) 

--CREATE FLAG FOR DETERMINING DUPES 
ALTER TABLE #NUIP ADD DUPLICATETOREMOVE bit 

--USE RANK() TO SET FLAG: RNK = 1 IS THE OLDEST FECHA_REGISTRO FOR EACH NUIP 
UPDATE A 
SET DUPLICATETOREMOVE = CASE X.RNK 
     WHEN 1 THEN 1 
     ELSE 0 
     END 
--SELECT * 
FROM #NUIP A 
INNER JOIN (SELECT NUIP, FECHA_REGISTRO, RANK() OVER (PARTITION BY NUIP ORDER BY FECHA_REGISTRO ASC) AS RNK 
FROM #NUIP) X ON X.NUIP = A.NUIP AND X.FECHA_REGISTRO = A.FECHA_REGISTRO 

--HERE IS YOUR DELETE LIST 
SELECT * 
FROM registros_civiles_nacimiento R 
JOIN #NUIP N ON N.NUIP = R.NUIP AND N.FECHA_REGISTRO = R.FECHA_REGISTRO 
WHERE N.DUPLICATETOREMOVE = 1 

--HERE IS YOUR KEEP LIST 
SELECT * 
FROM registros_civiles_nacimiento R 
JOIN #NUIP N ON N.NUIP = R.NUIP AND N.FECHA_REGISTRO = R.FECHA_REGISTRO 
WHERE N.DUPLICATETOREMOVE = 0 

--ZAP THEM, CHECK THE RESULT, THEN COMMIT (OR ROLLBACK) YOUR TRANSACTION. 
--YOU'VE STILL GOT A REC OF THE DELETEDS FOR AS LONG AS #NUIP IS IN SCOPE 
BEGIN TRAN --COMMIT --ROLLBACK 
DELETE R 
FROM registros_civiles_nacimiento R 
JOIN #NUIP N ON N.NUIP = R.NUIP AND N.FECHA_REGISTRO = R.FECHA_REGISTRO 
WHERE N.DUPLICATETOREMOVE = 1 
+0

Awesome, this works perfectly. – 2012-07-10 20:58:01

+0

Thanks, glad it helped! – user1166147 2012-07-11 01:21:27

0

You can use analytic functions:

;WITH CTE AS 
(
    SELECT *, ROW_NUMBER() OVER(PARTITION BY NUIP ORDER BY FECHA_REGISTRO DESC) RN 
    FROM registros_civiles_nacimiento 
) 
DELETE FROM CTE 
WHERE RN > 1; 
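
Before running the DELETE, it may be worth previewing which rows it would remove. The following is only a sketch that reuses the same CTE with a SELECT; it assumes the table and column names from the question, and that ties on FECHA_REGISTRO within a NUIP (if any exist) may be broken arbitrarily by ROW_NUMBER().

```sql
-- Preview only: lists the rows that the DELETE above would remove (RN > 1),
-- i.e. every row except the newest FECHA_REGISTRO for each NUIP.
;WITH CTE AS
(
    SELECT NUIP, FECHA_REGISTRO,
           ROW_NUMBER() OVER(PARTITION BY NUIP ORDER BY FECHA_REGISTRO DESC) AS RN
    FROM registros_civiles_nacimiento
)
SELECT NUIP, FECHA_REGISTRO
FROM CTE
WHERE RN > 1
ORDER BY NUIP, FECHA_REGISTRO;
```

For the sample data in the question, this should list the earlier of the two dates for each NUIP.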
+0

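The query has been running for about 10 minutes and still hasn't finished. The analytic function takes a long time to execute (Xeon 3.2, 4GB RAM, 120,000 db rows). – 2012-07-10 14:48:22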
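
One thing that might help with the run time, offered as an untested assumption rather than a diagnosis: an index on the partitioning and ordering columns can let SQL Server read the rows already sorted for ROW_NUMBER() instead of sorting the whole table. The index name below is hypothetical, and the sketch assumes no equivalent index already exists.

```sql
-- Hypothetical index name; the key order matches PARTITION BY NUIP ORDER BY FECHA_REGISTRO DESC.
CREATE NONCLUSTERED INDEX IX_registros_civiles_NUIP_FECHA
ON registros_civiles_nacimiento (NUIP, FECHA_REGISTRO DESC);
```

Whether this actually shortens the delete depends on the table's existing indexes and on locking from other sessions, so it is worth checking the execution plan before and after.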

0
  1. Use RANK() to create a result set ranked by date.
  2. Use WHERE EXISTS to delete from the source table.

(Note: if you run the RANK function over your duplicates, you should get the complete result set I just described; see below.)

This statement works in Oracle (replace SELECT * with DELETE if it works for you):

SELECT * 
FROM registros_civiles_nacimiento ALL_ 
WHERE EXISTS 
    (SELECT * FROM  
     (SELECT * FROM 
      (SELECT NUIP, 
        FECHA_REGISTRO, 
        RANK() OVER (PARTITION BY NUIP ORDER BY FECHA_REGISTRO) AS ORDER_ 
      FROM registros_civiles_nacimiento) 
     WHERE ORDER_ = 1) OLDEST 
    WHERE ALL_.NUIP = OLDEST.NUIP 
    AND ALL_.FECHA_REGISTRO = OLDEST.FECHA_REGISTRO); 
+0

As I stated in my question, the server is Microsoft SQL Server. – 2012-07-10 14:54:52

+0

@RandolfRincón-Fadul Yes, I did see that, but the rank function is available in SQL Server too, so this template should still be a valid approach for you. – PaddyC 2012-07-10 15:07:24
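
As a rough sketch of how the RANK() plus WHERE EXISTS template might be adapted to SQL Server (assuming SQL Server 2005 or later, where window functions are available): the derived table needs an alias, here the hypothetical name RANKED, and the added CNT_ column is an assumption of this sketch that restricts the match to NUIPs that really have more than one row, so that rows without duplicates are not selected.

```sql
-- Lists the oldest row of each duplicated NUIP (the candidates for deletion).
SELECT ALL_.NUIP, ALL_.FECHA_REGISTRO
FROM registros_civiles_nacimiento ALL_
WHERE EXISTS
    (SELECT 1
     FROM (SELECT NUIP,
                  FECHA_REGISTRO,
                  RANK() OVER (PARTITION BY NUIP ORDER BY FECHA_REGISTRO) AS ORDER_,
                  COUNT(*) OVER (PARTITION BY NUIP) AS CNT_
           FROM registros_civiles_nacimiento) RANKED
     WHERE RANKED.ORDER_ = 1      -- oldest date within each NUIP
       AND RANKED.CNT_ > 1        -- only NUIPs that actually have duplicates
       AND RANKED.NUIP = ALL_.NUIP
       AND RANKED.FECHA_REGISTRO = ALL_.FECHA_REGISTRO);
```

If the listed rows look right, the first two lines can be swapped for `DELETE ALL_ FROM registros_civiles_nacimiento ALL_`, ideally inside a transaction so the change can be rolled back. Note that RANK() gives tied dates the same rank, so two rows with the same NUIP and the same oldest FECHA_REGISTRO would both match.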