2010-06-24 187 views
4

Filtering out bad data using SQL Server 2005

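I have a table with the following columns:

id, name, date, value

I want to select all rows from the table except those that are part of a run of four consecutive zeros when ordered by date. How would I do that? Here is an example of what I mean.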

id  name  date   value 
1  a  1/1/2010  5 
2  a  1/2/2010  3 
3  a  1/3/2010  5 
4  a  1/4/2010  0 
5  a  1/7/2010  0 
6  a  1/8/2010  0 
7  a  1/9/2010  2 
8  a  1/10/2010 3 
9  a  1/11/2010 0 
10  a  1/15/2010 0 
11  a  1/16/2010 0 
12  a  1/17/2010 0 
13  a  1/20/2010 4 
14  a  1/21/2010 4 

I want the query result to include all rows except IDs 9-12.
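To try the answers below against the sample data, a setup script along these lines may help. It is only a sketch: the table name My_Table is borrowed from one of the answers, the column types are assumptions (the question does not state them), and the dates are read as month/day/year. It avoids multi-row VALUES lists, which SQL Server 2005 does not support.

```sql
-- Assumed schema: the question gives column names only, not types.
CREATE TABLE My_Table (
    id    INT         NOT NULL PRIMARY KEY,
    name  VARCHAR(10) NOT NULL,
    date  DATETIME    NOT NULL,
    value INT         NOT NULL
);

-- Sample rows from the question, with dates in the unambiguous yyyymmdd form.
INSERT INTO My_Table (id, name, date, value)
SELECT  1, 'a', '20100101', 5 UNION ALL
SELECT  2, 'a', '20100102', 3 UNION ALL
SELECT  3, 'a', '20100103', 5 UNION ALL
SELECT  4, 'a', '20100104', 0 UNION ALL
SELECT  5, 'a', '20100107', 0 UNION ALL
SELECT  6, 'a', '20100108', 0 UNION ALL
SELECT  7, 'a', '20100109', 2 UNION ALL
SELECT  8, 'a', '20100110', 3 UNION ALL
SELECT  9, 'a', '20100111', 0 UNION ALL
SELECT 10, 'a', '20100115', 0 UNION ALL
SELECT 11, 'a', '20100116', 0 UNION ALL
SELECT 12, 'a', '20100117', 0 UNION ALL
SELECT 13, 'a', '20100120', 4 UNION ALL
SELECT 14, 'a', '20100121', 4;
```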

+0

Interesting. Just wondering what this is needed for: is it a business rule or just a learning exercise? – VoodooChild 2010-06-24 17:10:25

+0

It's a business requirement. We need to eliminate bad data points from the overall calculations, and all that jazz. – 2010-06-24 17:12:20

Answers

2

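This assumes your rows are ordered by id, but you can simply change ORDER BY id to something else and it should still work.

Using a T-SQL CTE found on this Kodyaz Development Resources site, I was able to create the code below. As written, it removes rows that are part of two consecutive 0s rather than four, because I tested it against my own code and only changed the table/column names.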

WITH CTE as (
    SELECT 
    RN = ROW_NUMBER() OVER (ORDER BY id), 
    * 
    FROM tablename 
) 
SELECT 
    [Current Row].* 
FROM CTE [Current Row] 
LEFT JOIN CTE [Previous Row] ON 
    [Previous Row].RN = [Current Row].RN - 1 
LEFT JOIN CTE [Next Row] ON 
    [Next Row].RN = [Current Row].RN + 1 
WHERE 
    -- exclude the row when its value is zero and the next row's value is zero 
    -- (ISNULL treats a missing neighbor as non-zero, so a zero in the first or last row is not dropped by accident) 
    NOT ([Current Row].value = 0 AND ISNULL([Next Row].value, 1) = 0) AND 
    -- exclude the row when its value is zero and the previous row's value is zero 
    NOT (ISNULL([Previous Row].value, 1) = 0 AND [Current Row].value = 0) 

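All you need to do to make it work for your case is to cover every possible combination in the WHERE clause, adding joins for the rows two and three positions before and after the current one. For example: this row and the next three rows are all 0, or the previous row, this row and the next two rows are all 0, and so on.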
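Rather than spelling out each combination by hand, the same idea can be sketched as a check over every four-row window that contains the current row. This is not the answerer's code, only one possible way to write the extension. It reuses the CTE and the placeholder table name tablename from above; the aliases C, S and W are made up for illustration.

```sql
WITH CTE AS (
    SELECT
        RN = ROW_NUMBER() OVER (ORDER BY id),
        *
    FROM tablename
)
SELECT C.*
FROM CTE C
WHERE NOT EXISTS (
    -- S is a candidate first row of a 4-row window that contains C
    SELECT 1
    FROM CTE S
    WHERE S.RN BETWEEN C.RN - 3 AND C.RN
      -- the window S.RN .. S.RN + 3 consists entirely of zeros
      AND (SELECT COUNT(*)
           FROM CTE W
           WHERE W.RN BETWEEN S.RN AND S.RN + 3
             AND W.value = 0) = 4
);
```

A row is excluded only if some window of four consecutive rows around it is all zeros, which also covers longer runs. On the sample data from the question this keeps ids 4-6 (a run of three) and drops ids 9-12. The nested subqueries are likely to be slow on large tables, so treat it as a readable starting point rather than a tuned query.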

+0

Great idea to use ROW_NUMBER to guarantee a simple way of finding the next row, +1 – 2010-06-24 17:33:37

1

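You didn't mention how name is involved, so I'm assuming you want this done per name. I'll further assume that when you say "consecutive" you mean in date order, not in id order. Finally, I'll assume that you also want to exclude runs of 5 consecutive zeros, 6 consecutive zeros, etc.

There may be a simpler way, but this should work: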

;WITH Transitions_To_CTE AS 
(
    SELECT 
     T1.id, 
     T1.name, 
     T1.date, 
     T1.value 
    FROM 
     My_Table T1 
    LEFT OUTER JOIN My_Table T2 ON 
     T2.name = T1.name AND 
     T2.date < T1.date AND 
     T2.value <> 0 
    LEFT OUTER JOIN My_Table T3 ON 
     T3.name = T1.name AND 
     T3.date > COALESCE(T2.date, '1900-01-01') AND 
     T3.date < T1.date 
    WHERE 
     T1.value = 0 AND 
     T3.id IS NULL 
), 
Transitions_From_CTE AS 
(
    SELECT 
     T1.id, 
     T1.name, 
     T1.date, 
     T1.value 
    FROM 
     My_Table T1 
    LEFT OUTER JOIN My_Table T2 ON 
     T2.name = T1.name AND 
     T2.date > T1.date AND 
     T2.value <> 0 
    LEFT OUTER JOIN My_Table T3 ON 
     T3.name = T1.name AND 
     T3.date < COALESCE(T2.date, '9999-12-31') AND 
     T3.date > T1.date 
    WHERE 
     T1.value = 0 AND 
     T3.id IS NULL 
), 
Range_Exclusions AS 
(
    SELECT 
     S.name, 
     S.date AS start_date, 
     E.date AS end_date 
    FROM 
     Transitions_To_CTE S 
    INNER JOIN Transitions_From_CTE E ON 
     E.name = S.name AND 
     E.date > S.date 
    LEFT OUTER JOIN Transitions_From_CTE E2 ON 
     E2.name = S.name AND 
     E2.date > S.date AND 
     E2.date < E.date 
    WHERE 
     E2.id IS NULL AND 
     (SELECT COUNT(*) FROM dbo.My_Table T WHERE T.name = S.name AND T.date BETWEEN S.date AND E.date) >= 4 
) 
SELECT 
    T.id, 
    T.name, 
    T.date, 
    T.value 
FROM 
    dbo.My_Table T 
WHERE 
    NOT EXISTS (SELECT * FROM Range_Exclusions RE WHERE RE.name = T.name AND T.date BETWEEN RE.start_date AND RE.end_date) 
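Under the same three assumptions (per name, date order, runs of four or more zeros), a shorter alternative is the "difference of row numbers" technique for grouping consecutive rows. The sketch below is not part of the answer above. It uses only ROW_NUMBER() and COUNT(*) OVER (PARTITION BY ...), both available in SQL Server 2005, and it additionally assumes that dates are unique within a name. The CTE names Numbered and Runs and the columns run_id and run_length are invented for illustration.

```sql
;WITH Numbered AS
(
    SELECT
        id, name, date, value,
        -- Position among all rows of the name minus position among rows of the
        -- same kind (zero / non-zero): constant within each consecutive run.
        ROW_NUMBER() OVER (PARTITION BY name ORDER BY date)
      - ROW_NUMBER() OVER (PARTITION BY name,
                                        CASE WHEN value = 0 THEN 0 ELSE 1 END
                           ORDER BY date) AS run_id
    FROM dbo.My_Table
),
Runs AS
(
    SELECT
        id, name, date, value,
        -- Number of rows in the run that this row belongs to
        COUNT(*) OVER (PARTITION BY name,
                                    CASE WHEN value = 0 THEN 0 ELSE 1 END,
                                    run_id) AS run_length
    FROM Numbered
)
SELECT id, name, date, value
FROM Runs
WHERE NOT (value = 0 AND run_length >= 4)
ORDER BY name, date;
```

For the sample data, the zeros at ids 4-6 share one run_id with a run_length of 3 and are kept, while ids 9-12 share another with a run_length of 4 and are filtered out.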
+0

Thanks. +1 for your answer and for almost beating me to it. – Kyra 2010-06-24 17:36:54

0

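Here is my attempt: use a recursive CTE to work out the number of consecutive zeros, then use level >= 4 to build a sequence of ids, and then simply apply a NOT IN clause on id. Note that the recursion joins on id + 1, so it assumes the ids are contiguous.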

with trend --work out number of consecutive zeros using level 
as 
(Select 1 as level, id, value, id as startid 
    from IdsAndValues 
    Union All 
    Select [Level]+1, P.ID, p.value, t.startid 
    From IdsAndValues as p 
     Inner Join trend as t on p.id = t.id+1 
    Where t.value =0 and p.value=0 
) 
,IDs --create sequence of ids using startid and id, this allows us to do the not in 
as 
( 
    Select startid as ExcludeID ,id 
    from trend as t
    Where level>=4 
    Union All 
    Select ExcludeID +1, id 
    From ids 
    where ExcludeID <id 
) 

Select * 
from IdsAndValues 
Where id Not in 
    (Select ExcludeID from IDs)