2016-09-23 83 views
3

SQL: counting consecutive rows

I have the following data in a table:

|event_id |starttime  |person_id|attended| 
|------------|-----------------|---------|--------| 
| 11512997-1 | 01-SEP-16 08:00 | 10001 | N  | 
| 11512997-2 | 01-SEP-16 10:00 | 10001 | N  | 
| 11512997-3 | 01-SEP-16 12:00 | 10001 | N  | 
| 11512997-4 | 01-SEP-16 14:00 | 10001 | N  | 
| 11512997-5 | 01-SEP-16 16:00 | 10001 | N  | 
| 11512997-6 | 01-SEP-16 18:00 | 10001 | Y  | 
| 11512997-7 | 02-SEP-16 08:00 | 10001 | N  | 
| 11512997-1 | 01-SEP-16 08:00 | 10002 | N  | 
| 11512997-2 | 01-SEP-16 10:00 | 10002 | N  | 
| 11512997-3 | 01-SEP-16 12:00 | 10002 | N  | 
| 11512997-4 | 01-SEP-16 14:00 | 10002 | Y  | 
| 11512997-5 | 01-SEP-16 16:00 | 10002 | N  | 
| 11512997-6 | 01-SEP-16 18:00 | 10002 | Y  | 
| 11512997-7 | 02-SEP-16 08:00 | 10002 | Y  | 

I want to produce the following result, which returns the maximum number of consecutive rows where attended = 'N' for each person:

|person_id|consec_missed_max|
|---------|-----------------|
| 10001   | 5               |
| 10002   | 3               |

How can this be done in Oracle (or ANSI) SQL? Thanks!

Edit:

So far I have tried:

WITH t1 AS 
(SELECT t.person_id, 
    row_number() over(PARTITION BY t.person_id ORDER BY t.starttime) AS idx 
    FROM the_table t 
    WHERE t.attended = 'N'), 
t2 AS 
(SELECT person_id, MAX(idx) max_idx FROM t1 GROUP BY person_id) 
SELECT t1.person_id, COUNT(1) ct 
    FROM t1 
    JOIN t2 
    ON t1.person_id = t2.person_id 
GROUP BY t1.person_id; 

Just added what I have tried so far. When it comes to analytic functions, I am still not entirely sure how to go about it. – ubersnack

Answers

6

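The main work is done in the factored subquery "prep". You seem to be somewhat familiar with analytic functions, but that alone is not enough. This solution uses the so-called "tabibitosan" method to create groups of consecutive rows that share the same characteristic in one or more dimensions; in this case, you want each unbroken sequence of consecutive 'N' rows to fall into its own group. This is done by taking the difference of two ROW_NUMBER() calls: one partitioned by person only, the other partitioned by person and attended. Google "tabibitosan" if you want to read more about the idea.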
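To make the row-number difference concrete, here is a minimal Python sketch of the same arithmetic for person 10002 from the question. It is only a model of what the two ROW_NUMBER() calls compute, not part of the answer's SQL; the function name `tabibitosan_groups` is made up for illustration.

```python
from collections import defaultdict

# Attendance flags for person 10002 from the question, already ordered by starttime.
flags = ["N", "N", "N", "Y", "N", "Y", "Y"]

def tabibitosan_groups(flags):
    """Return (flag, gp) pairs, where gp = overall row number - per-flag row number."""
    per_flag_rn = defaultdict(int)  # models ROW_NUMBER() partitioned by person AND attended
    pairs = []
    for overall_rn, flag in enumerate(flags, start=1):  # models ROW_NUMBER() partitioned by person
        per_flag_rn[flag] += 1
        pairs.append((flag, overall_rn - per_flag_rn[flag]))
    return pairs

groups = tabibitosan_groups(flags)

# Count the rows in each group of 'N' rows, as the "counts" subquery does.
absence_runs = defaultdict(int)
for flag, gp in groups:
    if flag == "N":
        absence_runs[gp] += 1

for flag, gp in groups:
    print(flag, gp)
print(max(absence_runs.values()))  # longest run of absences
```

Within one attended value, gp stays constant as long as the rows are consecutive and changes when the run is interrupted. The same gp number can in principle appear for both 'N' and 'Y' rows, which is why the grouping has to be read together with the attended value (the query filters on attended = 'N' before grouping).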

with 
    inputs (event_id, starttime, person_id, attended) as (
     select '11512997-1', to_date('01-SEP-16 08:00', 'dd-MON-yy hh24:mi'), 10001, 'N' from dual union all 
     select '11512997-2', to_date('01-SEP-16 10:00', 'dd-MON-yy hh24:mi'), 10001, 'N' from dual union all  
     select '11512997-3', to_date('01-SEP-16 12:00', 'dd-MON-yy hh24:mi'), 10001, 'N' from dual union all 
     select '11512997-4', to_date('01-SEP-16 14:00', 'dd-MON-yy hh24:mi'), 10001, 'N' from dual union all 
     select '11512997-5', to_date('01-SEP-16 16:00', 'dd-MON-yy hh24:mi'), 10001, 'N' from dual union all 
     select '11512997-6', to_date('01-SEP-16 18:00', 'dd-MON-yy hh24:mi'), 10001, 'Y' from dual union all 
     select '11512997-7', to_date('02-SEP-16 08:00', 'dd-MON-yy hh24:mi'), 10001, 'N' from dual union all 
     select '11512997-1', to_date('01-SEP-16 08:00', 'dd-MON-yy hh24:mi'), 10002, 'N' from dual union all 
     select '11512997-2', to_date('01-SEP-16 10:00', 'dd-MON-yy hh24:mi'), 10002, 'N' from dual union all 
     select '11512997-3', to_date('01-SEP-16 12:00', 'dd-MON-yy hh24:mi'), 10002, 'N' from dual union all 
     select '11512997-4', to_date('01-SEP-16 14:00', 'dd-MON-yy hh24:mi'), 10002, 'Y' from dual union all 
     select '11512997-5', to_date('01-SEP-16 16:00', 'dd-MON-yy hh24:mi'), 10002, 'N' from dual union all 
     select '11512997-6', to_date('01-SEP-16 18:00', 'dd-MON-yy hh24:mi'), 10002, 'Y' from dual union all 
     select '11512997-7', to_date('02-SEP-16 08:00', 'dd-MON-yy hh24:mi'), 10002, 'Y' from dual 
    ), 
     prep (starttime, person_id, attended, gp) as (
     select starttime, person_id, attended, 
       row_number() over (partition by person_id order by starttime) - 
        row_number() over (partition by person_id, attended 
             order by starttime) 
     from inputs 
    ), 
     counts (person_id, consecutive_absences) as (
     select person_id, count(*) 
     from prep 
     where attended = 'N' 
     group by person_id, gp 
    ) 
select person_id, max(consecutive_absences) as max_consecutive_absences 
from counts 
group by person_id 
order by person_id; 

OUTPUT:

PERSON_ID    MAX_CONSECUTIVE_ABSENCES 
---------- --------------------------------------- 
    10001          5 
    10002          3 

Works perfectly, thank you! – ubersnack

0

If you are using Oracle 12c, you can use MATCH_RECOGNIZE:

Data:

CREATE TABLE data AS 
SELECT * 
FROM (
with inputs (event_id, starttime, person_id, attended) as (
    select '11512997-1', to_date('01-SEP-16 08:00', 'dd-MON-yy hh24:mi'), 10001, 'N' from dual union all 
    select '11512997-2', to_date('01-SEP-16 10:00', 'dd-MON-yy hh24:mi'), 10001, 'N' from dual union all  
    select '11512997-3', to_date('01-SEP-16 12:00', 'dd-MON-yy hh24:mi'), 10001, 'N' from dual union all 
    select '11512997-4', to_date('01-SEP-16 14:00', 'dd-MON-yy hh24:mi'), 10001, 'N' from dual union all 
    select '11512997-5', to_date('01-SEP-16 16:00', 'dd-MON-yy hh24:mi'), 10001, 'N' from dual union all 
    select '11512997-6', to_date('01-SEP-16 18:00', 'dd-MON-yy hh24:mi'), 10001, 'Y' from dual union all 
    select '11512997-7', to_date('02-SEP-16 08:00', 'dd-MON-yy hh24:mi'), 10001, 'N' from dual union all 
    select '11512997-1', to_date('01-SEP-16 08:00', 'dd-MON-yy hh24:mi'), 10002, 'N' from dual union all 
    select '11512997-2', to_date('01-SEP-16 10:00', 'dd-MON-yy hh24:mi'), 10002, 'N' from dual union all 
    select '11512997-3', to_date('01-SEP-16 12:00', 'dd-MON-yy hh24:mi'), 10002, 'N' from dual union all 
    select '11512997-4', to_date('01-SEP-16 14:00', 'dd-MON-yy hh24:mi'), 10002, 'Y' from dual union all 
    select '11512997-5', to_date('01-SEP-16 16:00', 'dd-MON-yy hh24:mi'), 10002, 'N' from dual union all 
    select '11512997-6', to_date('01-SEP-16 18:00', 'dd-MON-yy hh24:mi'), 10002, 'Y' from dual union all 
    select '11512997-7', to_date('02-SEP-16 08:00', 'dd-MON-yy hh24:mi'), 10002, 'Y' from dual 
    ) 
SELECT * FROM inputs 
); 

And the query:

SELECT PERSON_ID, MAX(LEN) AS MAX_ABSENCES_IN_ROW 
FROM data 
MATCH_RECOGNIZE (
    PARTITION BY PERSON_ID 
    ORDER BY STARTTIME 
    MEASURES FINAL COUNT(*) AS len 
    ALL ROWS PER MATCH 
    PATTERN(a b*) 
    DEFINE b AS attended = a.attended 
) 
WHERE attended = 'N' 
GROUP BY PERSON_ID; 

Output:

"PERSON_ID","MAX_ABSENCES_IN_ROW" 
10001,5 
10002,3 
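The pattern in the query matches maximal runs of rows with the same attended value, and the outer query keeps the longest run of 'N' per person. As a cross-check of the expected output that does not depend on Oracle, the same run-length logic can be sketched in Python with `itertools.groupby`. The dictionary `attendance` and the function name are assumptions made for this illustration; the flags are copied from the question's data in starttime order.

```python
from itertools import groupby

# Attendance flags per person from the question, ordered by starttime.
attendance = {
    10001: ["N", "N", "N", "N", "N", "Y", "N"],
    10002: ["N", "N", "N", "Y", "N", "Y", "Y"],
}

def max_consecutive_absences(flags):
    """Length of the longest unbroken run of 'N'; 0 if there is no 'N' at all."""
    runs = (len(list(run)) for flag, run in groupby(flags) if flag == "N")
    return max(runs, default=0)

result = {person: max_consecutive_absences(flags)
          for person, flags in attendance.items()}
print(result)  # {10001: 5, 10002: 3}
```

One difference worth noting: this sketch reports 0 for a person with no absences, whereas the SQL query filters on attended = 'N' and would therefore return no row for such a person.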

Edit:

As @mathguy pointed out, it can be rewritten as:

SELECT PERSON_ID, MAX(LEN) AS MAX_ABSENCES_IN_ROW 
FROM data 
MATCH_RECOGNIZE (
    PARTITION BY PERSON_ID 
    ORDER BY STARTTIME 
    MEASURES COUNT(*) AS len 
    PATTERN(a+) 
    DEFINE a AS attended = 'N' 
) 
GROUP BY PERSON_ID; 

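Too complicated. You don't need `ALL ROWS PER MATCH`. Just return the `COUNT` for each match, one row per match. Then there should be no `WHERE` clause. Instead, the `PATTERN` should be `a+`, and the `DEFINE` clause should be `DEFINE a AS attended = 'N'`. This will be a more efficient solution (as comparing the plans shows). – mathguy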


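I edited to remove the word `FINAL` in front of `COUNT(*)`. When you return `ALL ROWS PER MATCH`, `COUNT()` behaves like an analytic function (a running count) unless you qualify it with `FINAL`. But when you return **one** row per match (the default), there is no distinction between "running" and "final". Oracle does not throw a syntax error when you use `FINAL (something)` with `ONE ROW PER MATCH`, but `FINAL` is out of place there. – mathguy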
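The running versus final distinction described in this comment can be modeled roughly in Python. This is only an illustration of the values each row of a single match would carry under `ALL ROWS PER MATCH`, not Oracle's implementation; the function name `running_and_final_counts` is hypothetical.

```python
def running_and_final_counts(match_length):
    """Model COUNT(*) per row of one match that contains match_length rows."""
    # Running semantics: each row sees the count of rows matched so far.
    running = list(range(1, match_length + 1))
    # FINAL semantics: every row sees the count for the complete match.
    final = [match_length] * match_length
    return running, final

# Person 10001's first run of five consecutive 'N' rows.
running, final = running_and_final_counts(5)
print(running)  # [1, 2, 3, 4, 5]
print(final)    # [5, 5, 5, 5, 5]
```

With one row per match there is only a single value per match (here 5), so the qualifier has nothing to distinguish.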