2017-07-19

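SQL: select the maximum value of each consecutive run of data

Given a table of consecutive run data, where a number always increases while a task is running and resets to zero when the next task starts, how can I select the maximum value of each run?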

Each run can have any number of rows, and each run is delimited by a "start" row and an "end" row. For example, the data might look like:

user_id, action, qty, datetime 
1,  start, 0, 2017-01-01 00:00:01 
1,  record, 0, 2017-01-01 00:00:01 
1,  record, 4, 2017-01-01 00:00:02 
1,  record, 5, 2017-01-01 00:00:03 
1,  record, 6, 2017-01-01 00:00:04 
1,  end, 0, 2017-01-01 00:00:04 
1,  start, 0, 2017-01-01 00:00:05 
1,  record, 0, 2017-01-01 00:00:05 
1,  record, 2, 2017-01-01 00:00:06 
1,  record, 3, 2017-01-01 00:00:07 
1,  end, 0, 2017-01-01 00:00:07 
2,  start, 0, 2017-01-01 00:00:08 
2,  record, 0, 2017-01-01 00:00:08 
2,  record, 3, 2017-01-01 00:00:09 
2,  record, 8, 2017-01-01 00:00:10 
2,  end, 0, 2017-01-01 00:00:10 

The result should be the maximum value of each run:

user_id, action, qty, datetime 
1,  record, 6, 2017-01-01 00:00:04 
1,  record, 3, 2017-01-01 00:00:07 
2,  record, 8, 2017-01-01 00:00:10  

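Is this possible with any PostgreSQL (9.3) syntax? It is some kind of grouping followed by selecting the maximum from each group, but I don't know how to do the grouping part.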


Can you have two overlapping runs for the same user_id (e.g. from different sessions)?


There is no overlap for a single user; the next run always starts at a later time.

Answers


Quick and dirty, assuming runs don't overlap:

with bounds as (
    -- pair the n-th 'start' with the n-th 'end'; ORDER BY inside OVER() makes the numbering follow time
    select starts.rn, starts.datetime as s, ends.datetime as e
    from (select datetime, ROW_NUMBER() OVER (ORDER BY datetime) as rn from runs where action = 'start') as starts
    join (select datetime, ROW_NUMBER() OVER (ORDER BY datetime) as rn from runs where action = 'end') as ends
      on starts.rn = ends.rn)
,with_run as (
    -- tag every row with the number of the run whose [start, end] span contains it
    SELECT *, (select rn from bounds where s <= r.datetime and e >= r.datetime) as run
    from runs as r)
,max_qty as (
    SELECT run, max(qty) as qty
    from with_run
    GROUP BY run)
SELECT s.user_id, s.action, s.qty, s.datetime
from with_run as s
join max_qty as f on s.run = f.run AND s.qty = f.qty;

- Test data -

create table runs (user_id int, action text, qty int, datetime TIMESTAMP); 
insert INTO runs VALUES 
(1,  'start', 0, '2017-01-01 00:00:01') 
,(1,  'record', 0, '2017-01-01 00:00:01') 
,(1,  'record', 4, '2017-01-01 00:00:02') 
,(1,  'record', 5, '2017-01-01 00:00:03') 
,(1,  'record', 6, '2017-01-01 00:00:04') 
,(1,  'end', 0, '2017-01-01 00:00:04') 
,(1,  'start', 0, '2017-01-01 00:00:05') 
,(1,  'record', 0, '2017-01-01 00:00:05') 
,(1,  'record', 2, '2017-01-01 00:00:06') 
,(1,  'record', 3, '2017-01-01 00:00:07') 
,(1,  'end', 0, '2017-01-01 00:00:07') 
,(2,  'start', 0, '2017-01-01 00:00:08') 
,(2,  'record', 0, '2017-01-01 00:00:08') 
,(2,  'record', 3, '2017-01-01 00:00:09') 
,(2,  'record', 8, '2017-01-01 00:00:10') 
,(2,  'end', 0, '2017-01-01 00:00:10'); 

UPDATE: @Oto Shavadze's answer can be shortened to:

with lookup as (select action,lag(t.*) over(order by datetime, case when action = 'start' then 0 when action = 'record' then 1 else 2 end) as r from runs t) 
select (r::runs).user_id 
     ,(r::runs).action 
     ,(r::runs).qty 
     ,(r::runs).datetime 
from lookup where action = 'end'; 

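I think the OP hasn't made clear what counts as the maximum: the record just before the end, or the highest qty within the run. The query above returns the record just before each end.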
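If "maximum" means the highest qty anywhere in the run, the grouping can be done without pairing start and end timestamps: a running count of 'start' rows per user gives every row of a run the same run number, and DISTINCT ON then keeps the highest-qty row of each group. This is a minimal sketch for PostgreSQL 9.3+, assuming every run begins with a 'start' row and runs of one user never overlap; the inline `runs` CTE only reproduces the question's sample data, and `run_no` is an illustrative column name.

```sql
-- Sketch for PostgreSQL 9.3+. The runs CTE is a stand-in for the real table (sample data from the question).
with runs(user_id, action, qty, datetime) as (values
    (1, 'start',  0, timestamp '2017-01-01 00:00:01'),
    (1, 'record', 0, timestamp '2017-01-01 00:00:01'),
    (1, 'record', 4, timestamp '2017-01-01 00:00:02'),
    (1, 'record', 5, timestamp '2017-01-01 00:00:03'),
    (1, 'record', 6, timestamp '2017-01-01 00:00:04'),
    (1, 'end',    0, timestamp '2017-01-01 00:00:04'),
    (1, 'start',  0, timestamp '2017-01-01 00:00:05'),
    (1, 'record', 0, timestamp '2017-01-01 00:00:05'),
    (1, 'record', 2, timestamp '2017-01-01 00:00:06'),
    (1, 'record', 3, timestamp '2017-01-01 00:00:07'),
    (1, 'end',    0, timestamp '2017-01-01 00:00:07'),
    (2, 'start',  0, timestamp '2017-01-01 00:00:08'),
    (2, 'record', 0, timestamp '2017-01-01 00:00:08'),
    (2, 'record', 3, timestamp '2017-01-01 00:00:09'),
    (2, 'record', 8, timestamp '2017-01-01 00:00:10'),
    (2, 'end',    0, timestamp '2017-01-01 00:00:10')
),
numbered as (
    select r.*,
           -- running count of 'start' rows per user, so all rows of one run share run_no
           sum(case when r.action = 'start' then 1 else 0 end) over (
               partition by r.user_id
               -- on equal datetimes, sort 'start' before 'record' before 'end'
               order by r.datetime,
                        case r.action when 'start' then 0 when 'record' then 1 else 2 end
               rows between unbounded preceding and current row) as run_no
    from runs r
)
-- keep one row per (user_id, run_no): the highest qty, earliest datetime on ties
select distinct on (user_id, run_no) user_id, action, qty, datetime
from numbered
where action = 'record'
order by user_id, run_no, qty desc, datetime;
```

On the sample data this returns the three rows listed in the question. Unlike the LAG() approaches, it does not rely on qty only increasing within a run; it only needs the 'start' markers.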


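If runs of a single user never overlap and the next run always starts later, you can use the LAG() window function: since qty only increases during a run, the row immediately before each 'end' row holds that run's maximum.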

with the_table(user_id, action, qty, datetime) as (
    select 1,'start', 0, '2017-01-01 00:00:01'::timestamp union all 
    select 1,'record', 0, '2017-01-01 00:00:01'::timestamp union all 
    select 1,'record', 4, '2017-01-01 00:00:02'::timestamp union all 
    select 1,'record', 5, '2017-01-01 00:00:03'::timestamp union all 
    select 1,'record', 6, '2017-01-01 00:00:04'::timestamp union all 
    select 1,'end', 0, '2017-01-01 00:00:04'::timestamp union all 
    select 1,'start', 0, '2017-01-01 00:00:05'::timestamp union all 
    select 1,'record', 0, '2017-01-01 00:00:05'::timestamp union all 
    select 1,'record', 2, '2017-01-01 00:00:06'::timestamp union all 
    select 1,'record', 3, '2017-01-01 00:00:07'::timestamp union all 
    select 1,'end', 0, '2017-01-01 00:00:07'::timestamp union all 
    select 2,'start', 0, '2017-01-01 00:00:08'::timestamp union all 
    select 2,'record', 0, '2017-01-01 00:00:08'::timestamp union all 
    select 2,'record', 3, '2017-01-01 00:00:09'::timestamp union all 
    select 2,'record', 8, '2017-01-01 00:00:10'::timestamp union all 
    select 2,'end', 0, '2017-01-01 00:00:10'::timestamp 
) 

select n_user_id, n_action, n_qty, n_datetime from (
    select action,
    lag(user_id) over w as n_user_id,
    lag(action) over w as n_action,
    lag(qty) over w as n_qty,
    lag(datetime) over w as n_datetime
    from the_table
    window w as (partition by user_id order by datetime, case when action = 'start' then 0 when action = 'record' then 1 else 2 end, qty)
)t
where action = 'end'

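Because some action = 'record' rows have the same datetime as the 'start' or 'end' row, I use a CASE in the ORDER BY to make it explicit that 'start' comes first, then 'record', then 'end'.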
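The same ordering trick also works from the other direction. As a sketch for PostgreSQL 9.3+ (the inline `runs` CTE only reproduces the sample data, and `next_action` is a hypothetical alias), LEAD() can tag each row with the action of the row after it, so the whole 'record' row directly before each 'end' comes back without one LAG() call per column. Like the LAG() version, it assumes qty only increases within a run.

```sql
-- Sketch for PostgreSQL 9.3+. The runs CTE is a stand-in for the real table (sample data from the question).
with runs(user_id, action, qty, datetime) as (values
    (1, 'start',  0, timestamp '2017-01-01 00:00:01'),
    (1, 'record', 0, timestamp '2017-01-01 00:00:01'),
    (1, 'record', 4, timestamp '2017-01-01 00:00:02'),
    (1, 'record', 5, timestamp '2017-01-01 00:00:03'),
    (1, 'record', 6, timestamp '2017-01-01 00:00:04'),
    (1, 'end',    0, timestamp '2017-01-01 00:00:04'),
    (1, 'start',  0, timestamp '2017-01-01 00:00:05'),
    (1, 'record', 0, timestamp '2017-01-01 00:00:05'),
    (1, 'record', 2, timestamp '2017-01-01 00:00:06'),
    (1, 'record', 3, timestamp '2017-01-01 00:00:07'),
    (1, 'end',    0, timestamp '2017-01-01 00:00:07'),
    (2, 'start',  0, timestamp '2017-01-01 00:00:08'),
    (2, 'record', 0, timestamp '2017-01-01 00:00:08'),
    (2, 'record', 3, timestamp '2017-01-01 00:00:09'),
    (2, 'record', 8, timestamp '2017-01-01 00:00:10'),
    (2, 'end',    0, timestamp '2017-01-01 00:00:10')
)
select user_id, action, qty, datetime
from (
    select r.*,
           -- action of the following row in the same user's timeline,
           -- using the same start < record < end tie-break on equal datetimes
           lead(r.action) over (
               partition by r.user_id
               order by r.datetime,
                        case r.action when 'start' then 0 when 'record' then 1 else 2 end,
                        r.qty) as next_action
    from runs r
) t
-- the last 'record' of each run is the one directly followed by 'end';
-- because qty only grows during a run, it is also the run's maximum
where action = 'record' and next_action = 'end'
order by user_id, datetime;
```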