2012-10-22

GROUP BY consecutive dates delimited by gaps

Assume you have (in Postgres 9.1) a table like this:

date | value 

It has some gaps in it. I mean that not every possible date between min(date) and max(date) has its own row.

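My question is how to aggregate this data so that each consistent group (a run of dates without gaps) is treated separately, like this: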

min_date | max_date | [some aggregate of "value" column] 

Any ideas how to do this? I believe it is possible with window functions, but after spending some time trying lag() and lead() I'm a bit stuck.

For example, if the data were like this:

   date    | value 
-----------+------- 
2011-10-31 | 2 
2011-11-01 | 8 
2011-11-02 | 10 
2012-09-13 | 1 
2012-09-14 | 4 
2012-09-15 | 5 
2012-09-16 | 20 
2012-10-30 | 10 

the output (using sum as the aggregate) would be:

   min     |    max     | sum 
-----------+------------+------- 
2011-10-31 | 2011-11-02 | 20 
2012-09-13 | 2012-09-16 | 30 
2012-10-30 | 2012-10-30 | 10 

Post the data and the desired output.


Clodoaldo, thanks for your interest. The example data and the expected output (using sum as the aggregate) are the same as those shown in the question above.


The word you're looking for is *contiguous*. See [this answer](http://stackoverflow.com/a/8015107/398670).

Answers

create table t ("date" date, "value" int); 
insert into t ("date", "value") values 
    ('2011-10-31', 2), 
    ('2011-11-01', 8), 
    ('2011-11-02', 10), 
    ('2012-09-13', 1), 
    ('2012-09-14', 4), 
    ('2012-09-15', 5), 
    ('2012-09-16', 20), 
    ('2012-10-30', 10); 

A simpler and cheaper version:

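select min("date"), max("date"), sum(value) 
from (
    select 
     "date", value, 
     -- within a run of consecutive days, "date" and dense_rank() both grow
     -- by 1 per day, so their difference g is constant for the whole run
     "date" - (dense_rank() over(order by "date"))::int g 
    from t 
) s 
group by s.g 
order by 1;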
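The key idea of the query above can be checked outside the database. Below is a minimal Python sketch, assuming the question's sample rows are held in an in-memory list; the names `rows` and `islands` are illustrative only. It computes the same group key as `"date" - dense_rank()` and sums each run:

```python
from datetime import date, timedelta
from itertools import groupby

# Sample rows from the question: (date, value) pairs.
rows = [
    (date(2011, 10, 31), 2),
    (date(2011, 11, 1), 8),
    (date(2011, 11, 2), 10),
    (date(2012, 9, 13), 1),
    (date(2012, 9, 14), 4),
    (date(2012, 9, 15), 5),
    (date(2012, 9, 16), 20),
    (date(2012, 10, 30), 10),
]


def islands(rows):
    """Return (min_date, max_date, sum_of_values) for each run of consecutive dates."""
    ordered = sorted(rows, key=lambda r: r[0])
    # dense_rank() over (order by date): equal dates share a rank, no rank is skipped.
    distinct_dates = sorted({d for d, _ in ordered})
    rank = {d: i + 1 for i, d in enumerate(distinct_dates)}

    def group_key(row):
        # Same role as "date" - dense_rank(): constant within a run of consecutive days.
        return row[0] - timedelta(days=rank[row[0]])

    result = []
    # In date order the keys never decrease, so each run is one adjacent block for groupby.
    for _, group in groupby(ordered, key=group_key):
        group = list(group)
        result.append((group[0][0], group[-1][0], sum(v for _, v in group)))
    return result


for island in islands(rows):
    print(island)
```

Using a dense rank rather than a plain row number matters if a date can appear more than once: duplicate dates get the same rank, hence the same key, and stay in the same group.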

My first attempt was more complicated and more expensive:

create temporary sequence s; 
select min("date"), max("date"), sum(value) 
from (
    select 
     "date", value, d, 
     case 
      when lag("date", 1, null) over(order by s.d) is null and "date" is not null 
       then nextval('s') 
      when lag("date", 1, null) over(order by s.d) is not null and "date" is not null 
       then lastval() 
      else 0 
     end g 
    from 
     t 
     right join 
     generate_series(
      (select min("date") from t)::date, 
      (select max("date") from t)::date + 1, 
      '1 day' 
     ) s(d) on s.d::date = t."date" 
) q 
where g != 0 
group by g 
order by 1 
; 
drop sequence s; 

Output:

   min     |    max     | sum 
-----------+------------+----- 
2011-10-31 | 2011-11-02 | 20 
2012-09-13 | 2012-09-16 | 30 
2012-10-30 | 2012-10-30 | 10 
(3 rows) 
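The sequence-based query walks a generated calendar covering every day from min(date) to max(date) + 1. It opens a new group (`nextval('s')`) when a day with data follows a day without data, and otherwise reuses the current group number (`lastval()`). Here is a minimal Python sketch of that same scan, with a plain list standing in for the temporary sequence `s`. The function name `islands_by_scan` is hypothetical:

```python
from datetime import date, timedelta

ONE_DAY = timedelta(days=1)


def islands_by_scan(rows):
    """Walk every calendar day from min(date) to max(date) + 1 and collect the runs."""
    totals = {}
    for d, v in rows:
        totals[d] = totals.get(d, 0) + v  # collapse duplicate dates for the scan

    groups = []            # each entry: [min_date, max_date, running_sum]
    prev_present = False   # did the previous calendar day have a row? (the lag() test)
    day, last_day = min(totals), max(totals) + ONE_DAY  # like generate_series(min, max + 1)
    while day <= last_day:
        present = day in totals
        if present and not prev_present:
            groups.append([day, day, totals[day]])  # like nextval('s'): open a new group
        elif present:
            groups[-1][1] = day                     # like lastval(): extend current group
            groups[-1][2] += totals[day]
        # days without a row correspond to g = 0 in the query and are skipped
        prev_present = present
        day += ONE_DAY
    return [tuple(g) for g in groups]


sample = [
    (date(2011, 10, 31), 2), (date(2011, 11, 1), 8), (date(2011, 11, 2), 10),
    (date(2012, 9, 13), 1), (date(2012, 9, 14), 4), (date(2012, 9, 15), 5),
    (date(2012, 9, 16), 20), (date(2012, 10, 30), 10),
]
print(islands_by_scan(sample))
```

This also shows why that version is more expensive. Its work grows with the number of calendar days between the first and last date, not with the number of rows.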

+1 for the dense_rank() version.


Here is one way to solve it.

First, to find where each consecutive series begins, this query returns the first date of every series:

SELECT first.date 
FROM raw_data first 
    LEFT OUTER JOIN raw_data prior_first ON first.date = prior_first.date + 1 
WHERE prior_first.date IS NULL;

Likewise, for the end of each consecutive series:

SELECT last.date 
FROM raw_data last 
    LEFT OUTER JOIN raw_data after_last ON last.date = after_last.date - 1 
WHERE after_last.date IS NULL;
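Both queries are anti-joins. A date begins a series when the day before it has no row, and ends a series when the day after it has no row. The same test in Python, as a sketch over a set of dates (the helpers `run_starts` and `run_ends` are illustrative names, not part of the answer):

```python
from datetime import date, timedelta

ONE_DAY = timedelta(days=1)


def run_starts(dates):
    """Dates whose previous day is missing (LEFT JOIN on date - 1, then IS NULL)."""
    present = set(dates)
    return sorted(d for d in present if d - ONE_DAY not in present)


def run_ends(dates):
    """Dates whose following day is missing (LEFT JOIN on date + 1, then IS NULL)."""
    present = set(dates)
    return sorted(d for d in present if d + ONE_DAY not in present)


dates = [
    date(2011, 10, 31), date(2011, 11, 1), date(2011, 11, 2),
    date(2012, 9, 13), date(2012, 9, 14), date(2012, 9, 15),
    date(2012, 9, 16), date(2012, 10, 30),
]
print(run_starts(dates))
print(run_ends(dates))
```

A one-day series such as 2012-10-30 appears in both lists. Any later step that pairs beginnings with endings must therefore allow a start to be paired with an end on the same date.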

You might consider turning these into views to simplify the queries that use them.

Then we only need to form the group ranges first:

CREATE VIEW beginnings AS 
SELECT first.date 
FROM raw_data first 
    LEFT OUTER JOIN raw_data prior_first ON first.date = prior_first.date + 1 
WHERE prior_first.date IS NULL;

CREATE VIEW endings AS 
SELECT last.date 
FROM raw_data last 
    LEFT OUTER JOIN raw_data after_last ON last.date = after_last.date - 1 
WHERE after_last.date IS NULL;

SELECT MIN(raw.date), MAX(raw.date), SUM(raw.value) 
FROM raw_data raw 
    INNER JOIN (SELECT lo.date AS lo_date, MIN(hi.date) AS hi_date 
       FROM beginnings lo, endings hi 
       -- <= (not <) so that a one-day series is paired with itself
       WHERE lo.date <= hi.date 
       GROUP BY lo.date) range 
    ON raw.date >= range.lo_date AND raw.date <= range.hi_date 
GROUP BY range.lo_date 
ORDER BY 1;
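The final query pairs every beginning with the nearest ending on or after it, then range-joins the raw rows to each pair. Below is a minimal Python sketch of those two steps. It assumes the beginnings and endings have already been computed; they are hard-coded here as the values the views would return for the sample data. `ranges_from_bounds` and `aggregate` are hypothetical helper names:

```python
from bisect import bisect_left, bisect_right
from datetime import date


def ranges_from_bounds(starts, ends):
    """Pair each start with the smallest end on or after it (MIN(hi.date) ... GROUP BY lo.date)."""
    ends = sorted(ends)
    # bisect_left finds the first end >= start, so a one-day run pairs with itself.
    return [(lo, ends[bisect_left(ends, lo)]) for lo in sorted(starts)]


def aggregate(rows, ranges):
    """Sum the values whose date lies in each closed range [lo, hi] (the range join)."""
    rows = sorted(rows)
    dates = [d for d, _ in rows]
    result = []
    for lo, hi in ranges:
        i, j = bisect_left(dates, lo), bisect_right(dates, hi)
        # lo and hi are real dates from the data, so they equal MIN/MAX of the group.
        result.append((lo, hi, sum(v for _, v in rows[i:j])))
    return result


rows = [
    (date(2011, 10, 31), 2), (date(2011, 11, 1), 8), (date(2011, 11, 2), 10),
    (date(2012, 9, 13), 1), (date(2012, 9, 14), 4), (date(2012, 9, 15), 5),
    (date(2012, 9, 16), 20), (date(2012, 10, 30), 10),
]
starts = [date(2011, 10, 31), date(2012, 9, 13), date(2012, 10, 30)]
ends = [date(2011, 11, 2), date(2012, 9, 16), date(2012, 10, 30)]
print(aggregate(rows, ranges_from_bounds(starts, ends)))
```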