2013-08-19

Hi, I'm working on aggregating the rows of a Postgres table based on distance (the time gap between rows):

CREATE TABLE my_table (
    "id" serial,
    "sensorid" integer,
    "actorid" integer,
    "timestamp" timestamp without time zone
);

with example data:

id; sensorid; actorid; timestamp
1; 2267; 3023; "2013-07-09 12:20:06.446" 
2; 2267; 3023; "2013-07-09 12:20:16.421" 
3; 2267; 3023; "2013-07-09 12:20:30.661" 
4; 2267; 3023; "2013-07-09 12:20:36.958" 
5; 2267; 3023; "2013-07-09 12:20:49.508" 
6; 2267; 3023; "2013-07-09 12:20:57.683" 
7; 3301; 3023; "2013-08-15 06:03:03.428" 
8; 2267; 3024; "2013-07-09 12:19:52.196" 
9; 2267; 3024; "2013-07-09 12:20:16.515" 
10; 2267; 3024; "2013-07-09 12:20:42.341" 
11; 2267; 3025; "2013-07-09 12:21:05.98" 
12; 2268; 3026; "2013-07-09 12:22:35.03" 
13; 2268; 3026; "2013-07-09 12:22:45.066" 
14; 3192; 3026; "2013-08-09 07:41:31.206" 

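I want to group the records by the following criteria:

  1. They have the same sensorid.
  2. They have the same actorid.
  3. (The problem:) The time gap between them is less than (say) 5 minutes. That is, a group may span more than an hour, but no two consecutive records in the group may be more than 5 minutes apart. The group's time can be aggregated as an average.
  4. In addition, the number of records aggregated into each group must be reported, because groups that are too large have to be identified.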
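To make criteria 1 to 4 concrete, here is a minimal Python sketch (not SQL, and not how Postgres would do it) that applies them to the sample rows above. It assumes a fixed 5-minute threshold and that a gap of exactly 5 minutes still belongs to the same group; the names `ROWS`, `MAX_GAP`, `parse_ts` and `aggregate` are hypothetical helpers for illustration only.

```python
from datetime import datetime, timedelta
from itertools import groupby

# Sample rows from the question: (id, sensorid, actorid, timestamp).
ROWS = [
    (1, 2267, 3023, "2013-07-09 12:20:06.446"),
    (2, 2267, 3023, "2013-07-09 12:20:16.421"),
    (3, 2267, 3023, "2013-07-09 12:20:30.661"),
    (4, 2267, 3023, "2013-07-09 12:20:36.958"),
    (5, 2267, 3023, "2013-07-09 12:20:49.508"),
    (6, 2267, 3023, "2013-07-09 12:20:57.683"),
    (7, 3301, 3023, "2013-08-15 06:03:03.428"),
    (8, 2267, 3024, "2013-07-09 12:19:52.196"),
    (9, 2267, 3024, "2013-07-09 12:20:16.515"),
    (10, 2267, 3024, "2013-07-09 12:20:42.341"),
    (11, 2267, 3025, "2013-07-09 12:21:05.98"),
    (12, 2268, 3026, "2013-07-09 12:22:35.03"),
    (13, 2268, 3026, "2013-07-09 12:22:45.066"),
    (14, 3192, 3026, "2013-08-09 07:41:31.206"),
]

MAX_GAP = timedelta(minutes=5)  # assumed threshold ("say 5 minutes")


def parse_ts(text):
    return datetime.strptime(text, "%Y-%m-%d %H:%M:%S.%f")


def aggregate(rows, max_gap=MAX_GAP):
    """Return one (sensorid, actorid, avg_timestamp, count) tuple per group."""
    # Criteria 1 and 2: partition by (sensorid, actorid), each partition in time order.
    ordered = sorted((s, a, parse_ts(ts)) for _id, s, a, ts in rows)
    result = []
    for (sensor, actor), part in groupby(ordered, key=lambda r: (r[0], r[1])):
        times = [ts for _s, _a, ts in part]
        groups = [[times[0]]]
        for prev, cur in zip(times, times[1:]):
            # Criterion 3: only the gap to the previous row matters, so a group
            # can span hours as long as no single gap exceeds max_gap.
            if cur - prev > max_gap:
                groups.append([])
            groups[-1].append(cur)
        for g in groups:
            # Average timestamp = first timestamp + mean offset from it.
            avg = g[0] + sum((t - g[0] for t in g), timedelta()) / len(g)
            # Criterion 4: report the size so oversized groups can be flagged.
            result.append((sensor, actor, avg, len(g)))
    return result


for sensor, actor, avg, count in aggregate(ROWS):
    print(sensor, actor, avg.isoformat(sep=" ", timespec="milliseconds"), count)
```

Averaging offsets from the group's first timestamp avoids adding up absolute datetimes, which Python does not allow. With this sample data every (sensorid, actorid) pair forms a single group, and the 2267/3023 group contains six rows.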

So the output should look like this:

id; sensorid; actorid; avg; count
1; 2267; 3023; "2013-07-09 12:20:32.946"; 6
2; 3301; 3023; "2013-08-15 06:03:03.428"; 1
3; 2267; 3024; "2013-07-09 12:20:17.017"; 3
4; 2267; 3025; "2013-07-09 12:21:05.98"; 1
5; 2268; 3026; "2013-07-09 12:22:40.048"; 2
6; 3192; 3026; "2013-08-09 07:41:31.206"; 1

Thanks for your help! Dennis


You should accept Gordon Linoff's answer. –

Answer


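First, use lag() to fetch the previous timestamp and decide whether the current row starts a new period. Then, for each sensorid/actorid combination, a cumulative sum of isStart identifies the group each row belongs to within that pair.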
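The lag / isStart / cumulative-sum idea can be traced row by row with a small Python sketch that mirrors the three nested subqueries of the query below. It is an illustration under assumptions, not Postgres behaviour: rows are (sensorid, actorid, timestamp) tuples with `datetime` values, the threshold is a fixed 5 minutes, and `label_groups` and the sample rows are hypothetical.

```python
from datetime import datetime, timedelta

MAX_GAP = timedelta(minutes=5)  # assumed threshold, as in the question


def label_groups(rows, max_gap=MAX_GAP):
    """Mirror the nested subqueries: lag() -> isStart -> running sum (grp).

    rows: iterable of (sensorid, actorid, timestamp) with datetime timestamps.
    Returns (sensorid, actorid, timestamp, prevts, is_start, grp) tuples.
    """
    labeled = []
    current_key = None
    prevts = None
    grp = 0
    # ORDER BY timestamp inside each PARTITION BY sensorid, actorid.
    for sensor, actor, ts in sorted(rows):
        if (sensor, actor) != current_key:
            # New partition: lag() yields NULL and the running sum restarts.
            current_key = (sensor, actor)
            prevts = None
            grp = 0
        # CASE WHEN prevts IS NULL OR prevts < timestamp - interval '5 minutes'
        is_start = 1 if prevts is None or prevts < ts - max_gap else 0
        # SUM(isStart) OVER (PARTITION BY sensorid, actorid ORDER BY timestamp)
        grp += is_start
        labeled.append((sensor, actor, ts, prevts, is_start, grp))
        prevts = ts
    return labeled


# Hypothetical rows: a 7-minute gap between 12:03 and 12:10 starts group 2.
t0 = datetime(2013, 7, 9, 12, 0)
sample = [(2267, 3023, t0 + timedelta(minutes=m)) for m in (0, 3, 10, 12)]
for sensor, actor, ts, prevts, is_start, grp in label_groups(sample):
    print(sensor, actor, ts.time(), is_start, grp)
```

Because grp only counts starts within its own partition, the final GROUP BY has to include sensorid and actorid alongside grp.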

Then do the aggregation, grouping by this new group identifier as well:

select sensorid, actorid, min(timestamp), max(timestamp), count(*) as numInGroup 
from (select t.*, 
      sum(isStart) over (partition by sensorid, actorid order by timestamp) as grp 
     from (select t.*, 
        (case when prevts is null or prevts < timestamp - interval '5 minutes' 
         then 1 else 0 
        end) as isStart 
      from (select t.*, 
         lag(timestamp) over (partition by sensorid, actorid 
               order by timestamp) as prevts 
        from my_table t 
       ) t 
      ) t 
    ) t 
group by sensorid, actorid, grp 

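Hi Gordon, I haven't tested it thoroughly yet, but so far it looks perfect (apart from a small typo in the lag function call: timestmp instead of timestamp). In fact, I'm getting even better results than I expected: the extra time-period columns (min and max) open up good opportunities for further analysis. Thanks for your reply! – Ronk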