2013-01-07 234 views
2

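How do I query time-series data from a database when the stored time slices are larger than the time slice I need? The final result will be used to draw a stacked bar chart.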

Example data; the desired time slice is 100 "units":

START_TS (int)| END_TS (int) | DATA (int) | GROUP 
----------------------------------- 
0  | 179  | 2000 | G1 
180  | 499  | 1000 | G2 
500  | 699  | 1000 | G1 
845 ... 

Desired output. END_TS is not needed in the output, but it helps in understanding the calculation.

START_TS | END_TS | DATA (equation = amount in that time slice) | GROUP 
------------------------------------------------------- 
0  | 99 | (2000/180) * 100 = 1111 | G1 
100  | 199 | (2000/180) * 80 = 889 | G1 
100  | 199 | (1000/320) * 20 = 63 | G2 
200  | 299 | (1000/320) * 100 = 313 | G2 
300  | 399 | (1000/320) * 100 = 313 | G2 
400  | 499 | (1000/320) * 100 = 313 | G2 
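The arithmetic in the table above can be sketched outside the database. The following is a minimal Python illustration, not part of the original question: it assumes inclusive integer timestamps and an even spread of `data` over the row's duration, and the function name `split_into_slices` is made up for this example.

```python
import math

def split_into_slices(start_ts, end_ts, data, width=100):
    """Spread `data` evenly over [start_ts, end_ts] (inclusive) and
    return (slice_start, slice_end, share) for every slice the row touches."""
    per_unit = data / (end_ts - start_ts + 1)       # amount per time unit
    pieces = []
    slice_start = (start_ts // width) * width       # first slice the row touches
    while slice_start <= end_ts:
        slice_end = slice_start + width - 1
        overlap_start = max(start_ts, slice_start)
        overlap_end = min(end_ts, slice_end)
        units = overlap_end - overlap_start + 1     # inclusive length of the overlap
        # Round half up to match the table; Python's round() rounds half to even.
        share = math.floor(per_unit * units + 0.5)
        pieces.append((slice_start, slice_end, share))
        slice_start += width
    return pieces

rows = [(0, 179, 2000, "G1"), (180, 499, 1000, "G2")]
for start_ts, end_ts, data, group in rows:
    for slice_start, slice_end, share in split_into_slices(start_ts, end_ts, data):
        print(slice_start, slice_end, share, group)
```

For the two sample rows this prints the six lines of the desired output, with two separate entries for the slice starting at 100 (one per group).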

Generating the time series from here looks like this:

SELECT (startts/100)*100, ... 
FROM TABLE 
    FULL JOIN 
     (SELECT startts from generate_series(0,700,100) startts) s1 
    USING (startts) 
GROUP BY startts/100 

So this would give something like the following (without the GROUP BY):

STARTTS | ENDTS | DATA | GROUP 
    0  | 179  | 2000 | G1 
    100 |  
    180 | 499  | 1000 | G2 
    200 | 
    300 | 
    400 | 
    500 | 699  | 1000 | G1 
    600 | 
    700 

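But how do I split the data of one row across the two or more generated rows (time-slice rows) it covers when computing the time slices?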


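This basically works, but it is not really practical for large data sets with something like 1-100M rows.

Here is the query, run against some additional non-overlapping time slices: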

SELECT (start_ts/100)*100 as start_ts, sum(part) as data, cgroup 
FROM (
SELECT *, (data * (overlap_end-overlap_start + 1)/(end_ts - tts + 1)) as part 
FROM 
    (
    SELECT (case when s1.start_ts > t.start_ts then s1.start_ts else t.start_ts end) as overlap_start, 
     (case when s1.start_ts+100 <= t.end_ts then s1.start_ts+100-1 else t.end_ts end) as overlap_end, 
     t.start_ts as tts, s1.start_ts as start_ts, t.end_ts, cgroup, data 
    FROM (SELECT start_ts from generate_series(0,800,100) start_ts) s1 
     LEFT OUTER JOIN test t on t.start_ts < s1.start_ts+100 and t.end_ts >= s1.start_ts 
    ) t 
) t2 
GROUP BY start_ts/100, cgroup 
+0

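You have a 'duplicate' row in the desired output ('START_TS = 100, END_TS = 199'); do you want it aggregated with the other part? Also, you do know that any split you make will be completely fabricated/averaged, right? You don't know _when_ within the original time slice things happened; it's like a tourist wondering why the guidebook says 'bring a coat' when the average temperature for the year is 90°F, and it just happens to be one of the 40°F days of the year. It's usually best to build this kind of thing from the raw data; is it available?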

+0

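Yes, I want two start_ts values in the '100' slice, because they show each group's value in that slice. I know the result will be fabricated/averaged, but that is the desired behavior for now. I'm drawing a stacked bar chart, or really a stacked line chart, where each line is 1 pixel wide and stacks all the groups in that slice. The raw data may be around, but it would only be used after reaching a certain zoom level and is out of scope for this question.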

Answers

1

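What you need is to split the different time slots into the bins defined by the series and aggregate the values. The following query does this by modifying the join condition and computing the overlap between the two: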

SELECT (startts/100)*100, ... 
from (select (case when s1.startts > t.start_ts then s1.startts else t.start_ts end) as overlap_start, 
      (case when s1.startts+100 <= t.end_ts then s1.startts+100-1 else t.end_ts end) as overlap_end, 
      s1.startts, t.* 
     FROM (SELECT startts from generate_series(0,700,100) startts) s1 left outer join 
      TABLE t 
      on t.start_ts < s1.startts+100 and 
       t.end_ts >= s1.startts 
    ) t 
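The join condition and the two CASE expressions amount to a standard test on inclusive ranges: a row belongs to a slice when it starts at or before the slice's last unit and ends at or after the slice's first unit, and the shared part runs from the larger start to the smaller end. A small Python sketch of just that logic, with helper names invented for illustration:

```python
def overlaps(a_start, a_end, b_start, b_end):
    """True when two inclusive integer ranges share at least one unit."""
    return a_start <= b_end and a_end >= b_start

def overlap_units(a_start, a_end, b_start, b_end):
    """Number of units shared by two inclusive ranges (0 when disjoint)."""
    return max(0, min(a_end, b_end) - max(a_start, b_start) + 1)

# Slice 100..199 against the sample rows 0..179 and 180..499
print(overlap_units(100, 199, 0, 179))    # 80
print(overlap_units(100, 199, 180, 499))  # 20
# A row ending exactly on the next slice's first unit contributes one unit there
print(overlap_units(100, 199, 0, 100))    # 1
```

The last case is the boundary worth checking by hand: a row that ends exactly on the first unit of a slice must still be capped at the previous slice's last unit when its share of that previous slice is computed.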
+0

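**Absolutely amazing**, this is it, thank you. Even though I can read and understand the query, it somehow doesn't fit in my head well enough for me to compose one myself. SQL is so cryptic in some ways.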

+0

Overlapping time periods are hard to visualize.

+0

There seems to be a problem with real data. The data set is large, 1-100M rows, so this approach will be too slow.

0

SQL Fiddle. For clarity, it shows all the computed columns for each step.

with data_avg as (
    select start_ts, end_ts, "data" * 1.0/((end_ts + 1) - start_ts) data_avg 
    from test 
), gs as (
    select start_ts, start_ts + 99 end_ts 
    from generate_series(
     (select min(start_ts) from test), 
     (select max(end_ts) from test), 
     100 
    ) gs(start_ts) 
) 
select 
    t_start, t_end, 
    gs_start, gs_end, 
    cgroup, 
    s."start", s."end", 
    da.start_ts da_start, da.end_ts da_end 
    ,round((s."end" - s."start" + 1) * da.data_avg) "data" 
from (
    select 
     t.start_ts t_start, t.end_ts t_end, 
     gs.start_ts gs_start, gs.end_ts gs_end, 
     cgroup, 
     greatest(t.start_ts, gs.start_ts) "start", least(t.end_ts, gs.end_ts) "end" 
    from 
     test t 
     inner join 
     gs on 
      gs.start_ts <= t.end_ts 
      and 
      gs.end_ts >= t.start_ts 
    ) s 
    inner join 
    data_avg da on 
     da.start_ts between t_start and t_end 
     and 
     da.end_ts between t_start and t_end 
order by s."start" 

Result:

t_start | t_end | gs_start | gs_end | cgroup | start | end | da_start | da_end | data 
---------+-------+----------+--------+--------+-------+-----+----------+--------+------ 
     0 | 179 |  0 |  99 | G1  |  0 | 99 |  0 | 179 | 1111 
     0 | 179 |  100 | 199 | G1  | 100 | 179 |  0 | 179 | 889 
    180 | 499 |  100 | 199 | G2  | 180 | 199 |  180 | 499 | 63 
    180 | 499 |  200 | 299 | G2  | 200 | 299 |  180 | 499 | 313 
    180 | 499 |  300 | 399 | G2  | 300 | 399 |  180 | 499 | 313 
    180 | 499 |  400 | 499 | G2  | 400 | 499 |  180 | 499 | 313 
    500 | 699 |  500 | 599 | G1  | 500 | 599 |  500 | 699 | 500 
    500 | 699 |  600 | 699 | G1  | 600 | 699 |  500 | 699 | 500
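The result above has one line per source row and slice. For a stacked chart these pieces still have to be summed per slice and group, which the question's own query does with GROUP BY. As a rough in-memory sketch of that final step (Python, with a hypothetical function name, and no claim about how it would perform on 1-100M rows), each source row is visited once and only the slices it actually covers are touched:

```python
import math
from collections import defaultdict

def stacked_totals(rows, width=100):
    """Sum each row's evenly spread share into (slice_start, group) buckets."""
    totals = defaultdict(float)
    for start_ts, end_ts, data, group in rows:
        per_unit = data / (end_ts - start_ts + 1)        # inclusive duration
        first_slice = (start_ts // width) * width
        for slice_start in range(first_slice, end_ts + 1, width):
            lo = max(start_ts, slice_start)
            hi = min(end_ts, slice_start + width - 1)
            totals[(slice_start, group)] += per_unit * (hi - lo + 1)
    return dict(totals)

rows = [(0, 179, 2000, "G1"), (180, 499, 1000, "G2"), (500, 699, 1000, "G1")]
totals = stacked_totals(rows)
for (slice_start, group), value in sorted(totals.items()):
    # Round half up only for display; the buckets keep the unrounded sums.
    print(slice_start, group, math.floor(value + 0.5))
```

Keeping the sums unrounded until display means the shares of one source row still add up to its original `data` value.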