2017-01-12

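Google BigQuery - most active moment based on time intervals

Say I have a table activities with the fields starttime (TIMESTAMP) and stoptime (TIMESTAMP). I want to find the moment at which the most activities were in progress. The query should return the first such moment.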

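I tried taking every starttime timestamp and, for each of them, counting the activities that were in progress at that moment, then picking the maximum: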

#standardSQL 
SELECT 
    time, 
    (
    SELECT COUNT(*) 
    FROM activities 
    WHERE starttime <= time AND time <= stoptime 
) AS cnt 
FROM (
    SELECT DISTINCT starttime AS time 
    FROM activities 
    ORDER BY time 
) 
ORDER BY cnt DESC, time ASC 
LIMIT 1 

Unfortunately, it fails with: LEFT OUTER JOIN cannot be used without a condition that is an equality of fields from both sides of the join.

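I think that outside the database world an appropriate algorithm would be to take all starttimes and stoptimes, put them into one array in a way that keeps them distinguishable, sort it, and then walk through the array sequentially looking for the moment with the maximum count. However, I don't know how to express such an algorithm in SQL. I have looked at this, but I don't think it helps.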
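For reference, the array-based algorithm described above can be sketched in Python roughly as follows. This is only an illustration of the idea, not part of the original question: the function name busiest_moment and the sample rows are made up, and it assumes closed intervals, matching the condition starttime <= time AND time <= stoptime used in the query.

```python
def busiest_moment(intervals):
    """Return (moment, count) for the earliest moment covered by the most intervals.

    Intervals are (start, stop) pairs and are treated as closed on both ends.
    """
    events = []
    for start, stop in intervals:
        # The middle element is a tie-breaker: at equal times, starts (0)
        # sort before stops (1), so an activity that ends at t still counts at t.
        events.append((start, 0, 1))
        events.append((stop, 1, -1))
    events.sort()

    best_time, best_count, running = None, 0, 0
    for time, _, delta in events:
        running += delta
        # Strict comparison keeps the earliest moment among equal maxima.
        if running > best_count:
            best_time, best_count = time, running
    return best_time, best_count


# Hypothetical sample data: six activities on an integer time axis.
sample = [(1, 3), (1, 4), (4, 5), (7, 8), (7, 10), (8, 12)]
print(busiest_moment(sample))  # (8, 3)
```

The tie-breaker is the "distinguishable" part: each array entry records whether it is a start or a stop, so the running count can be updated in a single pass after sorting.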

What is the granularity of your moment: is it a second, a minute, an hour, or something else?

@MikhailBerlyant I think it is milliseconds.

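So you need to find the single busiest millisecond over the whole time period? Please confirm, because this does not sound practical for most use cases, though you may have a special case.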

Answers


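I have implemented something close to the algorithm I described in the question. It runs quickly, but if you find something better, I would be glad to see it.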

#standardSQL 
SELECT time, SUM(add) OVER(ORDER BY time ASC, add DESC) AS cumsum 
FROM (
    SELECT starttime AS time, 1 AS add 
    FROM activities UNION ALL 
    SELECT stoptime AS time, -1 AS add 
    FROM activities 
) 
ORDER BY cumsum DESC 

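Consider the version below.
From my point of view it returns more practical output: the periods of consecutive activity at the same level, each with its start and end.
So you get not just the start moment but the whole period (start and end) with the highest activity, and not just one such period but all of them.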

#standardSQL 
WITH intervals AS (
    SELECT time AS start_, LEAD(time) OVER(ORDER BY time) AS end_ 
    FROM (
    SELECT DISTINCT time FROM (
     SELECT starttime AS time FROM activities UNION ALL 
     SELECT stoptime AS time FROM activities)) 
), 
equals AS (
    SELECT start_, end_, COUNT(1) AS cumsum 
    FROM intervals AS i 
    JOIN activities AS a 
    ON i.start_ >= a.starttime AND i.end_ <= a.stoptime 
    GROUP BY start_, end_ 
), 
grps AS (
    SELECT 
    start_, end_, cumsum, 
    IFNULL(
     CAST(end_ = LEAD(start_) OVER(ORDER BY start_) AND LEAD(cumsum) OVER(ORDER BY start_) = cumsum AS INT64), 
     CAST(NOT((start_ = LAG(end_) OVER(ORDER BY start_) AND LAG(cumsum) OVER(ORDER BY start_) = cumsum)) AS INT64) 
    ) AS flag 
    FROM equals 
) 
SELECT MIN(start_) AS start_, MAX(end_) AS end_, cumsum 
FROM (
    SELECT start_, end_, cumsum, SUM(flag) OVER(ORDER BY start_) AS grp 
    FROM grps 
) 
GROUP BY cumsum, grp 
ORDER BY start_ 

You can play with the above using a dummy activities table:

WITH activities AS (
    SELECT 1 AS starttime, 3 AS stoptime UNION ALL 
    SELECT 1 AS starttime, 4 AS stoptime UNION ALL 
    SELECT 4 AS starttime, 5 AS stoptime UNION ALL 
    SELECT 7 AS starttime, 8 AS stoptime UNION ALL 
    SELECT 7 AS starttime, 10 AS stoptime UNION ALL 
    SELECT 8 AS starttime, 12 AS stoptime 
) 

Or, with TIMESTAMP values:

WITH activities AS (
    SELECT TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL 1 MINUTE) AS starttime, TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL 3 MINUTE) AS stoptime UNION ALL 
    SELECT TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL 1 MINUTE) AS starttime, TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL 4 MINUTE) AS stoptime UNION ALL 
    SELECT TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL 4 MINUTE) AS starttime, TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL 5 MINUTE) AS stoptime UNION ALL 
    SELECT TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL 7 MINUTE) AS starttime, TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL 8 MINUTE) AS stoptime UNION ALL 
    SELECT TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL 7 MINUTE) AS starttime, TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL 10 MINUTE) AS stoptime UNION ALL 
    SELECT TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL 8 MINUTE) AS starttime, TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL 12 MINUTE) AS stoptime 
)
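
To build intuition for the period-based output, the general idea (split the timeline at every distinct start or stop time, count the activities that cover each elementary interval, then merge adjacent intervals with the same count) can be sketched in Python. This is an illustrative sketch only: the function name constant_level_periods is made up, it uses the integer dummy rows from above, and it does not attempt to reproduce the query's flag logic line by line.

```python
def constant_level_periods(activities):
    """Return (start, end, count) periods over which the activity count is constant.

    An activity is counted for an elementary interval only if it covers the
    whole interval; intervals covered by no activity are omitted.
    """
    # All distinct start and stop times, in order.
    times = sorted({t for start, stop in activities for t in (start, stop)})

    merged = []
    for lo, hi in zip(times, times[1:]):
        count = sum(1 for start, stop in activities if start <= lo and hi <= stop)
        if count == 0:
            continue
        # Extend the previous period when it touches this one at the same level.
        if merged and merged[-1][1] == lo and merged[-1][2] == count:
            merged[-1] = (merged[-1][0], hi, count)
        else:
            merged.append((lo, hi, count))
    return merged


dummy = [(1, 3), (1, 4), (4, 5), (7, 8), (7, 10), (8, 12)]
print(constant_level_periods(dummy))
# [(1, 3, 2), (3, 5, 1), (7, 10, 2), (10, 12, 1)]
```

Note that counting over intervals differs from counting at a single instant: under this sketch the highest level for the dummy rows is 2, whereas exactly at time 8 three activities overlap if both endpoints are treated as inclusive.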