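Below is a solution for BigQuery Standard SQL:

    #standardSQL
    WITH outages AS (
      -- One row per run of consecutive zero-dollar hours: its start and its length
      SELECT
        id,
        MIN(dayOfWeek) AS dayOfWeek,
        MIN(hourOfDay) AS hourOfDay,
        COUNT(1) AS len
      FROM (
        SELECT
          id, seq,
          -- Start of the run, repeated on every row of that run
          FIRST_VALUE(dayOfWeek) OVER(win) AS dayOfWeek,
          FIRST_VALUE(hourOfDay) OVER(win) AS hourOfDay
        FROM (
          SELECT
            id, dayOfWeek, hourOfDay, dollars,
            -- Running count of non-zero rows; it stays constant across a run of zeros
            COUNTIF(dollars <> 0) OVER(PARTITION BY id ORDER BY dayOfWeek, hourOfDay) AS seq
          FROM `yourTable`
        )
        WHERE dollars = 0
        WINDOW win AS (PARTITION BY id, seq ORDER BY dayOfWeek, hourOfDay)
      )
      GROUP BY id, seq
    ),
    averages AS (
      -- Average run length per id
      SELECT id, AVG(len) AS len
      FROM outages
      GROUP BY id
    )
    -- Keep only the runs that are longer than the average for their id
    SELECT o.*
    FROM outages AS o JOIN averages AS a
    ON o.id = a.id AND o.len > a.len
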
You can test it with the dummy data from your question, as below:

    #standardSQL
    WITH yourTable AS (
      SELECT * FROM UNNEST([
        STRUCT<id INT64, dayOfWeek INT64, hourOfDay INT64, dollars INT64>
        (1, 1, 1, 0),(1, 1, 2, 0),(1, 1, 3, 0),(1, 1, 4, 0),(1, 1, 5, 6),(1, 1, 6, 5),(1, 1, 7, 7),(1, 1, 8, 18),
        (1, 1, 9, 13),(1, 1, 10, 19),(1, 1, 11, 18),(1, 1, 12, 13),(1, 1, 13, 19),(1, 1, 14, 10),(1, 1, 15, 16),(1, 1, 16, 15),
        (1, 1, 17, 17),(1, 1, 18, 18),(1, 1, 19, 13),(1, 1, 20, 0),(1, 1, 21, 0),(1, 1, 22, 0),(1, 1, 23, 0),
        (1, 2, 0, 0),(1, 2, 1, 0),(1, 2, 2, 0),(1, 2, 3, 0),(1, 2, 4, 0),(1, 2, 5, 16),(1, 2, 6, 15),(1, 2, 7, 27),
        (1, 2, 8, 11),(1, 2, 9, 13),(1, 2, 10, 11),(1, 2, 11, 18),(1, 2, 12, 14),(1, 2, 13, 14),(1, 2, 14, 10),(1, 2, 15, 16),
        (1, 2, 16, 15),(1, 2, 17, 17),(1, 2, 18, 18),(1, 2, 19, 13),(1, 2, 20, 10),(1, 2, 21, 22),(1, 2, 22, 0),(1, 2, 23, 0)
      ])
    ),
    outages AS (
      SELECT
        id,
        MIN(dayOfWeek) AS dayOfWeek,
        MIN(hourOfDay) AS hourOfDay,
        COUNT(1) AS len
      FROM (
        SELECT
          id, seq,
          FIRST_VALUE(dayOfWeek) OVER(win) AS dayOfWeek,
          FIRST_VALUE(hourOfDay) OVER(win) AS hourOfDay
        FROM (
          SELECT
            id, dayOfWeek, hourOfDay, dollars,
            COUNTIF(dollars <> 0) OVER(PARTITION BY id ORDER BY dayOfWeek, hourOfDay) AS seq
          FROM `yourTable`
        )
        WHERE dollars = 0
        WINDOW win AS (PARTITION BY id, seq ORDER BY dayOfWeek, hourOfDay)
      )
      GROUP BY id, seq
    ),
    averages AS (
      SELECT id, AVG(len) AS len
      FROM outages
      GROUP BY id
    )
    SELECT o.*
    FROM outages AS o JOIN averages AS a
    ON o.id = a.id AND o.len > a.len

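As you can see here, the outages subselect finds every run of consecutive zeros, together with the length of the run and where it starts, and outputs the following: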

    id  dayOfWeek  hourOfDay  len
    1   1          1          4
    1   1          20         9
    1   2          22         2

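To make the grouping trick easier to follow, here is a minimal Python sketch of the same idea. It is not part of the original query: the function name `zero_runs` and the five sample rows are made up for illustration. A per-id running count of non-zero rows plays the role of `seq`, so every row inside one run of zeros carries the same value and can be grouped on it.

```python
from itertools import groupby


def zero_runs(rows):
    """rows: iterable of (id, dayOfWeek, hourOfDay, dollars) tuples.

    Returns one (id, dayOfWeek, hourOfDay, len) tuple per run of
    consecutive zero-dollar rows, mirroring the outages subselect.
    """
    tagged = []
    nonzero_seen = {}  # per-id running count, like COUNTIF(dollars <> 0) OVER (...)
    for id_, day, hour, dollars in sorted(rows):  # ORDER BY id, dayOfWeek, hourOfDay
        if dollars != 0:
            nonzero_seen[id_] = nonzero_seen.get(id_, 0) + 1
        else:
            # WHERE dollars = 0: keep only zero rows, tagged with their seq
            tagged.append((id_, nonzero_seen.get(id_, 0), day, hour))

    runs = []
    # GROUP BY id, seq: zero rows in the same run share the same seq
    for (id_, _seq), group in groupby(tagged, key=lambda t: (t[0], t[1])):
        group = list(group)
        _, _, first_day, first_hour = group[0]  # FIRST_VALUE over the ordered window
        runs.append((id_, first_day, first_hour, len(group)))
    return runs


# Hypothetical sample: a run that crosses midnight, then a single zero hour
sample = [(1, 1, 22, 0), (1, 1, 23, 0), (1, 2, 0, 0), (1, 2, 1, 5), (1, 2, 2, 0)]
print(zero_runs(sample))  # [(1, 1, 22, 3), (1, 2, 2, 1)]
```

Note that the run starting at hour 22 of day 1 continues into day 2, which is why the start is taken from the first row of the ordered run rather than from the smallest hour in it.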
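The final select then outputs only those rows from outages whose len is greater than the average len (from the averages subselect) for the respective id: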

    id  dayOfWeek  hourOfDay  len
    1   1          20         9

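The arithmetic behind this result is short: the three runs for id 1 have lengths 4, 9 and 2, so the average is (4 + 9 + 2) / 3 = 5, and only the run of length 9 exceeds it. The Python sketch below reproduces that filter outside BigQuery; the function name `longer_than_average` is an illustrative assumption, not something from the query itself.

```python
from collections import defaultdict


def longer_than_average(outages):
    """outages: list of (id, dayOfWeek, hourOfDay, len) tuples.

    Keeps only the runs whose length is strictly greater than the
    average run length for the same id.
    """
    lengths = defaultdict(list)
    for id_, _day, _hour, length in outages:
        lengths[id_].append(length)
    # Equivalent of the averages subselect: AVG(len) ... GROUP BY id
    avg = {id_: sum(vals) / len(vals) for id_, vals in lengths.items()}
    # Equivalent of the join condition: o.id = a.id AND o.len > a.len
    return [o for o in outages if o[3] > avg[o[0]]]


outages = [(1, 1, 1, 4), (1, 1, 20, 9), (1, 2, 22, 2)]
print(longer_than_average(outages))  # [(1, 1, 20, 9)]
```

Because the comparison is strict, an id with a single run, or with runs that all have the same length, yields no rows at all.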
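What is the 'average consecutive 0s toward the end of the day'? By the way, it would be great if you could edit your question to show a [Minimal, Complete, and Verifiable example](http://stackoverflow.com/help/mcve) of the code you are having problems with, so that we can try to help with the specific problem. You can also read [How to Ask](http://stackoverflow.com/help/how-to-ask).
For example, there are usually 1-2 consecutive 0s (e.g. hours 22, 23 = 0), but I want to capture instances like the one above (dayOfWeek = 1), where there are 4 consecutive 0s (hours 20, 21, 22, 23). Does that make sense? Formally -
- makes sense now. Hope this makes sense from a business point of view as well :o)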