2013-06-28
2

How can I make this query run efficiently?

In BigQuery, we are trying to run:

SELECT day, AVG(value)/(1024*1024) FROM ( 
    SELECT value, UTC_USEC_TO_DAY(timestamp) as day, 
     PERCENTILE_RANK() OVER (PARTITION BY day ORDER BY value ASC) as rank 
    FROM [Datastore.PerformanceDatum] 
    WHERE type = "MemoryPerf" 
) WHERE rank >= 0.9 AND rank <= 0.91 
GROUP BY day 
ORDER BY day desc; 

The amount of data returned is relatively small, but we get this message:

Error: Resources exceeded during query execution. The query contained a GROUP BY operator, consider using GROUP EACH BY instead. For more details, please see https://developers.google.com/bigquery/docs/query-reference#groupby 

What makes this query fail? Is it the size of the subquery? Is there an equivalent query that avoids the problem?


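Edit in response to the comments: if I add GROUP EACH BY (and drop the outer ORDER BY), the query fails, claiming that GROUP EACH BY is not parallelizable here.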

+0

Have you tried using "GROUP EACH BY", as the error message suggests? – hexafraction

+0

If I add GROUP EACH BY (and remove the outer ORDER BY), the query fails, claiming that GROUP EACH BY is not parallelizable here. Is there something I'm missing? –

+1

Add that to your post. I'm just trying to help make it answerable and less likely to be put on hold. – hexafraction

Answers

1

I wrote an equivalent query that works for me:

SELECT day, AVG(value)/(1024*1024) FROM (
SELECT data value, UTC_USEC_TO_DAY(dtimestamp) as day, 
     PERCENTILE_RANK() OVER (PARTITION BY day ORDER BY value ASC) as rank 
    FROM [io_sensor_data.moscone_io13] 
    WHERE sensortype = "humidity" 
) WHERE rank >= 0.9 AND rank <= 0.91 
GROUP BY day 
ORDER BY day desc; 

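If I run only the inner query, I get 3,660,624 results. Is your dataset bigger than that?

When grouping by day, the outer SELECT gives only 4 results. I'll try a different grouping to see whether I can hit the limit: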

SELECT day, AVG(value)/(1024*1024) FROM (
SELECT data value, dtimestamp/1000 as day, 
     PERCENTILE_RANK() OVER (PARTITION BY day ORDER BY value ASC) as rank 
    FROM [io_sensor_data.moscone_io13] 
    WHERE sensortype = "humidity" 
) WHERE rank >= 0.9 AND rank <= 0.91 
GROUP BY day 
ORDER BY day desc; 

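That also runs, now with 57,862 distinct groups.

I tried different combinations to reproduce the error. I can get the same error you see by doubling the initial amount of data. A simple "hack" to double the amount of data (in BigQuery's legacy SQL, a comma between table names unions the tables rather than joining them) is to change: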

FROM [io_sensor_data.moscone_io13] 

To:

FROM [io_sensor_data.moscone_io13], [io_sensor_data.moscone_io13] 

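Then I get the same error. How much data do you have? Can you apply an additional filter? Since you are already partitioning the PERCENTILE_RANK by day, could you add an extra condition so the query analyzes only a fraction of the days (for example, only the last month)?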
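
A minimal sketch of such a filter, applied to the query from the question. It assumes that `timestamp` holds integer microseconds since the Unix epoch (as its use with UTC_USEC_TO_DAY suggests) and treats "last month" as roughly 30 days. The cutoff is illustrative only and has not been run against the original dataset:

```sql
SELECT day, AVG(value)/(1024*1024) FROM (
    SELECT value, UTC_USEC_TO_DAY(timestamp) as day,
     PERCENTILE_RANK() OVER (PARTITION BY day ORDER BY value ASC) as rank
    FROM [Datastore.PerformanceDatum]
    WHERE type = "MemoryPerf"
      -- Assumption: timestamp is in microseconds, like the value returned by NOW().
      -- Keep only rows from roughly the last 30 days (30 days expressed in microseconds).
      AND timestamp >= NOW() - 30 * 24 * 60 * 60 * 1000000
) WHERE rank >= 0.9 AND rank <= 0.91
GROUP BY day
ORDER BY day desc;
```

Because the rank is partitioned by day, narrowing the date range should not change the result for any day that is fully inside the window. The oldest day in the window may be only partially covered, so its value should be read with care.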

+0

Analyzing only a fraction of the days is the hack I'm using right now, but since so little data is actually returned, it kind of creeps me out. –