2013-05-02

I'm experimenting with BigQuery and ran into a problem: my query fails with "Response too large to return", even with LIMIT 1:

SELECT * FROM (
SELECT a.title, a.counter , MAX(b.num_characters) as max 
FROM (
    SELECT title, count(*) as counter FROM publicdata:samples.wikipedia 
    GROUP EACH BY title 
    ORDER BY counter DESC 
    LIMIT 10 
) a JOIN 
(SELECT title,num_characters FROM publicdata:samples.wikipedia 
) b ON a.title = b.title 
GROUP BY a.title, a.counter) 
LIMIT 1; 

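The query is valid, but I get "Response too large to return". The first subquery runs fine on its own; all I want is to add one more column to its result, and that is where it fails.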

Answer

Don't worry about the "LIMIT 1": the response becomes too large before the query ever reaches that stage.

Try dropping the second subquery, since it only selects 2 columns from the large dataset without filtering it at all. A working alternative is:

SELECT 
    a.title, a.counter, MAX(b.num_characters) AS max 
FROM 
    publicdata:samples.wikipedia b JOIN(
    SELECT 
    title, COUNT(*) AS counter 
    FROM 
    publicdata:samples.wikipedia 
    GROUP EACH BY title 
    ORDER BY 
    counter DESC 
    LIMIT 10) a 
    ON a.title = b.title 
GROUP BY 
    a.title, 
    a.counter 

This runs in 15.4 seconds.
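
To see the shape of that query on something you can run locally, here is a minimal sketch using Python's built-in sqlite3 module. It is not BigQuery: the `wikipedia` table, its rows, the `max_chars` alias and the `top_titles_with_max` helper are all made up for illustration, plain GROUP BY stands in for GROUP EACH BY, and a toy table cannot reproduce the response size limit. It only shows the pattern of joining the full table directly against a small top-N subquery.

```python
import sqlite3

# Hypothetical toy rows standing in for publicdata:samples.wikipedia.
ROWS = [
    ("Alpha", 120), ("Alpha", 300), ("Alpha", 250),
    ("Beta", 80), ("Beta", 95),
    ("Gamma", 500),
]


def top_titles_with_max(rows, n):
    """Return (title, counter, max_chars) for the n most frequent titles."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE wikipedia (title TEXT, num_characters INTEGER)")
    conn.executemany("INSERT INTO wikipedia VALUES (?, ?)", rows)
    query = """
        SELECT a.title, a.counter, MAX(b.num_characters) AS max_chars
        FROM wikipedia AS b
        JOIN (
            -- Small side of the join: only the n most frequent titles.
            SELECT title, COUNT(*) AS counter
            FROM wikipedia
            GROUP BY title
            ORDER BY counter DESC, title  -- title added as a tie-breaker
            LIMIT ?
        ) AS a ON a.title = b.title
        GROUP BY a.title, a.counter
        ORDER BY a.counter DESC, a.title
    """
    result = conn.execute(query, (n,)).fetchall()
    conn.close()
    return result


print(top_titles_with_max(ROWS, 2))
```

The table is joined as-is against the top-N subquery, rather than being wrapped in a second unfiltered subquery first, which is the same change made in the BigQuery version.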

We can make it faster still by using TOP():

SELECT 
    a.title title, counter, MAX(num_characters) max 
FROM 
    publicdata:samples.wikipedia b 
JOIN 
    (
    SELECT 
    TOP(title, 10) AS title, COUNT(*) AS counter 
    FROM 
    publicdata:samples.wikipedia 
    ) a 
    ON a.title=b.title 
GROUP BY 
    title, counter 

TOP() works as a simpler and faster alternative to the SELECT COUNT(*) / GROUP BY / ORDER BY / LIMIT pattern.

https://developers.google.com/bigquery/docs/query-reference#top-function

Now it runs in only 6.5 s, processing 15.9 GB.
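
As a rough analogy for what TOP(title, 10) with COUNT(*) computes, the sketch below compares an explicit group / count / order / limit pipeline with a one-call shortcut in plain Python. The `TITLES` list and both function names are hypothetical, and this says nothing about how BigQuery implements TOP() internally. If I remember the linked reference correctly, TOP() may also return approximate results, which this toy version does not model.

```python
from collections import Counter

# Hypothetical sample of the title column.
TITLES = ["Alpha", "Beta", "Alpha", "Gamma", "Alpha", "Beta"]


def top_by_group_order_limit(titles, n):
    """Spell out GROUP BY title, COUNT(*), ORDER BY count DESC, LIMIT n."""
    counts = {}
    for title in titles:
        counts[title] = counts.get(title, 0) + 1   # GROUP BY + COUNT(*)
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))  # ORDER BY
    return ordered[:n]                             # LIMIT n


def top_by_counter(titles, n):
    """One-call shortcut, playing the role TOP() plays in the query."""
    return Counter(titles).most_common(n)


print(top_by_group_order_limit(TITLES, 2))
print(top_by_counter(TITLES, 2))
```

Both functions give the same answer on this sample because there are no ties among the top two titles; with ties the ordering of equal counts may differ between them.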