2017-08-09

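Percentile functions with GROUP BY in BigQuery

In my CENSUS table, I'd like to group by state and, for each state, get the median county population and the number of counties.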

In psql, Redshift, and Snowflake, I can do this:

psql=> SELECT state, count(county), PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY "population2000") AS median FROM CENSUS GROUP BY state; 
     state   | count | median 
----------------------+-------+---------- 
Alabama    | 67 | 36583 
Alaska    | 24 | 7296.5 
Arizona    | 15 | 116320 
Arkansas    | 75 | 20229 
... 

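I'm trying to find a good way to do this in BigQuery standard SQL. I noticed that there is an undocumented percentile_cont analytic function available, but I have to resort to some major hacks to get it to do what I want.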

I'd like to be able to do the same sort of thing, using what I gather are the correct parameters:

SELECT 
    state, 
    COUNT(county), 
    PERCENTILE_CONT(population2000, 
    0.5) OVER() AS `medPop` 
FROM 
    CENSUS 
GROUP BY 
    state; 

But this query produces the error

SELECT list expression references column population2000 which is neither grouped nor aggregated at 

I can get the answer I want, but I'd be very disappointed if this were the recommended way to do it:

SELECT 
    MAX(nCounties) AS nCounties, 
    state, 
    MAX(medPop) AS medPop 
FROM (
    SELECT 
    nCounties, 
    T1.state, 
    (PERCENTILE_CONT(population2000, 
     0.5) OVER (PARTITION BY T1.state)) AS `medPop` 
    FROM 
    census T1 
    LEFT OUTER JOIN (
    SELECT 
     COUNT(county) AS `nCounties`, 
     state 
    FROM 
     census 
    GROUP BY 
     state) T2 
    ON 
    T1.state = T2.state) T3 
GROUP BY 
    state 

Is there a better way to do what I want? Also, is the PERCENTILE_CONT function documented anywhere?

Thanks for reading!

Answer


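Thanks for your interest. PERCENTILE_CONT is under development, and we will publish the documentation once it reaches GA. We will support it as an analytic function first, and we plan to support it as an aggregate function (which allows GROUP BY) later. Between the two releases, a simple workaround is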

SELECT 
    state, 
    ANY_VALUE(nCounties) AS nCounties, 
    ANY_VALUE(medPop) AS medPop 
FROM (
    SELECT 
    state, 
    COUNT(county) OVER (PARTITION BY state) AS nCounties, 
    PERCENTILE_CONT(population2000, 
     0.5) OVER (PARTITION BY state) AS medPop 
    FROM 
    CENSUS) 
GROUP BY 
    state 
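
A slightly shorter variant of the same idea, offered as a sketch rather than as part of the answer above: every row within a state carries the same nCounties and medPop from the two window functions, so SELECT DISTINCT can collapse them to one row per state without the outer GROUP BY and ANY_VALUE. It assumes the same CENSUS table and column names used in the question.

```sql
-- Sketch: deduplicate the per-row window results instead of grouping them.
-- Assumes the CENSUS table with state, county and population2000 columns.
SELECT DISTINCT
    state,
    COUNT(county) OVER (PARTITION BY state) AS nCounties,
    PERCENTILE_CONT(population2000, 0.5) OVER (PARTITION BY state) AS medPop
FROM
    CENSUS
```

Both forms compute the window functions over every row before reducing to one row per state, so this is a readability choice more than a performance one.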

Update: we have published the documentation for PERCENTILE_CONT at https://cloud.google.com/bigquery/docs/reference/standard-sql/functions-and-operators#.
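
If an approximate median is acceptable, BigQuery's APPROX_QUANTILES aggregate function can be used directly with GROUP BY. The following is a sketch and not something proposed in the answer above. APPROX_QUANTILES(x, 2) returns an array of three boundaries (approximate minimum, approximate median, approximate maximum), so OFFSET(1) selects the middle one. It assumes the same CENSUS table and columns as the question.

```sql
-- Sketch: approximate per-state median with a plain aggregate function.
-- APPROX_QUANTILES(x, 2) yields [min, median, max]; OFFSET(1) is the median.
SELECT
    state,
    COUNT(county) AS nCounties,
    APPROX_QUANTILES(population2000, 2)[OFFSET(1)] AS approxMedPop
FROM
    CENSUS
GROUP BY
    state
```

Unlike PERCENTILE_CONT, this result is approximate and is not interpolated between two neighbouring values, so it may not match the psql output exactly, for example for states with an even number of counties.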