2013-10-14
2

Google BigQuery SQL: separate columns

I'm after an efficient(ish) BigQuery SQL query to solve the following problem.

I have a table that looks like this:

 

    
    Row | Col_A | Col_B | 
    --------------------- 
    1 | 2 | 3 | 
    2 | 1 | 4 | 
    3 | 5 | 7 | 
    4 | 2 | 3 | 
    5 | 6 | 1 | 

    ...and so on (>million rows) 
 

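The values in each column are IDs in the range [1..7].

The query should return, for each code, the total number of occurrences in each column, as shown below, without using multiple SELECT queries: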

 

    
    Code | Total Col_A | Total Col_B 
    -------------------------------- 
     1 |  1  |  1 
     2 |  2  |  0 
     3 |  0  |  2 
     4 |  0  |  1 
     5 |  1  |  0 
     6 |  1  |  0 
     7 |  0  |  1 
 

Does anyone know a way to do this in BigQuery?

Cheers.

+1

Please show us what you have tried so far. – Szymon

Answers

2

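Can you create a public dataset with your sample data? It would be easier to write a query that works on the data and to verify the results.

A starting query: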

SELECT Code, COUNT(Col_A) count_column_x, COUNT(Col_B) count_column_y 
FROM [your:list.of_codes] a 
LEFT JOIN EACH [your:sample.table] b 
ON a.Code=b.Col_A 
GROUP BY 1 

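(It is not perfect: because the join is on Col_A only, count_column_y does not yet count the matches on Col_B. We would get further if you shared a table we could work on together.)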

1

Does anyone know a way to do this in BigQuery without using multiple SELECTs?

One option using standard SQL:

#standardSQL 
WITH logs AS (
    SELECT 2 AS Col_A, 3 AS Col_B UNION ALL 
    SELECT 1 AS Col_A, 4 AS Col_B UNION ALL 
    SELECT 5 AS Col_A, 7 AS Col_B UNION ALL 
    SELECT 2 AS Col_A, 3 AS Col_B UNION ALL 
    SELECT 6 AS Col_A, 1 AS Col_B 
) 
SELECT 
    id, 
    SUM(CAST(id = Col_A AS INT64)) AS Total_Col_A, 
    SUM(CAST(id = Col_B AS INT64)) AS Total_Col_B 
FROM logs, UNNEST(GENERATE_ARRAY(1,7)) AS id 
GROUP BY id 
ORDER BY id 

Or with COUNTIF():

#standardSQL 
WITH logs AS (
    SELECT 2 AS Col_A, 3 AS Col_B UNION ALL 
    SELECT 1 AS Col_A, 4 AS Col_B UNION ALL 
    SELECT 5 AS Col_A, 7 AS Col_B UNION ALL 
    SELECT 2 AS Col_A, 3 AS Col_B UNION ALL 
    SELECT 6 AS Col_A, 1 AS Col_B 
) 
SELECT 
    id, 
    COUNTIF(id = Col_A) AS Total_Col_A, 
    COUNTIF(id = Col_B) AS Total_Col_B 
FROM logs, UNNEST(GENERATE_ARRAY(1,7)) AS id 
GROUP BY id 
ORDER BY id
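
Both queries above cross join every row with all seven ids. If that intermediate size is a concern on a table with millions of rows, one possible variant is to unpivot each row into two (column, id) pairs first and then LEFT JOIN the generated codes against them, so that codes with no matches still show up with zero totals. This is only a sketch on the same sample data: the `unpivoted` CTE and the `v`, `u`, and `code` aliases are names introduced here for illustration, and whether it is actually faster on real data has not been tested.

```sql
#standardSQL
WITH logs AS (
    SELECT 2 AS Col_A, 3 AS Col_B UNION ALL
    SELECT 1 AS Col_A, 4 AS Col_B UNION ALL
    SELECT 5 AS Col_A, 7 AS Col_B UNION ALL
    SELECT 2 AS Col_A, 3 AS Col_B UNION ALL
    SELECT 6 AS Col_A, 1 AS Col_B
),
-- Hypothetical helper CTE: one row per (source row, column),
-- i.e. 2 rows per input row instead of 7.
unpivoted AS (
    SELECT v.col AS col, v.id AS id
    FROM logs, UNNEST([
        STRUCT('A' AS col, Col_A AS id),
        STRUCT('B' AS col, Col_B AS id)
    ]) AS v
)
SELECT
    code AS id,
    -- Unmatched codes have u.col = NULL, which COUNTIF does not count.
    COUNTIF(u.col = 'A') AS Total_Col_A,
    COUNTIF(u.col = 'B') AS Total_Col_B
FROM UNNEST(GENERATE_ARRAY(1, 7)) AS code
LEFT JOIN unpivoted AS u
    ON u.id = code
GROUP BY code
ORDER BY code
```

To run it against a real table, replace the `logs` CTE with your own table reference.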