2016-03-04 120 views
3

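BigQuery Pivot Data Rows Columns

I am currently processing data in BigQuery and then exporting it to Excel to build the final pivot table. I would like to create the same pivot table directly in BigQuery using a PIVOT option.

My data in BigQuery looks like this: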

Transaction_Month || ConsumerId || CUST_createdMonth 
01/01/2015  || 1   || 01/01/2015 
01/01/2015  || 1   || 01/01/2015 
01/02/2015  || 1   || 01/01/2015 
01/01/2015  || 2   || 01/01/2015 
01/02/2015  || 3   || 01/02/2015 
01/02/2015  || 4   || 01/02/2015 
01/02/2015  || 5   || 01/02/2015 
01/03/2015  || 5   || 01/02/2015 
01/03/2015  || 6   || 01/03/2015 
01/04/2015  || 6   || 01/03/2015 
01/06/2015  || 6   || 01/03/2015 
01/03/2015  || 7   || 01/03/2015 
01/04/2015  || 8   || 01/04/2015 
01/05/2015  || 8   || 01/04/2015 
01/04/2015  || 9   || 01/04/2015 

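It is essentially an orders table with customer information appended.

When I bring this data into Excel, I add it to a pivot table with CUST_createdMonth as rows, Transaction_Month as columns, and a distinct count of ConsumerId as the value.

The output looks like this (screenshot in the original post).

Is this kind of pivot possible in BigQuery?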

Answers

3

There is no great way to do this in BigQuery, but you can do it along the following lines.

Step 1

Run the query below

SELECT 'SELECT CUST_createdMonth, ' + 
    GROUP_CONCAT_UNQUOTED(
     'EXACT_COUNT_DISTINCT(IF(Transaction_Month = "' + Transaction_Month + '", ConsumerId, NULL)) as [m_' + REPLACE(Transaction_Month, '/', '_') + ']' 
    ) 
    + ' FROM yourTable GROUP BY CUST_createdMonth ORDER BY CUST_createdMonth' 
FROM (
    SELECT Transaction_Month 
    FROM yourTable 
    GROUP BY Transaction_Month 
    ORDER BY Transaction_Month 
) 

Result: you will get a string like the one below (formatted here for readability)

SELECT 
    CUST_createdMonth, 
    EXACT_COUNT_DISTINCT(IF(Transaction_Month = "01/01/2015", ConsumerId, NULL)) AS [m_01_01_2015], 
    EXACT_COUNT_DISTINCT(IF(Transaction_Month = "01/02/2015", ConsumerId, NULL)) AS [m_01_02_2015], 
    EXACT_COUNT_DISTINCT(IF(Transaction_Month = "01/03/2015", ConsumerId, NULL)) AS [m_01_03_2015], 
    EXACT_COUNT_DISTINCT(IF(Transaction_Month = "01/04/2015", ConsumerId, NULL)) AS [m_01_04_2015], 
    EXACT_COUNT_DISTINCT(IF(Transaction_Month = "01/05/2015", ConsumerId, NULL)) AS [m_01_05_2015], 
    EXACT_COUNT_DISTINCT(IF(Transaction_Month = "01/06/2015", ConsumerId, NULL)) AS [m_01_06_2015] 
    FROM yourTable 
GROUP BY 
    CUST_createdMonth 
ORDER BY 
    CUST_createdMonth 
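
The same query text can also be assembled client-side instead of with GROUP_CONCAT_UNQUOTED. The following is only a minimal sketch, assuming the distinct Transaction_Month values have already been fetched as strings; the function name `build_pivot_query` and its parameters are hypothetical, not part of the answer above.

```python
def build_pivot_query(months, table="yourTable"):
    """Build the legacy-SQL pivot query text for the given Transaction_Month strings."""
    columns = []
    # Plain string sort, matching ORDER BY on the string column in Step 1.
    for month in sorted(set(months)):
        alias = "m_" + month.replace("/", "_")
        columns.append(
            'EXACT_COUNT_DISTINCT(IF(Transaction_Month = "{0}", ConsumerId, NULL)) AS [{1}]'
            .format(month, alias)
        )
    return (
        "SELECT CUST_createdMonth, "
        + ", ".join(columns)
        + " FROM {0} GROUP BY CUST_createdMonth ORDER BY CUST_createdMonth".format(table)
    )


# Month values taken from the sample data in the question.
months = [
    "01/01/2015", "01/02/2015", "01/03/2015",
    "01/04/2015", "01/05/2015", "01/06/2015",
]
query = build_pivot_query(months)
print(query)
```

Because the month values are pasted into the query text, this is only reasonable when they come from your own table and not from untrusted input.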

Step 2

Just run the query composed above.

The result will look like this:

CUST_createdMonth m_01_01_2015 m_01_02_2015 m_01_03_2015 m_01_04_2015 m_01_05_2015 m_01_06_2015  
01/01/2015   2    1    0    0    0    0  
01/02/2015   0    3    1    0    0    0  
01/03/2015   0    0    2    1    0    1  
01/04/2015   0    0    0    2    1    0 
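
As a sanity check, the distinct-count pivot can be reproduced outside BigQuery. This is a small pure-Python sketch over the sample rows from the question; the helper name `pivot_distinct` is hypothetical and the code is meant only to confirm the numbers in the table above, not to replace the query.

```python
from collections import defaultdict

# (Transaction_Month, ConsumerId, CUST_createdMonth) rows from the question.
rows = [
    ("01/01/2015", 1, "01/01/2015"), ("01/01/2015", 1, "01/01/2015"),
    ("01/02/2015", 1, "01/01/2015"), ("01/01/2015", 2, "01/01/2015"),
    ("01/02/2015", 3, "01/02/2015"), ("01/02/2015", 4, "01/02/2015"),
    ("01/02/2015", 5, "01/02/2015"), ("01/03/2015", 5, "01/02/2015"),
    ("01/03/2015", 6, "01/03/2015"), ("01/04/2015", 6, "01/03/2015"),
    ("01/06/2015", 6, "01/03/2015"), ("01/03/2015", 7, "01/03/2015"),
    ("01/04/2015", 8, "01/04/2015"), ("01/05/2015", 8, "01/04/2015"),
    ("01/04/2015", 9, "01/04/2015"),
]


def pivot_distinct(rows):
    """Count distinct ConsumerId per (CUST_createdMonth, Transaction_Month) cell."""
    consumers = defaultdict(set)
    for transaction_month, consumer_id, created_month in rows:
        consumers[(created_month, transaction_month)].add(consumer_id)
    created_months = sorted({r[2] for r in rows})
    transaction_months = sorted({r[0] for r in rows})
    # One list per row of the pivot, one entry per Transaction_Month column.
    return {
        created: [len(consumers.get((created, month), ())) for month in transaction_months]
        for created in created_months
    }


pivot = pivot_distinct(rows)
for created_month, counts in pivot.items():
    print(created_month, counts)
```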

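Step 1 is helpful if you have many months to pivot, which would otherwise mean too much manual work.
In that case, Step 1 generates the query for you.

You can see more about pivoting in my other posts.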

How to scale Pivoting in BigQuery?
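Please note: there is a limit of 10K columns per table, so you are limited to 10K organizations.
You can also see the simplified examples below (if the one above is too complex/verbose):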
How to transpose rows to columns with large amount of the data in BigQuery/SQL?
How to create dummy variable columns for thousands of categories in Google BigQuery?
Pivot Repeated fields in BigQuery

+0

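1. The code in my answer is tailored to the example in your question, where the dates are evidently strings. 2. Check whether your actual data matches the example you provided. 3. If you still have a problem, troubleshoot and fix it, or show the row that produces the error; better still, show 3 rows (including the one before and the one after).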

+0

Hi, I deleted my comment. I think I looked at this for too long yesterday; it worked perfectly when I tried it today. Thanks for all your help.

+0

Glad it worked for you!

1

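Actually, Mikhail, there is another way to transpose rows into columns for an EAV-type schema: use the logging tables and query the last CREATE TABLE entry to determine the latest table schema.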

CREATE TEMP FUNCTION jsonSchemaStringToArray(jsonSchema STRING)
RETURNS ARRAY<STRING> AS ((
  SELECT
    SPLIT(
      REGEXP_REPLACE(REPLACE(LTRIM(jsonSchema, '{ '), '"fields": [', ''), r'{[^{]+"name": "([^\"]+)"[^}]+}[, ]*', '\\1,')
      , ',')
));
WITH valid_schema_columns AS (
  WITH array_output AS (
    SELECT
      jsonSchemaStringToArray(jsonSchema) AS column_names
    FROM (
      SELECT
        protoPayload.serviceData.jobInsertRequest.resource.jobConfiguration.load.schemaJson AS jsonSchema
        , ROW_NUMBER() OVER (ORDER BY metadata.timestamp DESC) AS record_count
      FROM `realself-main.bigquery_logging.cloudaudit_googleapis_com_data_access_20170101`
      WHERE
        protoPayload.serviceData.jobInsertRequest.resource.jobConfiguration.load.destinationTable.tableId = '<table_name>'
        AND
        protoPayload.serviceData.jobInsertRequest.resource.jobConfiguration.load.destinationTable.datasetId = '<schema_name>'
        AND
        protoPayload.serviceData.jobInsertRequest.resource.jobConfiguration.load.createDisposition = 'CREATE_IF_NEEDED'
    ) AS t
    WHERE
      t.record_count = 1 -- grab the latest entry
  )
  -- this is actually what UNNESTS the array into standard rows
  SELECT
    valid_column_name
  FROM array_output
  LEFT JOIN UNNEST(column_names) AS valid_column_name
)
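
If the schemaJson string is pulled out of the log table into client code, the column names can be read with a JSON parser instead of a regular expression. This is a rough sketch, assuming the payload is a JSON object with a "fields" array of objects that each carry a "name" key, which is the shape the regular expression above implies. The function name `schema_column_names` and the sample payload are hypothetical.

```python
import json


def schema_column_names(schema_json):
    """Return the top-level field names from a schemaJson string."""
    schema = json.loads(schema_json)
    # Assumes {"fields": [{"name": ...}, ...]}; nested RECORD fields are not expanded.
    return [field["name"] for field in schema.get("fields", [])]


# Hypothetical payload, made up for illustration only.
sample = (
    '{"fields": ['
    '{"name": "id", "type": "INTEGER", "mode": "NULLABLE"}, '
    '{"name": "email", "type": "STRING", "mode": "NULLABLE"}'
    ']}'
)
column_names = schema_column_names(sample)
print(column_names)
```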