2017-10-19

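I have two tables of medical data. I want to find the hospitals that appear in both tables and use two fields associated with each hospital, one from each table. Keep in mind that these are two separate tables. How do I cross-reference two large tables using Google BigQuery SQL?

This is what I have so far, but it always comes back with no results.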

SELECT 
    Spending/Rating AS Ratio, 
    Provider_ID_Info AS ID 
FROM (
    SELECT 
    Period, 
    Provider_ID_Spend, 
    Avg_Spending_Per_Episode_Hospital as Spending, 
    Provider_ID_Info, 
    Hospital_overall_rating as Rating 
    FROM 
    [OmniHealth.HospitalSpending], 
    [OmniHealth.HospitalGeneralInfo] 
    WHERE 
    Provider_ID_Info = Provider_ID_Spend 
    GROUP BY 
    Rating, 
    Spending, 
    Provider_ID_Spend, 
    Avg_Spending_Per_Episode_Hospital, 
    Provider_ID_Info, 
    Hospital_overall_rating, 
    Period 
    ) 

    GROUP BY 
    Ratio, 
    ID, 
    Spending, 
    Rating 

    ORDER BY 
    Ratio 

Answer


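The problem with your query is that in BigQuery legacy SQL the comma between tables means UNION ALL, not JOIN!

So try the following in BigQuery legacy SQL (since that is the dialect your question's query is written in):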
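To illustrate why the comma version returns nothing, here is a minimal sketch in standard SQL that imitates what the legacy comma does. The two inline tables and their values are invented for illustration; only the column names come from the question. Stacking the tables leaves the other table's columns NULL in every row, so the equality filter never matches, while an explicit JOIN pairs the rows up.

```sql
#standardSQL
-- Hypothetical miniature tables; the values are made up for this sketch.
WITH spending AS (
  SELECT 10019 AS Provider_ID_Spend, 1772 AS Avg_Spending_Per_Episode_Hospital
  UNION ALL
  SELECT 10005, 900
),
info AS (
  SELECT 10019 AS Provider_ID_Info, 2 AS Hospital_overall_rating
  UNION ALL
  SELECT 10005, 3
),
-- Roughly what the legacy SQL comma builds: rows stacked on top of each
-- other, with the columns of the other table filled with NULL.
stacked AS (
  SELECT
    Provider_ID_Spend,
    Avg_Spending_Per_Episode_Hospital,
    CAST(NULL AS INT64) AS Provider_ID_Info,
    CAST(NULL AS INT64) AS Hospital_overall_rating
  FROM spending
  UNION ALL
  SELECT NULL, NULL, Provider_ID_Info, Hospital_overall_rating
  FROM info
)
SELECT
  -- No row has both IDs populated, so this filter keeps nothing.
  (SELECT COUNT(*) FROM stacked
   WHERE Provider_ID_Info = Provider_ID_Spend) AS rows_from_union,
  -- The explicit JOIN matches each hospital across the two tables.
  (SELECT COUNT(*) FROM spending s
   JOIN info i ON i.Provider_ID_Info = s.Provider_ID_Spend) AS rows_from_join
```

With this sample data, rows_from_union should come back as 0 and rows_from_join as 2, which mirrors the empty result described in the question.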

#legacySQL 
SELECT 
    Spending/Rating AS Ratio, 
    Provider_ID_Info AS ID 
FROM (
    SELECT 
    Period, 
    Provider_ID_Spend, 
    Avg_Spending_Per_Episode_Hospital AS Spending, 
    Provider_ID_Info, 
    Hospital_overall_rating AS Rating 
    FROM 
    [OmniHealth.HospitalSpending] s 
    JOIN 
    [OmniHealth.HospitalGeneralInfo] i 
    ON 
    i.Provider_ID_Info = s.Provider_ID_Spend 
    GROUP BY 
    Rating, 
    Spending, 
    Provider_ID_Spend, 
    Provider_ID_Info, 
    Period 
) 
GROUP BY 
    Ratio, 
    ID, 
    Spending, 
    Rating 
ORDER BY 
    Ratio 

The recommended approach is BigQuery standard SQL:

#standardSQL 
SELECT 
    Spending/Rating AS Ratio, 
    Provider_ID_Info AS ID 
FROM (
    SELECT 
    Period, 
    Provider_ID_Spend, 
    Avg_Spending_Per_Episode_Hospital AS Spending, 
    Provider_ID_Info, 
    Hospital_overall_rating AS Rating 
    FROM 
    `OmniHealth.HospitalSpending` s 
    JOIN 
    `OmniHealth.HospitalGeneralInfo` i 
    ON 
    i.Provider_ID_Info = s.Provider_ID_Spend 
    GROUP BY 
    Rating, 
    Spending, 
    Provider_ID_Spend, 
    Provider_ID_Info, 
    Period 
) 
GROUP BY 
    Ratio, 
    ID, 
    Spending, 
    Rating 
ORDER BY 
    Ratio  

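Note: once the JOIN is in place, you may run into ambiguous field names. In that case you need to qualify the fields with their respective aliases, s and i.

Update:
Also removed the extra columns from GROUP BY.

However, with real data the result looks like this: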
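As a sketch of the aliasing point: assume, hypothetically, that both tables also carry a column with the same name, here called State (this column is not in the question and is only for illustration). Referring to State without a prefix in the joined query would be rejected as ambiguous, so each reference is qualified with s or i.

```sql
#standardSQL
-- Hypothetical inline data; State is an assumed column that exists in both tables.
WITH spending AS (
  SELECT 10019 AS Provider_ID_Spend, 'AL' AS State,
         1772 AS Avg_Spending_Per_Episode_Hospital
),
info AS (
  SELECT 10019 AS Provider_ID_Info, 'AL' AS State,
         2 AS Hospital_overall_rating
)
SELECT
  s.Provider_ID_Spend,
  s.State AS Spend_State,  -- a bare State here would be ambiguous
  i.State AS Info_State,
  s.Avg_Spending_Per_Episode_Hospital / i.Hospital_overall_rating AS Ratio
FROM spending s
JOIN info i
  ON i.Provider_ID_Info = s.Provider_ID_Spend
```

Columns whose names are unique across the two tables, such as Provider_ID_Spend, can stay unqualified, but prefixing everything keeps the query readable.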

Ratio     ID
0.0       10019
9.0       10019
39.5      10019
86.0      10019
236.5     10019
458.5     10019
485.0     10019
531.0     10019
1259.0    10019
1772.0    10019
8834.0    10019

Another update: whether this result makes sense or not is a different story entirely, but it is your query and your logic, so that is up to you.

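Thank you so much! For some reason, I keep getting null in my "Ratio" field. But if I replace "Spending" and "Rating" with the specific fields they refer to, I get an error like this: "Unrecognized name: Avg_Spending_Per_Episode_Hospital [2:3]"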

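You should not replace the field names. Or, if you want to, you need to remove the respective aliases in the subquery and use the original fields. So at least we have now fixed your problem with the wrong JOIN syntax. To troubleshoot the NULLs further, you should provide simplified sample data for both tables so that we can work on it. Makes sense?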
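A short sketch of why the "Unrecognized name" error appears: the outer query can only see the column names that the subquery exposes. If the subquery renames a field with AS, the original name is no longer visible outside it. Dropping the aliases, as below, lets the outer query use the original field names. The inline row is invented for illustration.

```sql
#standardSQL
-- Hypothetical single-row stand-in for the joined tables.
WITH joined AS (
  SELECT 10019 AS Provider_ID_Info,
         1772 AS Avg_Spending_Per_Episode_Hospital,
         2 AS Hospital_overall_rating
)
SELECT
  -- These names resolve because the subquery exposes them unchanged.
  Avg_Spending_Per_Episode_Hospital / Hospital_overall_rating AS Ratio,
  Provider_ID_Info AS ID
FROM (
  -- No AS Spending / AS Rating here, so the original names stay visible.
  SELECT
    Provider_ID_Info,
    Avg_Spending_Per_Episode_Hospital,
    Hospital_overall_rating
  FROM joined
)
```

The alternative is to keep AS Spending and AS Rating in the subquery and refer to Spending and Rating in the outer query, as the answer's queries do. Mixing the two styles is what triggers the error.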

See also [How to Ask](http://stackoverflow.com/help/how-to-ask) and [Minimal, Complete, and Verifiable example](http://stackoverflow.com/help/mcve)