Finding the distinct count of common products between accounts

Imagine a table with two columns, as follows:

Account_ID (integer) 
Product_ID (integer)

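The other columns are not important. The table lists the products purchased by each account. I want to create a three-column output like this: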

Account_ID_1 | Account_ID_2 | Count(distinct product_ID)

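The result should contain every combination of two Account_ID values, together with the distinct count of Product_IDs the two accounts have in common. I am using Google BigQuery. Is there a SQL way to do this, or should I plan to code it in a full programming language?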
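For reference, the output described above can be sketched outside SQL as well. This is only an illustration in plain Python, assuming the rows fit in memory; the sample rows and the function name `common_product_counts` are made up for the example and are not part of the question.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical sample rows: (Account_ID, Product_ID). Duplicate rows are allowed.
rows = [(1, 10), (1, 20), (1, 20), (2, 10), (2, 20), (3, 20), (4, 30)]


def common_product_counts(rows):
    """Map each account pair (a1, a2) with a1 < a2 to its number of distinct shared products."""
    # Collect the set of accounts that bought each product (sets drop duplicate rows).
    accounts_by_product = defaultdict(set)
    for account_id, product_id in rows:
        accounts_by_product[product_id].add(account_id)

    counts = defaultdict(int)
    for accounts in accounts_by_product.values():
        # Each product adds 1 to every pair of accounts that bought it.
        for pair in combinations(sorted(accounts), 2):
            counts[pair] += 1
    return dict(counts)


result = common_product_counts(rows)
for (account_1, account_2), n in sorted(result.items()):
    print(account_1, account_2, n)
```

Pairs with no product in common (for example any pair involving account 4 here) do not appear in the result at all.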

Source

2015-11-05 BAA

I don't know Google BigQuery, but take a look at [CROSS JOIN](http://www.w3resource.com/sql/joins/cross-join.php) – Rik

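So if two accounts have the same 'product_id' the count() is 1, and if only one of them has it, 0? Or is it the distinct count from account1 plus the distinct count from account2? –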

Do you want counts of 0 as well? –
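If zero counts are wanted (the question does not say), one possible approach is to build every account pair first and then LEFT JOIN the purchases onto it. The sketch below runs on Python's built-in sqlite3 as a stand-in for BigQuery, which is an assumption; the table name `mytable` and the sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (Account_ID INTEGER, Product_ID INTEGER)")
# Hypothetical sample data: only accounts 1 and 2 share a product (10).
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?)",
    [(1, 10), (1, 20), (2, 10), (3, 30)],
)

pairs_with_zero = conn.execute(
    """
    SELECT a1.Account_ID, a2.Account_ID, COUNT(DISTINCT t2.Product_ID)
    FROM (SELECT DISTINCT Account_ID FROM mytable) a1
    JOIN (SELECT DISTINCT Account_ID FROM mytable) a2
        ON a1.Account_ID < a2.Account_ID          -- every pair exactly once
    LEFT JOIN mytable t1
        ON t1.Account_ID = a1.Account_ID          -- products of the first account
    LEFT JOIN mytable t2
        ON t2.Account_ID = a2.Account_ID
        AND t2.Product_ID = t1.Product_ID         -- NULL when the product is not shared
    GROUP BY a1.Account_ID, a2.Account_ID
    ORDER BY 1, 2
    """
).fetchall()

for row in pairs_with_zero:
    print(row)
```

COUNT(DISTINCT ...) ignores NULLs, so pairs without a shared product come out as 0 instead of disappearing.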

This works for me:

select 
    t1.Account_ID, T2.Account_ID, count(t1.Product_ID) count_product_id 
from 
    MYTABLE t1 join MYTABLE t2 on t1.Product_ID = t2.Product_ID 
where t1.Account_ID <> t2.Account_ID 
group by t1.Account_ID, t2.Account_ID 
order by 1,2

Source

2015-11-05 21:29:14 gadaju

It should be: where t1.Account_ID < t2.Account_ID

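Agreed. "<>" will give you one row for accounts A and B and another row for accounts B and A. That is pointless here, because the count is the same in both. "<" ensures you get each A/B combination only once. –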
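The difference between the two operators can be checked on a few rows. This is a minimal sketch using Python's built-in sqlite3 as a stand-in for BigQuery (an assumption), with made-up sample data in a table called `mytable`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (Account_ID INTEGER, Product_ID INTEGER)")
# Hypothetical sample data for three accounts.
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?)",
    [(1, 10), (1, 20), (2, 10), (2, 20), (3, 20)],
)

# Same self-join both times; only the comparison operator changes.
QUERY = """
SELECT t1.Account_ID, t2.Account_ID, COUNT(DISTINCT t1.Product_ID)
FROM mytable t1
JOIN mytable t2 ON t1.Product_ID = t2.Product_ID
WHERE t1.Account_ID {op} t2.Account_ID
GROUP BY t1.Account_ID, t2.Account_ID
ORDER BY 1, 2
"""

both_orders = conn.execute(QUERY.format(op="<>")).fetchall()
one_order = conn.execute(QUERY.format(op="<")).fetchall()

print(len(both_orders), "rows with <>")
print(len(one_order), "rows with <")
```

With "<>" every pair shows up twice, once in each order; with "<" each pair shows up once.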

Here I count how many products the two accounts have in common.

SELECT 
    T1.Account_ID AS Account_ID_1, 
    T2.Account_ID AS Account_ID_2, 
    COUNT(DISTINCT T1.Product_ID) 
FROM YourTable AS T1 
JOIN YourTable AS T2 
    ON T1.Account_ID < T2.Account_ID 
    AND T1.Product_ID = T2.Product_ID 
GROUP BY 
    T1.Account_ID, 
    T2.Account_ID
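The DISTINCT matters when the table can hold the same account/product pair more than once, because the self-join multiplies duplicate rows. The sketch below compares a plain COUNT with COUNT(DISTINCT) on made-up data; it uses Python's built-in sqlite3 as a stand-in for BigQuery, which is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE YourTable (Account_ID INTEGER, Product_ID INTEGER)")
# Hypothetical sample data: account 1 bought product 20 twice.
conn.executemany(
    "INSERT INTO YourTable VALUES (?, ?)",
    [(1, 10), (1, 20), (1, 20), (2, 10), (2, 20)],
)

row = conn.execute(
    """
    SELECT
        T1.Account_ID,
        T2.Account_ID,
        COUNT(T1.Product_ID),           -- counts joined rows, duplicates included
        COUNT(DISTINCT T1.Product_ID)   -- counts each shared product once
    FROM YourTable AS T1
    JOIN YourTable AS T2
        ON T1.Account_ID < T2.Account_ID
        AND T1.Product_ID = T2.Product_ID
    GROUP BY T1.Account_ID, T2.Account_ID
    """
).fetchone()

account_1, account_2, plain_count, distinct_count = row
print(account_1, account_2, plain_count, distinct_count)
```

Here accounts 1 and 2 share two products, but the plain COUNT reports three because of the repeated purchase.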

Source

2015-11-05 21:27:34

The BigQuery version:

(The JOIN is on equality only; the inequality comparison is kept in the WHERE clause.)

SELECT a.corpus, b.corpus, EXACT_COUNT_DISTINCT(a.word) c 
FROM 
(SELECT corpus, word FROM [publicdata:samples.shakespeare]) a 
JOIN 
(SELECT corpus, word FROM [publicdata:samples.shakespeare]) b 
ON a.word=b.word 
WHERE a.corpus>b.corpus 
GROUP BY 1, 2 
ORDER BY 3 DESC

Source

2015-11-05 21:42:47

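This is exactly the query I needed! Thank you very much. The query works on samples.shakespeare, but my table seems to be too large. The error says: Shuffle reached broadcast limit for table __I0 (broadcasted at least 151001878 bytes). Consider using partitioned joins instead of broadcast joins. Any ideas? – BAA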

Do a JOIN EACH (and if this is the answer that worked for you, why did you accept another one?) –
