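BigQuery - Reddit comment data analysis for a BigQuery newbie
I'm trying to get pairs of users who both commented in the top 10 subreddits, together with the count of subreddits they have in common, using the BigQuery reddit data.
I've just started using BQ and I'm also a SQL beginner, so I'm finding it hard to write this query. Can someone give me some pointers to get started?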
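I never really had a need to play with the reddit data, so the below is just to throw at least something at you to start with, since it looks like nobody else is willing to.
Quick logic: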
Step - 1: Identify top 10 most commented subreddits
SELECT subreddit
FROM [fh-bigquery:reddit_comments.subr_rank_201505]
ORDER BY comments DESC
LIMIT 10
Step - 2: For each subreddit identify [solid] users (those with more than 50 comments)
SELECT author, subreddit, COUNT(1) AS comments
FROM [fh-bigquery:reddit_comments.2016_01]
WHERE subreddit IN (
SELECT subreddit
FROM [fh-bigquery:reddit_comments.subr_rank_201505]
ORDER BY comments DESC
LIMIT 10)
AND author NOT IN ('AutoModerator', '[deleted]')
GROUP BY author, subreddit
HAVING comments > 50
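To make the filtering in steps 1 and 2 easier to follow, here is a rough Python sketch of the same logic on a tiny in-memory dataset. Everything in it is an assumption made for illustration: the toy rows, the function names `top_subreddits` and `active_users`, and the small thresholds that stand in for LIMIT 10 and HAVING comments > 50. It also ranks subreddits from the toy rows themselves rather than from a separate ranking table. It is not a replacement for the queries above, just a way to check your understanding of them.

```python
from collections import Counter

# Hypothetical toy data: one (author, subreddit) tuple per comment.
comments = [
    ("alice", "pics"), ("alice", "pics"), ("alice", "pics"),
    ("bob", "pics"), ("bob", "pics"),
    ("alice", "funny"), ("alice", "funny"),
    ("carol", "funny"),
    ("AutoModerator", "pics"), ("AutoModerator", "pics"), ("AutoModerator", "pics"),
    ("dave", "tiny"),
]

TOP_N = 2          # assumption: stands in for LIMIT 10
MIN_COMMENTS = 1   # assumption: stands in for HAVING comments > 50
EXCLUDED = {"AutoModerator", "[deleted]"}


def top_subreddits(rows, n):
    """Step 1: the n subreddits with the most comments."""
    counts = Counter(sub for _, sub in rows)
    return {sub for sub, _ in counts.most_common(n)}


def active_users(rows, top, min_comments):
    """Step 2: comment counts per (author, subreddit), kept only above the threshold."""
    counts = Counter(
        (author, sub)
        for author, sub in rows
        if sub in top and author not in EXCLUDED
    )
    return {key: c for key, c in counts.items() if c > min_comments}


top = top_subreddits(comments, TOP_N)
active = active_users(comments, top, MIN_COMMENTS)
print(top)
print(active)
```

The dictionary comprehension at the end of `active_users` plays the role of the HAVING clause: it filters on the aggregated count, not on individual comments.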
Step - 3: For each subreddit identify pairs of [solid] users (via JOIN)
Step - 4: Finally, for each pair of users count the number of common subreddits
SELECT usera, userb, COUNT(1) AS subreddits
FROM (
SELECT
a.author AS usera,
b.author AS userb,
a.subreddit AS subreddit,
FROM (
SELECT author, subreddit, COUNT(1) AS comments FROM [fh-bigquery:reddit_comments.2016_01]
WHERE subreddit IN (SELECT subreddit FROM [fh-bigquery:reddit_comments.subr_rank_201505] ORDER BY comments DESC LIMIT 10)
AND author NOT IN ('AutoModerator', '[deleted]')
GROUP BY author, subreddit HAVING comments > 50) AS a
JOIN (
SELECT author, subreddit, COUNT(1) AS comments FROM [fh-bigquery:reddit_comments.2016_01]
WHERE subreddit IN (SELECT subreddit FROM [fh-bigquery:reddit_comments.subr_rank_201505] ORDER BY comments DESC LIMIT 10)
AND author NOT IN ('AutoModerator', '[deleted]')
GROUP BY author, subreddit HAVING comments > 50) AS b
ON a.subreddit = b.subreddit
WHERE a.author < b.author
)
GROUP BY usera, userb
HAVING subreddits > 3
ORDER BY subreddits DESC, usera, userb
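The self-join in steps 3 and 4 can be hard to picture at first, so here is a small Python sketch of the same idea on made-up data. The rows, the function name `shared_subreddit_counts`, and the threshold `MIN_SHARED` are all hypothetical; the threshold stands in for HAVING subreddits > 3. Sorting the authors before pairing them has the same effect as the a.author < b.author condition: each pair shows up once, and nobody is paired with themselves.

```python
from collections import Counter
from itertools import combinations

# Hypothetical output of step 2: (author, subreddit) rows that passed the filter.
active = [
    ("alice", "pics"), ("bob", "pics"), ("carol", "pics"),
    ("alice", "funny"), ("bob", "funny"),
    ("alice", "news"), ("bob", "news"), ("carol", "news"),
]

MIN_SHARED = 1  # assumption: stands in for HAVING subreddits > 3


def shared_subreddit_counts(rows, min_shared):
    """Count, for every pair of users, how many subreddits they share."""
    # Group authors by subreddit (the equivalent of joining on a.subreddit = b.subreddit).
    by_subreddit = {}
    for author, sub in rows:
        by_subreddit.setdefault(sub, set()).add(author)

    pair_counts = Counter()
    for authors in by_subreddit.values():
        # sorted() + combinations() yields each pair once, with usera < userb.
        for usera, userb in combinations(sorted(authors), 2):
            pair_counts[(usera, userb)] += 1

    result = [(a, b, n) for (a, b), n in pair_counts.items() if n > min_shared]
    # Same ordering as ORDER BY subreddits DESC, usera, userb.
    result.sort(key=lambda r: (-r[2], r[0], r[1]))
    return result


pairs = shared_subreddit_counts(active, MIN_SHARED)
for usera, userb, subreddits in pairs:
    print(usera, userb, subreddits)
```

Keep in mind that the number of pairs grows roughly with the square of the number of users per subreddit, which is one reason the query filters down to [solid] users before joining.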
Hope this helps.
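As Felipe pointed out (implicitly), the best way to start is to show what you have done so far, so we can narrow our efforts to help you. Otherwise the question is too broad and hard to jump into.
If an answer helped solve your problem, you should consider accepting it.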