2016-04-19

BigQuery: newbie analysis of the reddit comments data in BigQuery

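I'm trying to get pairs of users who both commented in the top 10 subreddits, along with the count of common subreddits they commented in, using the BigQuery reddit data.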

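I've just started using BQ and I'm also a beginner at SQL, so I'm finding it hard to write this query. Can someone give me some pointers to get started?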


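As Philip (implicitly) pointed out, the best way to start is to show what you have done so far, so we can focus our effort on helping you. Otherwise the question is too broad and hard to jump into.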


If the answer helped solve your problem, you should consider accepting it.

Answer


I've never really needed to play with the reddit data, but below is at least something to get you started, since no one else seems willing to.

Quick logic:

Step - 1: Identify the top 10 most-commented subreddits

SELECT subreddit 
FROM [fh-bigquery:reddit_comments.subr_rank_201505] 
ORDER BY comments DESC
LIMIT 10
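To try Step 1 locally, here is a minimal sketch using Python's standard-library sqlite3 module. It assumes a toy table named subr_rank with subreddit and comments columns, loosely mirroring [fh-bigquery:reddit_comments.subr_rank_201505]; the rows are invented for illustration, and SQLite stands in for BigQuery.

```python
import sqlite3


def top_subreddits(conn, n=10):
    """Step 1: names of the n subreddits with the most comments."""
    rows = conn.execute(
        "SELECT subreddit FROM subr_rank ORDER BY comments DESC LIMIT ?",
        (n,),
    ).fetchall()
    return [row[0] for row in rows]


# Hypothetical in-memory stand-in for the ranking table; the subreddit
# names and comment counts are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE subr_rank (subreddit TEXT, comments INTEGER)")
conn.executemany(
    "INSERT INTO subr_rank VALUES (?, ?)",
    [("pics", 200), ("AskReddit", 500), ("gaming", 100), ("funny", 300)],
)
print(top_subreddits(conn, n=2))  # ['AskReddit', 'funny']
```

The insertion order is deliberately shuffled so that the result depends on ORDER BY ... DESC rather than on the order in which rows were added.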

Step - 2: For each subreddit, identify "solid" users (those with more than 50 comments)


SELECT author, subreddit, COUNT(1) AS comments 
FROM [fh-bigquery:reddit_comments.2016_01] 
WHERE subreddit IN (
    SELECT subreddit 
    FROM [fh-bigquery:reddit_comments.subr_rank_201505] 
    ORDER BY comments DESC 
    LIMIT 10) 
AND author NOT IN ('AutoModerator', '[deleted]') 
GROUP BY author, subreddit 
HAVING comments > 50 
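Here is a minimal sketch of the same Step 2 filter, again with SQLite via Python. The table comment_log (one row per comment, with author and subreddit) is a hypothetical stand-in for [fh-bigquery:reddit_comments.2016_01], and the data is invented. The threshold is a parameter so the tiny data set can use a value of 1 instead of 50.

```python
import sqlite3

# Step 2 in SQLite SQL: per-subreddit comment counts for authors in the
# top_n subreddits, excluding bot/deleted accounts, above a threshold.
SOLID_USERS_SQL = """
SELECT author, subreddit, COUNT(1) AS comments
FROM comment_log
WHERE subreddit IN (
    SELECT subreddit FROM subr_rank ORDER BY comments DESC LIMIT :top_n)
  AND author NOT IN ('AutoModerator', '[deleted]')
GROUP BY author, subreddit
HAVING COUNT(1) > :min_comments
ORDER BY author, subreddit
"""


def solid_users(conn, top_n=10, min_comments=50):
    """Return (author, subreddit, comments) for authors with more than
    min_comments comments in one of the top_n subreddits."""
    params = {"top_n": top_n, "min_comments": min_comments}
    return conn.execute(SOLID_USERS_SQL, params).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE subr_rank (subreddit TEXT, comments INTEGER)")
conn.execute("CREATE TABLE comment_log (author TEXT, subreddit TEXT)")
conn.executemany("INSERT INTO subr_rank VALUES (?, ?)",
                 [("a", 30), ("b", 20), ("c", 10)])
# Invented toy comments: one tuple per comment.
rows = ([("alice", "a")] * 3 + [("alice", "b")] * 2 + [("bob", "a")] * 2
        + [("bob", "c")] * 5 + [("carol", "a")]
        + [("AutoModerator", "a")] * 4)
conn.executemany("INSERT INTO comment_log VALUES (?, ?)", rows)

# A threshold of 1 instead of 50 so the tiny data set returns something.
print(solid_users(conn, top_n=2, min_comments=1))
# [('alice', 'a', 3), ('alice', 'b', 2), ('bob', 'a', 2)]
```

With top_n=2 only subreddits a and b qualify, so bob's five comments in c are ignored, carol falls below the threshold, and AutoModerator is excluded by name.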

Step - 3: For each subreddit, identify pairs of common users (via a JOIN)
Step - 4: Finally, for each pair of users, count the number of common subreddits


SELECT usera, userb, COUNT(1) AS subreddits 
FROM (
    SELECT 
    a.author AS usera, 
    b.author AS userb, 
    a.subreddit AS subreddit
    FROM (
    SELECT author, subreddit, COUNT(1) AS comments FROM [fh-bigquery:reddit_comments.2016_01] 
    WHERE subreddit IN (SELECT subreddit FROM [fh-bigquery:reddit_comments.subr_rank_201505] ORDER BY comments DESC LIMIT 10) 
    AND author NOT IN ('AutoModerator', '[deleted]') 
    GROUP BY author, subreddit HAVING comments > 50) AS a 
    JOIN (
    SELECT author, subreddit, COUNT(1) AS comments FROM [fh-bigquery:reddit_comments.2016_01] 
    WHERE subreddit IN (SELECT subreddit FROM [fh-bigquery:reddit_comments.subr_rank_201505] ORDER BY comments DESC LIMIT 10) 
    AND author NOT IN ('AutoModerator', '[deleted]') 
    GROUP BY author, subreddit HAVING comments > 50) AS b 
    ON a.subreddit = b.subreddit 
    WHERE a.author < b.author 
) 
GROUP BY usera, userb 
HAVING subreddits > 3 
ORDER BY subreddits DESC, usera, userb 
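Steps 3 and 4 can be sketched in SQLite the same way. As before, subr_rank and comment_log are hypothetical toy tables with invented rows, and the thresholds are parameters (the defaults mirror the query above: top 10, more than 50 comments, more than 3 shared subreddits).

```python
import sqlite3

# Steps 3 and 4 in SQLite SQL. The "solid" CTE is Step 2 (without the
# comment count); it is joined to itself on subreddit to pair users.
PAIRS_SQL = """
WITH solid AS (
    SELECT author, subreddit
    FROM comment_log
    WHERE subreddit IN (
        SELECT subreddit FROM subr_rank ORDER BY comments DESC LIMIT :top_n)
      AND author NOT IN ('AutoModerator', '[deleted]')
    GROUP BY author, subreddit
    HAVING COUNT(1) > :min_comments
)
SELECT a.author AS usera, b.author AS userb, COUNT(1) AS subreddits
FROM solid AS a
JOIN solid AS b ON a.subreddit = b.subreddit
WHERE a.author < b.author  -- keep each unordered pair once, drop self-pairs
GROUP BY a.author, b.author
HAVING COUNT(1) > :min_shared
ORDER BY subreddits DESC, usera, userb
"""


def common_subreddit_pairs(conn, top_n=10, min_comments=50, min_shared=3):
    """Return (usera, userb, shared_subreddit_count) tuples."""
    params = {"top_n": top_n, "min_comments": min_comments,
              "min_shared": min_shared}
    return conn.execute(PAIRS_SQL, params).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE subr_rank (subreddit TEXT, comments INTEGER)")
conn.execute("CREATE TABLE comment_log (author TEXT, subreddit TEXT)")
conn.executemany("INSERT INTO subr_rank VALUES (?, ?)",
                 [("s1", 40), ("s2", 30), ("s3", 20), ("s4", 10)])
memberships = {
    "alice": ["s1", "s2", "s3"],
    "bob": ["s1", "s2", "s3"],
    "carol": ["s1", "s2"],
    "dave": ["s4"],
    "[deleted]": ["s1", "s2", "s3"],
}
# Two invented comments per (author, subreddit) membership.
rows = [(user, sub) for user, subs in memberships.items()
        for sub in subs for _ in range(2)]
conn.executemany("INSERT INTO comment_log VALUES (?, ?)", rows)

print(common_subreddit_pairs(conn, top_n=4, min_comments=1, min_shared=1))
# [('alice', 'bob', 3), ('alice', 'carol', 2), ('bob', 'carol', 2)]
```

The WITH clause defines the filtered user list once instead of repeating the subquery on both sides of the join. The a.author < b.author condition is the same trick the query above uses: it keeps each pair once and drops users paired with themselves. dave shares nothing with anyone, so he never appears.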

Hope this helps