2017-05-07 78 views
-1

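How to get a DISTINCT column and COUNT the occurrences of a sub-DISTINCT column

I'm having some difficulty building a file download statistics database and displaying the information.

Table: customer_statistics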

| user | product_id | file_download | date_accessed  | 
----------------------------------------------------------------- 
| tom | 1104  | file_1.pdf  | 2017-05-06 00:00:00 | 
| tom | 1048  | file_3.pdf  | 2017-05-06 00:00:00 | 
| tom | 1048  | file_3.pdf  | 2017-05-06 00:00:00 | 
| tom | 1048  | file_3.pdf  | 2017-05-06 00:00:00 | 
| tom | 1048  | file_3.pdf  | 2017-05-06 00:00:00 | 
| tom | 1010  | file_3.pdf  | 2017-05-06 00:00:00 | 
| tom | 1077  | file_3.pdf  | 2017-05-06 00:00:00 | 
| sue | 1749  | file_2.pdf  | 2017-05-06 00:00:00 | 
| sue | 1284  | file_3.pdf  | 2017-05-06 00:00:00 | 
| sue | 1284  | file_3.pdf  | 2017-05-06 00:00:00 | 
| sue | 1065  | file_1.pdf  | 2017-05-06 00:00:00 | 
| sue | 1344  | file_3.pdf  | 2017-05-06 00:00:00 | 
| sue | 2504  | file_2.pdf  | 2017-05-06 00:00:00 | 

Based on the table above, I need to display the following:

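Tom has downloaded file_3.pdf from 3 different products, but has downloaded file_3.pdf from product_id 1048 4 times.

Tom has also downloaded file_1.pdf from 1 product, and from product_id

Tom, only once, from 4 different products

Sue has downloaded file_3.pdf from 2 different products for a total of 7 downloads, but has downloaded file_3.pdf from product_id 1284 2 times.

Sue has also downloaded file_1.pdf from 1 product, only once, from product_id

Sue has also, from 1 product, from product_id

Sue has downloaded file_2.pdf, and only once, from 5 different products for a total of 6 downloads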

What is the best way to do this?

Do I need to restructure my table?

Thanks in advance!

+0

Your desired result looks like *that*? – Strawberry

+0

@Strawberry - Of course not, I just want those values. I spelled it out this way so it would be easy to understand. –

+2

So, can you spell it out? – Strawberry

Answers

1

Please try the following...

-- Outer columns are qualified because user and file_download exist on both sides of the join.
SELECT customer_statistics.user AS user,
       customer_statistics.file_download AS file_download,
       customer_statistics.product_id AS product_id,
       COUNT(*) AS CountPerProduct,
       CountOfProductsFinder.CountOfProducts AS CountOfProducts
FROM customer_statistics
JOIN (SELECT user AS user,
             file_download AS file_download,
             COUNT(product_id) AS CountOfProducts
      FROM (SELECT user AS user,
                   file_download AS file_download,
                   product_id AS product_id
            FROM customer_statistics
            GROUP BY user,
                     file_download,
                     product_id
           ) AS uniqueComboFinder
      GROUP BY user,
               file_download
     ) AS CountOfProductsFinder ON customer_statistics.user = CountOfProductsFinder.user
                               AND customer_statistics.file_download = CountOfProductsFinder.file_download
GROUP BY customer_statistics.user,
         customer_statistics.file_download,
         customer_statistics.product_id,
         CountOfProductsFinder.CountOfProducts;

The statement starts by using the following subquery to form a list of the unique combinations of user, file_download and product_id...

SELECT user AS user, 
     file_download AS file_download, 
     product_id AS product_id 
FROM customer_statistics 
GROUP BY user, 
     file_download, 
     product_id 

The results of the above subquery are then used by the following subquery to count how many distinct product_id values each user has downloaded each file from...

SELECT user AS user, 
     file_download AS file_download, 
     COUNT(product_id) AS CountOfProducts 
FROM (SELECT user AS user, 
       file_download AS file_download, 
       product_id AS product_id 
     FROM customer_statistics 
     GROUP BY user, 
       file_download, 
       product_id 
    ) AS uniqueComboFinder 
GROUP BY user, 
     file_download 

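The resulting dataset is then joined to an instance of customer_statistics in such a way that the count of product_id values for each combination of user and file_download is effectively appended to each corresponding record in customer_statistics.

The dataset resulting from that join is then grouped by each unique combination of user, file_download and product_id, and a count of the records belonging to each group (i.e. a count of the number of times a user has downloaded a particular file from a particular product_id) is calculated.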
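To make the steps above concrete, here is a minimal sketch that loads the question's sample rows into an in-memory SQLite database through Python's built-in sqlite3 module and runs the same three-level query. SQLite stands in for MySQL purely so the example is runnable, the short aliases cs and finder and the ORDER BY are my additions, and the date_accessed column is omitted because the query never reads it.

```python
import sqlite3

# Sample rows copied from the question's customer_statistics table
# (date_accessed is left out because the query does not use it).
ROWS = (
    [("tom", 1104, "file_1.pdf")]
    + [("tom", 1048, "file_3.pdf")] * 4
    + [("tom", 1010, "file_3.pdf"), ("tom", 1077, "file_3.pdf")]
    + [("sue", 1749, "file_2.pdf")]
    + [("sue", 1284, "file_3.pdf")] * 2
    + [("sue", 1065, "file_1.pdf"), ("sue", 1344, "file_3.pdf"),
       ("sue", 2504, "file_2.pdf")]
)

# Same structure as the answer's statement: unique combinations, then a
# count of products per (user, file), then a join back to the base table.
QUERY = """
SELECT cs.user, cs.file_download, cs.product_id,
       COUNT(*) AS CountPerProduct,
       finder.CountOfProducts
FROM customer_statistics AS cs
JOIN (SELECT user, file_download, COUNT(product_id) AS CountOfProducts
      FROM (SELECT user, file_download, product_id
            FROM customer_statistics
            GROUP BY user, file_download, product_id) AS uniqueComboFinder
      GROUP BY user, file_download) AS finder
  ON cs.user = finder.user AND cs.file_download = finder.file_download
GROUP BY cs.user, cs.file_download, cs.product_id, finder.CountOfProducts
ORDER BY cs.user, cs.file_download, cs.product_id
"""

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer_statistics "
    "(user TEXT, product_id INTEGER, file_download TEXT)"
)
conn.executemany("INSERT INTO customer_statistics VALUES (?, ?, ?)", ROWS)

results = conn.execute(QUERY).fetchall()
for row in results:
    # (user, file_download, product_id, CountPerProduct, CountOfProducts)
    print(row)
```

Each output row carries both counts, so for tom and file_3.pdf the row for product_id 1048 shows 4 downloads alongside a product count of 3.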

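I can't remember whether MySQL requires CountOfProducts to be in the GROUP BY. However, although each combination of user, file_download and product_id determines the value of CountOfProducts, many forms of SQL require you to GROUP BY every non-aggregated field that you select. So, since adding CountOfProducts to the GROUP BY does no harm, I have included it in the GROUP BY clause.

The displayed sentences could be generated automatically if one or two more rules about their structure were clarified.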
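As a rough illustration of generating such sentences, the sketch below formats rows shaped like the query's output, (user, file_download, product_id, CountPerProduct, CountOfProducts). The function name describe, the hard-coded sample rows and the sentence wording are all my own assumptions, since the exact sentence rules were never settled in the thread.

```python
from collections import defaultdict

def describe(rows):
    """Build one sentence per (user, file) from rows of
    (user, file_download, product_id, CountPerProduct, CountOfProducts)."""
    grouped = defaultdict(list)
    for user, file_name, product_id, per_product, product_count in rows:
        grouped[(user, file_name, product_count)].append((product_id, per_product))

    sentences = []
    for (user, file_name, product_count), products in grouped.items():
        total = sum(count for _, count in products)
        detail = ", ".join(
            f"{count} time(s) from product_id {pid}" for pid, count in products
        )
        sentences.append(
            f"{user} has downloaded {file_name} from {product_count} different "
            f"product(s) for a total of {total} download(s): {detail}."
        )
    return sentences

# Hypothetical input: tom's rows as the main query would return them.
sample_rows = [
    ("tom", "file_1.pdf", 1104, 1, 1),
    ("tom", "file_3.pdf", 1010, 1, 3),
    ("tom", "file_3.pdf", 1048, 4, 3),
    ("tom", "file_3.pdf", 1077, 1, 3),
]

sentences = describe(sample_rows)
for sentence in sentences:
    print(sentence)
```

Doing the formatting in application code keeps the SQL focused on counting, which seems easier to adjust once the sentence rules are known.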

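If you have any questions or comments, please feel free to post a comment accordingly.

Addendum

To exclude a single user from the result set, use the following variation.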

-- Outer columns are qualified because user and file_download exist on both sides of the join.
SELECT customer_statistics.user AS user,
       customer_statistics.file_download AS file_download,
       customer_statistics.product_id AS product_id,
       COUNT(*) AS CountPerProduct,
       CountOfProductsFinder.CountOfProducts AS CountOfProducts
FROM customer_statistics
JOIN (SELECT user AS user,
             file_download AS file_download,
             COUNT(product_id) AS CountOfProducts
      FROM (SELECT user AS user,
                   file_download AS file_download,
                   product_id AS product_id
            FROM customer_statistics
            WHERE user <> excludedUser
            GROUP BY user,
                     file_download,
                     product_id
           ) AS uniqueComboFinder
      GROUP BY user,
               file_download
     ) AS CountOfProductsFinder ON customer_statistics.user = CountOfProductsFinder.user
                               AND customer_statistics.file_download = CountOfProductsFinder.file_download
GROUP BY customer_statistics.user,
         customer_statistics.file_download,
         customer_statistics.product_id,
         CountOfProductsFinder.CountOfProducts;

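I have used excludedUser here, but you can replace it with a constant value (such as 'Sam') or with a variable that holds the target value.

Note that I have added the WHERE user <> excludedUser clause to the innermost subquery. Since the results of its parent subquery are based entirely on the results of the innermost subquery, the excluded user will not be represented in the parent subquery's results. And since the excluded user value does not appear in the parent subquery's results, when the INNER JOIN of the main statement is performed on the shared user values, the target user is excluded from the joined dataset as well.

By adding the WHERE clause to the innermost subquery, I spare the middle and outer levels of the statement a small amount of unnecessary processing, which makes the overall statement slightly more efficient than if the user value were excluded at the middle or outer level.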
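The single-user exclusion can be tried out with a bound parameter standing in for excludedUser. This is a sketch under the same assumptions as before: SQLite through Python's sqlite3 module instead of MySQL, the question's sample rows, and a helper name (stats_excluding) and aliases (cs, finder) that are mine rather than the answer's.

```python
import sqlite3

# Sample rows from the question's table (date_accessed omitted).
ROWS = (
    [("tom", 1104, "file_1.pdf")]
    + [("tom", 1048, "file_3.pdf")] * 4
    + [("tom", 1010, "file_3.pdf"), ("tom", 1077, "file_3.pdf")]
    + [("sue", 1749, "file_2.pdf")]
    + [("sue", 1284, "file_3.pdf")] * 2
    + [("sue", 1065, "file_1.pdf"), ("sue", 1344, "file_3.pdf"),
       ("sue", 2504, "file_2.pdf")]
)

# The "?" placeholder in the innermost subquery plays the role of excludedUser.
EXCLUDE_ONE = """
SELECT cs.user, cs.file_download, cs.product_id,
       COUNT(*) AS CountPerProduct, finder.CountOfProducts
FROM customer_statistics AS cs
JOIN (SELECT user, file_download, COUNT(product_id) AS CountOfProducts
      FROM (SELECT user, file_download, product_id
            FROM customer_statistics
            WHERE user <> ?
            GROUP BY user, file_download, product_id) AS uniqueComboFinder
      GROUP BY user, file_download) AS finder
  ON cs.user = finder.user AND cs.file_download = finder.file_download
GROUP BY cs.user, cs.file_download, cs.product_id, finder.CountOfProducts
ORDER BY cs.user, cs.file_download, cs.product_id
"""

def stats_excluding(conn, excluded_user):
    """Run the statement with excluded_user bound to the placeholder."""
    return conn.execute(EXCLUDE_ONE, (excluded_user,)).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer_statistics "
    "(user TEXT, product_id INTEGER, file_download TEXT)"
)
conn.executemany("INSERT INTO customer_statistics VALUES (?, ?, ?)", ROWS)

remaining = stats_excluding(conn, "tom")
for row in remaining:
    print(row)
```

Because the inner join only matches users that survive the innermost filter, no tom rows reach the output even though the outer query reads the unfiltered table.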

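Similarly, if you need to exclude more than one user, you can do so either by explicitly coding their user values into the statement or by joining to a table of excluded values. For the first case use...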

-- Outer columns are qualified because user and file_download exist on both sides of the join.
SELECT customer_statistics.user AS user,
       customer_statistics.file_download AS file_download,
       customer_statistics.product_id AS product_id,
       COUNT(*) AS CountPerProduct,
       CountOfProductsFinder.CountOfProducts AS CountOfProducts
FROM customer_statistics
JOIN (SELECT user AS user,
             file_download AS file_download,
             COUNT(product_id) AS CountOfProducts
      FROM (SELECT user AS user,
                   file_download AS file_download,
                   product_id AS product_id
            FROM customer_statistics
            WHERE user NOT IN ('Sam', 'I', 'Am')
            GROUP BY user,
                     file_download,
                     product_id
           ) AS uniqueComboFinder
      GROUP BY user,
               file_download
     ) AS CountOfProductsFinder ON customer_statistics.user = CountOfProductsFinder.user
                               AND customer_statistics.file_download = CountOfProductsFinder.file_download
GROUP BY customer_statistics.user,
         customer_statistics.file_download,
         customer_statistics.product_id,
         CountOfProductsFinder.CountOfProducts;

For the second case use...

-- Outer columns are qualified because user and file_download exist on both sides of the join.
SELECT customer_statistics.user AS user,
       customer_statistics.file_download AS file_download,
       customer_statistics.product_id AS product_id,
       COUNT(*) AS CountPerProduct,
       CountOfProductsFinder.CountOfProducts AS CountOfProducts
FROM customer_statistics
JOIN (SELECT user AS user,
             file_download AS file_download,
             COUNT(product_id) AS CountOfProducts
      FROM (SELECT user AS user,
                   file_download AS file_download,
                   product_id AS product_id
            FROM customer_statistics
            WHERE user NOT IN (SELECT user
                               FROM excludedUsers
                              )
            GROUP BY user,
                     file_download,
                     product_id
           ) AS uniqueComboFinder
      GROUP BY user,
               file_download
     ) AS CountOfProductsFinder ON customer_statistics.user = CountOfProductsFinder.user
                               AND customer_statistics.file_download = CountOfProductsFinder.file_download
GROUP BY customer_statistics.user,
         customer_statistics.file_download,
         customer_statistics.product_id,
         CountOfProductsFinder.CountOfProducts;
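
The table-driven variant can be exercised the same way. In this sketch the excludedUsers table has a single user column, which is an assumption on my part because the answer never shows its definition. SQLite via Python's sqlite3 module again stands in for MySQL, and the aliases cs and finder are mine.

```python
import sqlite3

# Sample rows from the question's table (date_accessed omitted).
ROWS = (
    [("tom", 1104, "file_1.pdf")]
    + [("tom", 1048, "file_3.pdf")] * 4
    + [("tom", 1010, "file_3.pdf"), ("tom", 1077, "file_3.pdf")]
    + [("sue", 1749, "file_2.pdf")]
    + [("sue", 1284, "file_3.pdf")] * 2
    + [("sue", 1065, "file_1.pdf"), ("sue", 1344, "file_3.pdf"),
       ("sue", 2504, "file_2.pdf")]
)

# Users listed in excludedUsers are filtered out in the innermost subquery.
EXCLUDE_LISTED = """
SELECT cs.user, cs.file_download, cs.product_id,
       COUNT(*) AS CountPerProduct, finder.CountOfProducts
FROM customer_statistics AS cs
JOIN (SELECT user, file_download, COUNT(product_id) AS CountOfProducts
      FROM (SELECT user, file_download, product_id
            FROM customer_statistics
            WHERE user NOT IN (SELECT user FROM excludedUsers)
            GROUP BY user, file_download, product_id) AS uniqueComboFinder
      GROUP BY user, file_download) AS finder
  ON cs.user = finder.user AND cs.file_download = finder.file_download
GROUP BY cs.user, cs.file_download, cs.product_id, finder.CountOfProducts
ORDER BY cs.user, cs.file_download, cs.product_id
"""

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer_statistics "
    "(user TEXT, product_id INTEGER, file_download TEXT)"
)
conn.executemany("INSERT INTO customer_statistics VALUES (?, ?, ?)", ROWS)

# Assumed shape of the exclusion table: one row per user to leave out.
conn.execute("CREATE TABLE excludedUsers (user TEXT)")
conn.execute("INSERT INTO excludedUsers VALUES ('sue')")

kept = conn.execute(EXCLUDE_LISTED).fetchall()
for row in kept:
    print(row)
```

Adding or removing rows in excludedUsers changes who is filtered out without touching the statement itself.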
+0

Absolutely perfect! I love it. –

+0

Any way to limit it to 'WHERE user = sam'? –

+0

I will modify my answer to allow you to exclude '1' or more than '1' user. I'll only be a few minutes. – toonice

0

I'll give you a hint to get you going.

Start by ditching the individual download records in favor of aggregates, like so:

CREATE TEMPORARY TABLE IF NOT EXISTS basic_aggregated_stats
SELECT user, file_download, product_id, COUNT(*) AS cnt
    FROM customer_statistics
    GROUP BY user, file_download, product_id;

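This is just one step (which, by the way, could also serve as a subquery in a larger, more complex query). You can and should do more aggregation to get the information you need. This is not "restructuring the table".

Beyond more aggregation, you also need to think about getting the ordering right and producing subtotals.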
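
One possible next step, sketched under stated assumptions: build the temporary table from the question's file_download column, then aggregate it once more to get per-user, per-file subtotals in a fixed order. SQLite through Python's sqlite3 module stands in for MySQL here (SQLite needs the AS keyword in CREATE TABLE ... AS SELECT), and the column names products and downloads are mine.

```python
import sqlite3

# Sample rows from the question's table (date_accessed omitted).
ROWS = (
    [("tom", 1104, "file_1.pdf")]
    + [("tom", 1048, "file_3.pdf")] * 4
    + [("tom", 1010, "file_3.pdf"), ("tom", 1077, "file_3.pdf")]
    + [("sue", 1749, "file_2.pdf")]
    + [("sue", 1284, "file_3.pdf")] * 2
    + [("sue", 1065, "file_1.pdf"), ("sue", 1344, "file_3.pdf"),
       ("sue", 2504, "file_2.pdf")]
)

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer_statistics "
    "(user TEXT, product_id INTEGER, file_download TEXT)"
)
conn.executemany("INSERT INTO customer_statistics VALUES (?, ?, ?)", ROWS)

# Step 1: one row per (user, file, product) with its download count.
conn.execute("""
CREATE TEMPORARY TABLE IF NOT EXISTS basic_aggregated_stats AS
SELECT user, file_download, product_id, COUNT(*) AS cnt
FROM customer_statistics
GROUP BY user, file_download, product_id
""")

# Step 2: aggregate again for subtotals per (user, file).
subtotals = conn.execute("""
SELECT user, file_download,
       COUNT(*) AS products,   -- distinct products, one row each in step 1
       SUM(cnt) AS downloads   -- total downloads across those products
FROM basic_aggregated_stats
GROUP BY user, file_download
ORDER BY user, file_download
""").fetchall()

for row in subtotals:
    print(row)
```

The per-product detail stays available in basic_aggregated_stats, so the subtotal rows and the detail rows can be combined when building the final display.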