How to get DISTINCT column and COUNT occurrences of SUB DISTINCT column


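I'm having some difficulty building a file download statistics database and displaying the information. How do I get a DISTINCT column and COUNT the occurrences of a SUB DISTINCT column?

Table: customer_statistics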

| user | product_id | file_download | date_accessed       |
|------|------------|---------------|---------------------|
| tom  | 1104       | file_1.pdf    | 2017-05-06 00:00:00 |
| tom  | 1048       | file_3.pdf    | 2017-05-06 00:00:00 |
| tom  | 1048       | file_3.pdf    | 2017-05-06 00:00:00 |
| tom  | 1048       | file_3.pdf    | 2017-05-06 00:00:00 |
| tom  | 1048       | file_3.pdf    | 2017-05-06 00:00:00 |
| tom  | 1010       | file_3.pdf    | 2017-05-06 00:00:00 |
| tom  | 1077       | file_3.pdf    | 2017-05-06 00:00:00 |
| sue  | 1749       | file_2.pdf    | 2017-05-06 00:00:00 |
| sue  | 1284       | file_3.pdf    | 2017-05-06 00:00:00 |
| sue  | 1284       | file_3.pdf    | 2017-05-06 00:00:00 |
| sue  | 1065       | file_1.pdf    | 2017-05-06 00:00:00 |
| sue  | 1344       | file_3.pdf    | 2017-05-06 00:00:00 |
| sue  | 2504       | file_2.pdf    | 2017-05-06 00:00:00 |

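Based on the table above, I need to display something like the following:

Tom has downloaded file_3.pdf from 3 different products, but has downloaded file_3.pdf from product_id 1048 4 times.

Tom has also downloaded file_1.pdf from 1 product, and from product_id

Tom, only once, from 4 different products

Sue has downloaded file_3.pdf from 2 different products, 7 downloads in total, but has downloaded file_3.pdf from product_id 1284 2 times.

Sue has also downloaded file_1.pdf from 1 product, only once, from product_id

Sue also from 1 product, from product_id

Sue has downloaded file_2.pdf, and only once, from 5 different products, 6 downloads in total

What is the best way to do this?

Do I need to restructure my table?

Thanks in advance!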

Source

2017-05-07 Papa Wheelz

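Is *that* what your desired result looks like? – Strawberry

@Strawberry - Of course not, I just want those values - I spelled it out this way so it is easy to understand. –

Well then, can you spell it out? – Strawberry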

Please try the following...

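-- Outer-level columns are qualified because user and file_download exist in
-- both customer_statistics and CountOfProductsFinder; unqualified references
-- would be ambiguous.
SELECT customer_statistics.user AS user,
       customer_statistics.file_download AS file_download,
       customer_statistics.product_id AS product_id,
       COUNT(*) AS CountPerProduct,
       CountOfProductsFinder.CountOfProducts AS CountOfProducts
FROM customer_statistics
JOIN (SELECT user AS user,
             file_download AS file_download,
             COUNT(product_id) AS CountOfProducts
      FROM (SELECT user AS user,
                   file_download AS file_download,
                   product_id AS product_id
            FROM customer_statistics
            GROUP BY user,
                     file_download,
                     product_id
           ) AS uniqueComboFinder
      GROUP BY user,
               file_download
     ) AS CountOfProductsFinder ON customer_statistics.user = CountOfProductsFinder.user
                               AND customer_statistics.file_download = CountOfProductsFinder.file_download
GROUP BY customer_statistics.user,
         customer_statistics.file_download,
         customer_statistics.product_id,
         CountOfProductsFinder.CountOfProducts;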

This statement starts by using the following subquery to form a list of the unique combinations of user, file_download and product_id...

SELECT user AS user, 
     file_download AS file_download, 
     product_id AS product_id 
FROM customer_statistics 
GROUP BY user, 
     file_download, 
     product_id

The results of the above subquery are then used by the following subquery to count how many distinct product_id values each user has downloaded each file from...

SELECT user AS user, 
     file_download AS file_download, 
     COUNT(product_id) AS CountOfProducts 
FROM (SELECT user AS user, 
       file_download AS file_download, 
       product_id AS product_id 
     FROM customer_statistics 
     GROUP BY user, 
       file_download, 
       product_id 
    ) AS uniqueComboFinder 
GROUP BY user, 
     file_download

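The resulting dataset is then joined to an instance of customer_statistics in such a way that the count of product_id values for each combination of user and file_download is effectively appended to each corresponding record in customer_statistics.

The dataset resulting from that join is then grouped by each unique combination of user, file_download and product_id, and the count of records belonging to each group (i.e. the number of times a user has downloaded a particular file from a product_id) is calculated.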
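To see what the joined and grouped result looks like, here is a minimal sketch that loads the question's sample rows and runs the same query shape. It assumes Python's built-in sqlite3 as a stand-in for MySQL; the short aliases cs and f, the ORDER BY clause, and the helper name run_query are additions for illustration and are not part of the answer.

```python
import sqlite3

# Sample rows from the question: (user, product_id, file_download).
ROWS = (
    [("tom", 1104, "file_1.pdf")]
    + [("tom", 1048, "file_3.pdf")] * 4
    + [("tom", 1010, "file_3.pdf"), ("tom", 1077, "file_3.pdf")]
    + [("sue", 1749, "file_2.pdf")]
    + [("sue", 1284, "file_3.pdf")] * 2
    + [("sue", 1065, "file_1.pdf"), ("sue", 1344, "file_3.pdf"),
       ("sue", 2504, "file_2.pdf")]
)

# Same three-level structure as the answer; ORDER BY added for stable output.
QUERY = """
SELECT cs.user, cs.file_download, cs.product_id,
       COUNT(*) AS CountPerProduct,
       f.CountOfProducts
FROM customer_statistics AS cs
JOIN (SELECT user, file_download, COUNT(product_id) AS CountOfProducts
      FROM (SELECT user, file_download, product_id
            FROM customer_statistics
            GROUP BY user, file_download, product_id) AS uniqueComboFinder
      GROUP BY user, file_download) AS f
  ON cs.user = f.user AND cs.file_download = f.file_download
GROUP BY cs.user, cs.file_download, cs.product_id, f.CountOfProducts
ORDER BY cs.user, cs.file_download, cs.product_id
"""


def run_query(rows=ROWS):
    """Build an in-memory copy of customer_statistics and run QUERY."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customer_statistics ("
        "user TEXT, product_id INTEGER, file_download TEXT, date_accessed TEXT)"
    )
    conn.executemany(
        "INSERT INTO customer_statistics "
        "VALUES (?, ?, ?, '2017-05-06 00:00:00')",
        rows,
    )
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


results = run_query()
for row in results:
    print(row)
```

Each output row is (user, file_download, product_id, CountPerProduct, CountOfProducts), so tom's file_3.pdf downloads from product_id 1048 appear as one row with a per-product count of 4 and a product count of 3.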

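I can't remember whether MySQL requires CountOfProducts to be included in the GROUP BY. Even though each combination of user, file_download and product_id determines the value of CountOfProducts, many dialects of SQL require you to GROUP BY every non-aggregated field that you select. Since adding CountOfProducts to the GROUP BY does no harm, I have included it in the GROUP BY clause.

If one or two more rules about their structure can be clarified, then the sentences shown could be generated automatically.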
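As a rough illustration of generating such sentences, the sketch below turns rows shaped like the query output into text. The wording rules, the function name describe, and the choice to mention only products downloaded more than once are all assumptions, since the question does not pin down the exact sentence structure. The input rows are what the query yields for the question's sample table.

```python
from collections import defaultdict

# Rows shaped like the query output:
# (user, file_download, product_id, CountPerProduct, CountOfProducts)
RESULT_ROWS = [
    ("sue", "file_1.pdf", 1065, 1, 1),
    ("sue", "file_2.pdf", 1749, 1, 2),
    ("sue", "file_2.pdf", 2504, 1, 2),
    ("sue", "file_3.pdf", 1284, 2, 2),
    ("sue", "file_3.pdf", 1344, 1, 2),
    ("tom", "file_1.pdf", 1104, 1, 1),
    ("tom", "file_3.pdf", 1010, 1, 3),
    ("tom", "file_3.pdf", 1048, 4, 3),
    ("tom", "file_3.pdf", 1077, 1, 3),
]


def describe(rows):
    """Return one sentence per (user, file) pair. Wording is hypothetical."""
    grouped = defaultdict(list)
    for user, file_download, product_id, per_product, n_products in rows:
        grouped[(user, file_download, n_products)].append(
            (product_id, per_product)
        )

    sentences = []
    for (user, file_download, n_products), products in grouped.items():
        total = sum(count for _, count in products)
        sentence = (
            f"{user} downloaded {file_download} from {n_products} "
            f"different product(s), {total} download(s) in total"
        )
        # Only call out products that were downloaded more than once.
        repeats = [
            f"product_id {pid} {count} times"
            for pid, count in products
            if count > 1
        ]
        if repeats:
            sentence += "; repeated downloads: " + ", ".join(repeats)
        sentences.append(sentence + ".")
    return sentences


sentences = describe(RESULT_ROWS)
for line in sentences:
    print(line)
```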

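If you have any questions or comments, please feel free to post a comment.

Appendix

To exclude a single user from the result set, use the following variation.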

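-- excludedUser is a placeholder for the user value to exclude.
-- Outer-level columns are qualified because user and file_download exist in
-- both customer_statistics and CountOfProductsFinder.
SELECT customer_statistics.user AS user,
       customer_statistics.file_download AS file_download,
       customer_statistics.product_id AS product_id,
       COUNT(*) AS CountPerProduct,
       CountOfProductsFinder.CountOfProducts AS CountOfProducts
FROM customer_statistics
JOIN (SELECT user AS user,
             file_download AS file_download,
             COUNT(product_id) AS CountOfProducts
      FROM (SELECT user AS user,
                   file_download AS file_download,
                   product_id AS product_id
            FROM customer_statistics
            WHERE user <> excludedUser
            GROUP BY user,
                     file_download,
                     product_id
           ) AS uniqueComboFinder
      GROUP BY user,
               file_download
     ) AS CountOfProductsFinder ON customer_statistics.user = CountOfProductsFinder.user
                               AND customer_statistics.file_download = CountOfProductsFinder.file_download
GROUP BY customer_statistics.user,
         customer_statistics.file_download,
         customer_statistics.product_id,
         CountOfProductsFinder.CountOfProducts;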

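I have used excludedUser here, but you can replace it with a constant value (such as Sam) or a variable holding the target value.

Note that I have added the WHERE user <> excludedUser clause to the innermost subquery. Because the results of its parent subquery are based entirely on the results of the innermost subquery, the excluded user will not be represented in the results of the parent subquery. And because the excluded user value does not appear in the parent subquery's results, the target user is also excluded from the joined dataset when the INNER JOIN part of the main statement is performed on the shared values of user.

By adding the WHERE clause to the innermost subquery, I avoid a small amount of unnecessary processing by the middle and outer levels of the statement, which makes the overall statement slightly more efficient than if the user value were excluded at the middle or outer level.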
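The effect of filtering in the innermost subquery can be checked with a small sketch. It again assumes sqlite3 in place of MySQL, and passes the excluded user as a bound parameter (?) rather than the excludedUser placeholder; the function name stats_excluding and the aliases cs and f are hypothetical.

```python
import sqlite3

# Sample rows from the question: (user, product_id, file_download).
ROWS = (
    [("tom", 1104, "file_1.pdf")]
    + [("tom", 1048, "file_3.pdf")] * 4
    + [("tom", 1010, "file_3.pdf"), ("tom", 1077, "file_3.pdf")]
    + [("sue", 1749, "file_2.pdf")]
    + [("sue", 1284, "file_3.pdf")] * 2
    + [("sue", 1065, "file_1.pdf"), ("sue", 1344, "file_3.pdf"),
       ("sue", 2504, "file_2.pdf")]
)

# The only filter is in the innermost subquery; the inner join then drops
# the excluded user's rows from the outer customer_statistics as well.
QUERY = """
SELECT cs.user, cs.file_download, cs.product_id,
       COUNT(*) AS CountPerProduct,
       f.CountOfProducts
FROM customer_statistics AS cs
JOIN (SELECT user, file_download, COUNT(product_id) AS CountOfProducts
      FROM (SELECT user, file_download, product_id
            FROM customer_statistics
            WHERE user <> ?
            GROUP BY user, file_download, product_id) AS uniqueComboFinder
      GROUP BY user, file_download) AS f
  ON cs.user = f.user AND cs.file_download = f.file_download
GROUP BY cs.user, cs.file_download, cs.product_id, f.CountOfProducts
ORDER BY cs.user, cs.file_download, cs.product_id
"""


def stats_excluding(excluded_user, rows=ROWS):
    """Run QUERY against an in-memory table, leaving out one user."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customer_statistics ("
        "user TEXT, product_id INTEGER, file_download TEXT, date_accessed TEXT)"
    )
    conn.executemany(
        "INSERT INTO customer_statistics "
        "VALUES (?, ?, ?, '2017-05-06 00:00:00')",
        rows,
    )
    result = conn.execute(QUERY, (excluded_user,)).fetchall()
    conn.close()
    return result


without_tom = stats_excluding("tom")
for row in without_tom:
    print(row)
```

Binding the value as a parameter also avoids pasting user-supplied text directly into the SQL string.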

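Similarly, if you need to exclude more than one user, you can do so either by explicitly coding their user values into the statement or by checking against a table of excluded values. For the first case use...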

-- Outer-level columns are qualified because user and file_download exist in
-- both customer_statistics and CountOfProductsFinder.
SELECT customer_statistics.user AS user,
       customer_statistics.file_download AS file_download,
       customer_statistics.product_id AS product_id,
       COUNT(*) AS CountPerProduct,
       CountOfProductsFinder.CountOfProducts AS CountOfProducts
FROM customer_statistics
JOIN (SELECT user AS user,
             file_download AS file_download,
             COUNT(product_id) AS CountOfProducts
      FROM (SELECT user AS user,
                   file_download AS file_download,
                   product_id AS product_id
            FROM customer_statistics
            WHERE user NOT IN ('Sam', 'I', 'Am')
            GROUP BY user,
                     file_download,
                     product_id
           ) AS uniqueComboFinder
      GROUP BY user,
               file_download
     ) AS CountOfProductsFinder ON customer_statistics.user = CountOfProductsFinder.user
                               AND customer_statistics.file_download = CountOfProductsFinder.file_download
GROUP BY customer_statistics.user,
         customer_statistics.file_download,
         customer_statistics.product_id,
         CountOfProductsFinder.CountOfProducts;

For the second case use...

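-- excludedUsers is a table whose user column lists the users to exclude.
-- Outer-level columns are qualified because user and file_download exist in
-- both customer_statistics and CountOfProductsFinder.
SELECT customer_statistics.user AS user,
       customer_statistics.file_download AS file_download,
       customer_statistics.product_id AS product_id,
       COUNT(*) AS CountPerProduct,
       CountOfProductsFinder.CountOfProducts AS CountOfProducts
FROM customer_statistics
JOIN (SELECT user AS user,
             file_download AS file_download,
             COUNT(product_id) AS CountOfProducts
      FROM (SELECT user AS user,
                   file_download AS file_download,
                   product_id AS product_id
            FROM customer_statistics
            WHERE user NOT IN (SELECT user
                               FROM excludedUsers
                              )
            GROUP BY user,
                     file_download,
                     product_id
           ) AS uniqueComboFinder
      GROUP BY user,
               file_download
     ) AS CountOfProductsFinder ON customer_statistics.user = CountOfProductsFinder.user
                               AND customer_statistics.file_download = CountOfProductsFinder.file_download
GROUP BY customer_statistics.user,
         customer_statistics.file_download,
         customer_statistics.product_id,
         CountOfProductsFinder.CountOfProducts;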

Source

2017-05-08 01:43:51 toonice

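Absolutely perfect! I love it. –

Any way to limit it to 'WHERE user = sam'? –

I will modify my answer to allow you to exclude '1' or more than '1' users. I will only be a few minutes. – toonice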

I'll give you a hint to get you going.

Start by ditching the individual download records in favor of aggregates, like so:

-- file_download is the column name used in the question's table.
CREATE TEMPORARY TABLE IF NOT EXISTS basic_aggregated_stats AS
SELECT user, file_download, product_id, COUNT(*) AS cnt
    FROM customer_statistics
    GROUP BY user, file_download, product_id;

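This is just one step (which, by the way, could also serve as a subquery within a larger, more complex query). You can and should do more aggregation to get the information you need. This is not "restructuring the table".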
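One possible next aggregation step is sketched below: build the temporary table, then aggregate it again per user and file. This assumes sqlite3 rather than MySQL and uses the question's file_download column; the second query and the function name aggregate_twice are illustrative additions, not something this answer spells out.

```python
import sqlite3

# Sample rows from the question: (user, product_id, file_download).
ROWS = (
    [("tom", 1104, "file_1.pdf")]
    + [("tom", 1048, "file_3.pdf")] * 4
    + [("tom", 1010, "file_3.pdf"), ("tom", 1077, "file_3.pdf")]
    + [("sue", 1749, "file_2.pdf")]
    + [("sue", 1284, "file_3.pdf")] * 2
    + [("sue", 1065, "file_1.pdf"), ("sue", 1344, "file_3.pdf"),
       ("sue", 2504, "file_2.pdf")]
)


def aggregate_twice(rows=ROWS):
    """Return (user, file_download, products, downloads) per user and file."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customer_statistics ("
        "user TEXT, product_id INTEGER, file_download TEXT, date_accessed TEXT)"
    )
    conn.executemany(
        "INSERT INTO customer_statistics "
        "VALUES (?, ?, ?, '2017-05-06 00:00:00')",
        rows,
    )
    # Step 1: one row per (user, file, product) with its download count.
    conn.execute(
        "CREATE TEMPORARY TABLE IF NOT EXISTS basic_aggregated_stats AS "
        "SELECT user, file_download, product_id, COUNT(*) AS cnt "
        "FROM customer_statistics "
        "GROUP BY user, file_download, product_id"
    )
    # Step 2: aggregate the aggregates. Each step-1 row is one distinct
    # product, so COUNT(*) counts products and SUM(cnt) counts downloads.
    per_file = conn.execute(
        "SELECT user, file_download, COUNT(*) AS products, "
        "SUM(cnt) AS downloads "
        "FROM basic_aggregated_stats "
        "GROUP BY user, file_download "
        "ORDER BY user, file_download"
    ).fetchall()
    conn.close()
    return per_file


per_file = aggregate_twice()
for row in per_file:
    print(row)
```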

Beyond more aggregation, you also need to think about getting the order right and producing subtotals.
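A minimal sketch of ordering plus per-user subtotals follows. It assumes sqlite3 and computes the subtotals in Python with itertools.groupby; the "SUBTOTAL" marker row and the function name report_with_subtotals are hypothetical. In MySQL itself, GROUP BY ... WITH ROLLUP may be another way to get subtotal rows.

```python
import sqlite3
from itertools import groupby

# Sample rows from the question: (user, product_id, file_download).
ROWS = (
    [("tom", 1104, "file_1.pdf")]
    + [("tom", 1048, "file_3.pdf")] * 4
    + [("tom", 1010, "file_3.pdf"), ("tom", 1077, "file_3.pdf")]
    + [("sue", 1749, "file_2.pdf")]
    + [("sue", 1284, "file_3.pdf")] * 2
    + [("sue", 1065, "file_1.pdf"), ("sue", 1344, "file_3.pdf"),
       ("sue", 2504, "file_2.pdf")]
)

# Detail rows, ordered so each user's rows are contiguous and the most
# downloaded product for each file comes first.
DETAIL_QUERY = """
SELECT user, file_download, product_id, COUNT(*) AS cnt
FROM customer_statistics
GROUP BY user, file_download, product_id
ORDER BY user, file_download, cnt DESC, product_id
"""


def report_with_subtotals(rows=ROWS):
    """Return detail rows, each user's block followed by a subtotal row."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customer_statistics ("
        "user TEXT, product_id INTEGER, file_download TEXT, date_accessed TEXT)"
    )
    conn.executemany(
        "INSERT INTO customer_statistics "
        "VALUES (?, ?, ?, '2017-05-06 00:00:00')",
        rows,
    )
    detail = conn.execute(DETAIL_QUERY).fetchall()
    conn.close()

    report = []
    # groupby relies on the ORDER BY user above to keep users contiguous.
    for user, user_rows in groupby(detail, key=lambda r: r[0]):
        user_rows = list(user_rows)
        report.extend(user_rows)
        distinct_products = {r[2] for r in user_rows}
        total_downloads = sum(r[3] for r in user_rows)
        # Subtotal row: (user, marker, distinct products, total downloads).
        report.append(
            (user, "SUBTOTAL", len(distinct_products), total_downloads)
        )
    return report


report = report_with_subtotals()
for row in report:
    print(row)
```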

Source

2017-05-07 23:54:30 einpoklum
