2015-10-14 254 views
0

Group by groups of values

I am working on a huge database in PostgreSQL. (Sorry if this is badly formatted; I have been trying for hours and am still struggling.)

This is part of the structure of the table used in my query (table user_activities), with some sample data.

+---------+----------+-----------+
| user_id | activity | operation |
+---------+----------+-----------+
| 1       | 1        | 1         |
| 1       | 1        | 1         |
| 1       | 1        | 1         |
| 2       | 1        | 2         |
| 2       | 1        | 3         |
| 3       | 1        | 3         |
| 4       | 1        | 4         |
| 4       | 1        | 4         |
| 5       | 1        | 4         |
| 5       | 1        | 5         |
| 6       | 3        | 1         |
| 6       | 3        | 1         |
| 6       | 3        | 2         |
| 7       | 3        | 3         |
| 8       | 3        | 4         |
| 8       | 3        | 5         |
+---------+----------+-----------+

And this is the output I want:

+----------------+----------+-----------+
| count(user_id) | activity | operation |
+----------------+----------+-----------+
| 4              | 1        | 1,2       |
| 6              | 1        | 3,4,5     |
| 6              | 3        | 1,2,3,4,5 |
+----------------+----------+-----------+

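I need to count user_id for each activity and each group of operation values. So I need to group by activity where activity is 1 or 3 (already done with WHERE activity IN (1,3)). But I also need to group by operation. The problem is that each group of operations contains more than one value. Operation can be 1, 2, 3, 4 or 5. I want to combine 1 and 2 into one group and 3, 4 and 5 into another. But that is not all...

If I group by operation, every activity gets 5 groups. I need 2 groups for activity 1 (the groups described above), and only one group containing all operation values when activity is 3.

Is this possible?

Edit: I can't check the answers right now; I hope to be able to tomorrow. I will vote on and reply to the answers then. Thanks for your help.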

+1

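I think you should edit your question and provide sample data and the desired results (to show what you are looking for), as well as the query you have now (to help others write the query). –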

+0

@GordonLinoff OK, give me a minute, editing – AleOtero93

+0

Can you use the tablefunc extension? – dtelaroli

Answers

1

SQL Fiddle Demo

Just putting together an example that builds the groups you want.

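-- Label every row with the group it belongs to, then aggregate per label.
WITH cte AS (
    SELECT "user_id", "activity", "operation",
           CASE
               WHEN "activity" = 1 THEN
                   CASE
                       WHEN "operation" IN (1,2) THEN '1_first'  -- operations 1 and 2
                       ELSE '1_second'                           -- every other operation of activity 1
                   END
               WHEN "activity" = 3 THEN '3_first'                -- all operations of activity 3
           END AS "op_group"
    FROM user_activities
)
SELECT "activity",
       "op_group",
       count("user_id"),
       array_agg(DISTINCT "operation") AS "operation"  -- distinct operations seen in the group
FROM cte
GROUP BY "activity", "op_group";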

Output

| activity | op_group | count | operation | 
|----------|----------|-------|-----------| 
|  1 | 1_first |  4 |  1,2 | 
|  1 | 1_second |  6 |  3,4,5 | 
|  3 | 3_first |  6 | 1,2,3,4,5 | 
+1

I fixed my answer –

+0

I have a question... is it possible to use 'WHEN "operation" IN (1,2)' instead of 'WHEN "operation" = 1 OR "operation" = 2'? – AleOtero93

+0

Yes, I updated my answer and the fiddle with that change –

2

Updated for your detailed specification:

SELECT COUNT(*) as cnt, ua.activity, array_agg(distinct ua.operation) 
FROM user_activities ua 
JOIN (
    SELECT 1 AS activity, 1 as operation, 1 as GROUP_CODE 
    UNION ALL 
    SELECT 1 AS activity, 2 as operation, 1 as GROUP_CODE 
    UNION ALL 
    SELECT 1 AS activity, 3 as operation, 2 as GROUP_CODE 
    UNION ALL 
    SELECT 1 AS activity, 4 as operation, 2 as GROUP_CODE 
    UNION ALL 
    SELECT 1 AS activity, 5 as operation, 2 as GROUP_CODE 
    UNION ALL 
    SELECT 3 AS activity, 1 as operation, 3 as GROUP_CODE 
    UNION ALL 
    SELECT 3 AS activity, 2 as operation, 3 as GROUP_CODE 
    UNION ALL 
    SELECT 3 AS activity, 3 as operation, 3 as GROUP_CODE 
    UNION ALL 
    SELECT 3 AS activity, 4 as operation, 3 as GROUP_CODE 
    UNION ALL 
    SELECT 3 AS activity, 5 as operation, 3 as GROUP_CODE 
) c 
ON ua.activity = c.activity and ua.operation = c.operation 
GROUP BY c.GROUP_CODE, ua.activity 

http://sqlfiddle.com/#!15/46e1f/15
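
As a side note, the inline lookup built from the chain of UNION ALL selects can probably be written more compactly as a VALUES list. This is only a sketch under the assumption that the table is called user_activities, as in the question; the output alias operations is made up for illustration.

```sql
-- Same (activity, operation) -> group_code lookup, written as a VALUES list.
-- Assumes the table from the question: user_activities(user_id, activity, operation).
SELECT count(*) AS cnt,
       ua.activity,
       array_agg(DISTINCT ua.operation) AS operations  -- "operations" is an illustrative alias
FROM user_activities ua
JOIN (VALUES
        (1, 1, 1), (1, 2, 1),             -- activity 1, operations 1-2 -> group 1
        (1, 3, 2), (1, 4, 2), (1, 5, 2),  -- activity 1, operations 3-5 -> group 2
        (3, 1, 3), (3, 2, 3), (3, 3, 3),
        (3, 4, 3), (3, 5, 3)              -- activity 3, all operations -> group 3
     ) AS c (activity, operation, group_code)
  ON ua.activity = c.activity
 AND ua.operation = c.operation
GROUP BY c.group_code, ua.activity
ORDER BY c.group_code;
```

The column list after the alias c names the VALUES columns, so the join condition and GROUP BY stay the same as in the UNION ALL version. Rows whose (activity, operation) pair is not listed are dropped by the inner join.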


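Original answer

Here is how I would do it. Below I build the logic table on the fly, but you could also keep it as a real table in your database and join to it.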

SELECT GROUP_CODE, COUNT(*) as cnt 
FROM user_activities ua 
JOIN (
    SELECT 1 AS activity, 1 as operation, 1 as GROUP_CODE 
    UNION ALL 
    SELECT 1 AS activity, 2 as operation, 1 as GROUP_CODE 
    UNION ALL 
    SELECT 1 AS activity, 3 as operation, 2 as GROUP_CODE 
    UNION ALL 
    SELECT 1 AS activity, 4 as operation, 2 as GROUP_CODE 
    UNION ALL 
    SELECT 1 AS activity, 5 as operation, 2 as GROUP_CODE 
    UNION ALL 
    SELECT 3 AS activity, 1 as operation, 3 as GROUP_CODE 
    UNION ALL 
    SELECT 3 AS activity, 2 as operation, 3 as GROUP_CODE 
    UNION ALL 
    SELECT 3 AS activity, 3 as operation, 3 as GROUP_CODE 
    UNION ALL 
    SELECT 3 AS activity, 4 as operation, 3 as GROUP_CODE 
    UNION ALL 
    SELECT 3 AS activity, 5 as operation, 3 as GROUP_CODE 
) c 
ON ua.activity = c.activity and ua.operation = c.operation 
GROUP BY GROUP_CODE 

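This should be quite fast. Remember that SQL is designed to work with sets (tables) and joins, and this query uses a join to carry out the logic. It is also convenient: if you keep the mapping as a table, you can change the logic by changing the table. If you add another column to select on, you can store several "logics" in the same table and choose which one to use when the query runs.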
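
A minimal sketch of that idea, assuming a hypothetical mapping table named operation_groups with an extra logic_id column (neither name appears in the question; both are invented for illustration):

```sql
-- Hypothetical mapping table; the name operation_groups and the logic_id
-- column are illustrative assumptions, not part of the original schema.
CREATE TABLE operation_groups (
    logic_id   int NOT NULL,  -- which grouping scheme this row belongs to
    activity   int NOT NULL,
    operation  int NOT NULL,
    group_code int NOT NULL,
    PRIMARY KEY (logic_id, activity, operation)
);

-- Scheme 1: the grouping asked for in the question.
INSERT INTO operation_groups (logic_id, activity, operation, group_code) VALUES
    (1, 1, 1, 1), (1, 1, 2, 1),
    (1, 1, 3, 2), (1, 1, 4, 2), (1, 1, 5, 2),
    (1, 3, 1, 3), (1, 3, 2, 3), (1, 3, 3, 3), (1, 3, 4, 3), (1, 3, 5, 3);

-- Choose the scheme at query time by filtering on logic_id.
SELECT count(*) AS cnt,
       ua.activity,
       array_agg(DISTINCT ua.operation) AS operations
FROM user_activities ua
JOIN operation_groups g
  ON g.activity = ua.activity
 AND g.operation = ua.operation
WHERE g.logic_id = 1
GROUP BY g.group_code, ua.activity;
```

Adding a second grouping scheme would then mean inserting more rows with a different logic_id rather than editing the query.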

I have used a similar approach for weighting and personalized ordering in dynamic user interfaces.

2

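From what I understand, a query like this would help you. The information in the question and the comments was a little confusing, so I used my best judgment to provide a solution.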

create table test (user_id int, activity int, operation int); 
insert into test values (1,1,1), (1,1,1), (1,1,2), (2,1,3), (2,1,4), (3,3,1), (4,3,3), (4,3,5); 

select count(*), activity, array_agg(operation) 
from test 
group by activity, user_id 

Result: 
| count | activity | array_agg | 
| 3  | 1  | {1,1,2} | 
| 2  | 1  | {3,4}  | 
| 1  | 3  | {1}  | 
| 2  | 3  | {3,5}  | 

Based on the edited question, this is how I would approach the problem:

Table:

create table test (user_id int, activity int, operation int); 
insert into test values 
(1,1,1),(1,1,1),(1,1,1), 
(2,1,2),(2,1,3), 
(3,1,3), 
(4,1,4),(4,1,4), 
(5,1,4),(5,1,5), 
(6,3,1),(6,3,1),(6,3,2), 
(7,3,3), 
(8,3,4),(8,3,5); 

Query:

select count(*), activity, string_agg(distinct operation::VARCHAR, ',') 
from test 
where operation in (1,2) and activity = 1 
group by activity 

UNION ALL 

select count(*), activity, string_agg(distinct operation::VARCHAR, ',') 
from test 
where operation in (3,4,5) and activity = 1 
group by activity 

UNION ALL 

select count(*), activity, string_agg(distinct operation::VARCHAR, ',') 
from test 
where activity = 3 
group by activity 

Result

count | activity | string_agg 
4  | 1  | 1,2 
6  | 1  | 3,4,5 
6  | 3  | 1,2,3,4,5 
+0

This is wrong. If you group by activity and user ID, you get one row per user. There are 8 user IDs, so yes, you are confused. – Hogan

+0

@Hogan There weren't 8 user IDs when I last looked at it. I will try to improve the answer later today. – zedfoxus

+0

Sorry, my bad... I really need some sleep. – AleOtero93