2016-09-28 28 views
1

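I have a user traffic table, and I need to get the gain/loss of new users compared with the previous day. I'm just wondering whether there is a better way to do this than the solution below.

Schema:

Table structure: session_id, session_day, user_id, product_id

What I have tried: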

SELECT session_day,
       session_count,
       user_count - LAG(user_count, 1) OVER (ORDER BY session_day) AS gain_loss_users
  FROM (
        SELECT session_day,
               COUNT(session_id) AS session_count,
               COUNT(user_id) AS user_count
          FROM user_traffic
         GROUP BY session_day
       ) X;
+0

Seems solid to me... – JohnHC

+1

What identifies a customer as "new" or "lost", based only on the four table columns you presented? – mathguy

+0

There is no other way to determine whether a user is a first-time or a returning user. The "new" part of the question is what confuses me... – Teja

Answers

1

I tried to address the question of "new" and "returning" users. Here is my attempt:

select session_day,
       count(distinct user_id) as user_cnt,
       -- change in distinct users versus the previous session_day
       count(distinct user_id) - lag(count(distinct user_id))
                                   over (order by session_day) as gain,
       count(newu) as newu,
       count(returnu) as returnu
  from (
        select session_id,
               session_day,
               user_id,
               -- 1 on a user's first session overall (running count = 1), else NULL
               case when count(*) over (partition by user_id
                                        order by session_day, session_id
                                        rows between unbounded preceding and current row) = 1
                    then 1
               end as newu,
               -- 1 when the user's previous session day differs from the day of
               -- the immediately preceding session overall, else NULL
               case when lag(session_day, 1) over (partition by user_id
                                                   order by session_day, session_id)
                         <> lag(session_day, 1) over (order by session_day, session_id)
                    then 1
               end as returnu
          from user_traffic u
       )
 group by session_day
 order by session_day;

Test data and output:

create table user_traffic (session_id number(6), session_day date, 
          user_id number(6), product_id number(6)); 

insert into user_traffic values ( 1, date '2016-09-07', 101, 1); 
insert into user_traffic values ( 2, date '2016-09-07', 101, 4); 
insert into user_traffic values ( 3, date '2016-09-07', 102, 1); 
insert into user_traffic values ( 4, date '2016-09-08', 101, 2); 
insert into user_traffic values ( 5, date '2016-09-08', 101, 4); 
insert into user_traffic values ( 6, date '2016-09-09', 102, 1); 
insert into user_traffic values ( 7, date '2016-09-10', 102, 1); 
insert into user_traffic values ( 8, date '2016-09-10', 103, 3); 

SESSION_DAY        CNT       GAIN        NEW    RETURNS
----------- ---------- ---------- ---------- ----------
2016-09-07           2                     2          0 -- 101 & 102 are new
2016-09-08           1         -1          0          0
2016-09-09           1          0          0          1 -- 102 returned
2016-09-10           2          1          1          0 -- 103 is new
+0

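This looks pretty good, but I'd like to add to your answer: ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. I'm not sure which database you used to generate this output. – Teja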
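On the frame clause raised in this comment: a running COUNT(*) framed with ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW is simply the row's position within its partition, so the "first session of this user" test can also be written with ROW_NUMBER(). The sketch below is an equivalent formulation, not the answerer's code, and the alias first_session_flag is made up for illustration.

```sql
-- Flag each user's first session (ordered by day, then session id).
-- ROW_NUMBER() = 1 marks the same row as the running COUNT(*) = 1 test.
SELECT session_id,
       session_day,
       user_id,
       CASE WHEN ROW_NUMBER() OVER (PARTITION BY user_id
                                    ORDER BY session_day, session_id) = 1
            THEN 1
       END AS first_session_flag   -- hypothetical alias
  FROM user_traffic;
```

Both forms rely only on standard window functions, so they should behave the same on any database that supports them.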

0

There isn't a better way, but there is a more concise one. You can mix window functions with aggregate functions:

SELECT session_day,
       COUNT(session_id) AS session_count,
       COUNT(DISTINCT user_id) AS user_count,
       (COUNT(DISTINCT user_id) -
        LAG(COUNT(DISTINCT user_id)) OVER (ORDER BY session_day)
       ) AS gain_loss_users
  FROM user_traffic
 GROUP BY session_day;

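I assume you want COUNT(DISTINCT) because (1) a user can have multiple sessions on the same day, and (2) the two counts in your query, COUNT(session_id) and COUNT(user_id), are otherwise identical (as long as user_id and session_id are never NULL).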
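To see the difference, the three counts can be put side by side. This is only an illustrative query, run against the test data posted in the other answer; the column aliases are made up.

```sql
-- COUNT(session_id) and COUNT(user_id) both count rows with a non-NULL value,
-- so they match whenever neither column is NULL.
-- COUNT(DISTINCT user_id) counts each user once per day.
SELECT session_day,
       COUNT(session_id)       AS session_rows,
       COUNT(user_id)          AS user_rows,
       COUNT(DISTINCT user_id) AS distinct_users
  FROM user_traffic
 GROUP BY session_day
 ORDER BY session_day;
```

On that test data, 2016-09-07 has three sessions from two users, so the first two columns return 3 while distinct_users returns 2.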

+0

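You need to remove "PARTITION BY session_day" from LAG(). It should not be partitioned by that column, because that is the column it orders by and the query already groups on it. If it is left in, the LAG result is NULL in SQL Server. – Matt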
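The reason is that after GROUP BY session_day each day is a single row, so a partition on session_day contains exactly one row and LAG() has nothing earlier to look at. A small sketch comparing the two forms (the aliases prev_partitioned and prev_unpartitioned are made up):

```sql
SELECT session_day,
       COUNT(DISTINCT user_id) AS user_count,
       -- One row per partition after grouping: there is no previous row, so always NULL
       LAG(COUNT(DISTINCT user_id))
           OVER (PARTITION BY session_day ORDER BY session_day) AS prev_partitioned,
       -- Single partition over all days: returns the previous day's count
       LAG(COUNT(DISTINCT user_id))
           OVER (ORDER BY session_day) AS prev_unpartitioned
  FROM user_traffic
 GROUP BY session_day;
```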

+1

@Matt . . . Thank you. –

+0

How do I get the gain/loss numbers for new users? – Teja
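One possible reading, which nobody in the thread confirms: treat a user as "new" on the earliest session_day on which they appear, count new users per day, and compare each day with the previous one using LAG(). The CTE names first_seen and daily and the output aliases are assumptions made for this sketch.

```sql
-- Assumption: a user is "new" on their earliest session_day in user_traffic.
WITH first_seen AS (
    SELECT user_id,
           MIN(session_day) AS first_day
      FROM user_traffic
     GROUP BY user_id
),
daily AS (
    -- Start from every day that has traffic so that days with no new users show 0.
    SELECT d.session_day,
           COUNT(f.user_id) AS new_users
      FROM (SELECT DISTINCT session_day FROM user_traffic) d
      LEFT JOIN first_seen f
        ON f.first_day = d.session_day
     GROUP BY d.session_day
)
SELECT session_day,
       new_users,
       new_users - LAG(new_users) OVER (ORDER BY session_day) AS new_user_gain_loss
  FROM daily
 ORDER BY session_day;
```

Note that LAG() compares against the previous day present in the table. If a calendar day has no sessions at all, it is skipped rather than counted as zero, so a calendar table would be needed to cover such gaps.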