2014-11-20 120 views
0

Reactivation SQL

I have the following query:

with t as (
    select advertisable,
           extract(year from day)  as yy,
           extract(month from day) as mon,
           round(sum(cost) / 1e6)  as val
    from adcube dac
    where advertisable in (select advertisable
                           from adcube dac
                           group by advertisable
                           having sum(cost) / 1e6 > 100
                          )
    group by advertisable, extract(year from day), extract(month from day)
)
select advertisable, min(yy * 10000 + mon) as yyyymm
from (select t.*,
             (row_number() over (partition by advertisable order by yy, mon) -
              row_number() over (partition by advertisable, val order by yy, mon)
             ) as grp
      from t
     ) as foo
group by advertisable, grp, val
having count(*) >= 6 and val = 0;

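This tracks the deactivation date for accounts that stopped spending for 4 months. However, I want to track the reactivation date. So if an account starts spending again after those 4 months, how can I see the new start date for that account?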

+0

Please *always* provide the table definition for the tables you are working with. And your Postgres version. – 2014-11-20 18:10:06

Answers

1

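You want to find the accounts where val > 0 and the row is preceded by 4 (or 6) records with val = 0.

Here is an idea:

  • Compute the groups of identical consecutive values, as your query already does.
  • Assign a sequence number (val_seqnum) to the rows within each group.
  • Then pull the previous value and the previous sequence number onto each record.

Now you want the records that meet all of the following conditions:

  • val > 0
  • prev_val = 0
  • prev_val_seqnum >= 4 (or whatever your threshold is).

The following query should do this (assuming t has the same meaning as in your query):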

-- t is the CTE from the question; prepend the same "with t as (...)" clause.
select t.*
from (select t.*,
             lag(val) over (partition by advertisable order by yy, mon) as prev_val,
             lag(val_seqnum) over (partition by advertisable order by yy, mon) as prev_val_seqnum
      from (select t.*,
                   -- position of the row inside its run of identical values
                   row_number() over (partition by advertisable, val, grp order by yy, mon) as val_seqnum
            from (select t.*,
                         -- constant within each run of consecutive identical values
                         (row_number() over (partition by advertisable order by yy, mon) -
                          row_number() over (partition by advertisable, val order by yy, mon)
                         ) as grp
                  from t
                 ) t
           ) t
     ) t
where val > 0 and prev_val = 0 and prev_val_seqnum >= 4;
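
To make the intermediate columns easier to follow, here is a minimal sketch that runs the same steps on a handful of made-up rows. The sample data and the CTE names g and s are assumptions for illustration only; t stands in for the CTE from the question.

```sql
-- Hypothetical sample data in place of the question's CTE t.
WITH t (advertisable, yy, mon, val) AS (
    VALUES (1, 2014, 1, 5),
           (1, 2014, 2, 0),
           (1, 2014, 3, 0),
           (1, 2014, 4, 0),
           (1, 2014, 5, 0),
           (1, 2014, 6, 7)
),
g AS (  -- step 1: grp is constant within each run of identical consecutive values
    SELECT t.*,
           row_number() OVER (PARTITION BY advertisable ORDER BY yy, mon)
         - row_number() OVER (PARTITION BY advertisable, val ORDER BY yy, mon) AS grp
    FROM t
),
s AS (  -- step 2: number the rows inside each run
    SELECT g.*,
           row_number() OVER (PARTITION BY advertisable, val, grp
                              ORDER BY yy, mon) AS val_seqnum
    FROM g
)
-- step 3: pull the previous row's value and sequence number
SELECT s.*,
       lag(val)        OVER w AS prev_val,
       lag(val_seqnum) OVER w AS prev_val_seqnum
FROM s
WINDOW w AS (PARTITION BY advertisable ORDER BY yy, mon)
ORDER BY advertisable, yy, mon;
```

With this sample, the June row is the only one that has val > 0, prev_val = 0 and prev_val_seqnum >= 4, so it is the only row the final WHERE clause would keep.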
1

I think this can be radically simpler (and faster):

SELECT advertisable, ym AS reactivation_ym
FROM (
    SELECT advertisable
         , date_trunc('month', day) AS ym
         , SUM(cost) < 500000       AS asleep   -- same as ROUND(SUM(cost)/1e6) = 0
         , count(SUM(cost) < 500000 OR NULL)    -- asleep months among the 4 preceding rows
               OVER (PARTITION BY advertisable
                     ORDER BY date_trunc('month', day)
                     ROWS BETWEEN 4 PRECEDING AND 1 PRECEDING) AS ct
    FROM adcube dac
    JOIN (
        SELECT advertisable
        FROM adcube
        GROUP BY 1
        HAVING SUM(cost) > 1e8  -- really 100000000 ?
    ) x USING (advertisable)
    GROUP BY 1, 2
) sub
WHERE NOT asleep
AND ct = 4;

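This builds on a few assumptions to fill in for missing information.
I basically untangled your calculation and simplified the code, which makes it shorter and faster than your original.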

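  • For every advertisable, count how many of the last 4 months had a total cost below 500,000. The row qualifies only if all 4 (existing) months are below the threshold. (If you don't have rows for all months, you need to decide how to deal with missing rows. That information is not available in your question.)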
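
As a rough illustration of that counting step, the sketch below applies the same frame to made-up monthly totals. The CTE monthly and its values are hypothetical. The ct_filter column uses the aggregate FILTER clause, which assumes Postgres 9.4 or later, as an alternative spelling of the `OR NULL` trick.

```sql
-- Hypothetical monthly totals for a single account.
WITH monthly (ym, total) AS (
    VALUES (DATE '2014-01-01', 900000),
           (DATE '2014-02-01', 0),
           (DATE '2014-03-01', 0),
           (DATE '2014-04-01', 100000),
           (DATE '2014-05-01', 0),
           (DATE '2014-06-01', 800000)
)
SELECT ym, total,
       total < 500000 AS asleep,
       -- "false OR NULL" yields NULL, which count() ignores,
       -- so only the months below the threshold are counted
       count(total < 500000 OR NULL)            OVER w AS ct,
       -- equivalent spelling, assuming Postgres 9.4+
       count(*) FILTER (WHERE total < 500000)   OVER w AS ct_filter
FROM monthly
WINDOW w AS (ORDER BY ym ROWS BETWEEN 4 PRECEDING AND 1 PRECEDING)
ORDER BY ym;
```

In this sample only the June row combines ct = 4 with asleep = false, so it is the one that would be reported as a reactivation.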

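This uses count() as a window aggregate function with a custom frame. Here is a recent related answer with a detailed explanation:

How can you "nest" count() and sum()?
They are not really nested. It is a window function applied over an aggregate function. Details: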
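
One way to see why this is not real nesting is to write the two steps separately: aggregate per month in a subquery first, then run the window function over the aggregated rows. The sketch below does that; the CTE adcube_sample and its rows are made up for illustration, and the column names follow the query above.

```sql
-- Hypothetical daily rows; adcube_sample stands in for the real adcube table.
WITH adcube_sample (advertisable, day, cost) AS (
    VALUES (1, DATE '2014-01-10', 300000),
           (1, DATE '2014-01-20', 400000),
           (1, DATE '2014-02-05', 100000),
           (1, DATE '2014-03-15', 900000)
)
SELECT advertisable, ym, monthly_cost,
       -- step 2: the window function sees one row per month
       count(monthly_cost < 500000 OR NULL)
           OVER (PARTITION BY advertisable
                 ORDER BY ym
                 ROWS BETWEEN 4 PRECEDING AND 1 PRECEDING) AS ct
FROM (
    -- step 1: plain aggregation, one row per advertisable and month
    SELECT advertisable,
           date_trunc('month', day) AS ym,
           sum(cost)                AS monthly_cost
    FROM adcube_sample
    GROUP BY 1, 2
) monthly
ORDER BY advertisable, ym;
```

Writing count(SUM(cost) < 500000 OR NULL) OVER (...) in a single SELECT with GROUP BY gives the same result, because window functions are evaluated after aggregation.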

+0

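Thanks, this is faster. So when you say 'ROWS BETWEEN 4 PRECEDING AND 1 PRECEDING', does this check for 4 months without spend and then look 1 month later to see whether it is still 0? – user3207341 2014-11-21 10:38:38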

+0

@user3207341: Actually, it checks whether the *current* row spends again. Amended. – 2014-11-21 13:50:41
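
To see which rows such a frame covers, a small sketch with array_agg() over a generated series can help. The series and the column names are purely illustrative.

```sql
SELECT n,
       -- the four rows before the current one; the current row is excluded
       array_agg(n) OVER (ORDER BY n
                          ROWS BETWEEN 4 PRECEDING AND 1 PRECEDING) AS frame_excl_current,
       -- for comparison: a frame of the same width that includes the current row
       array_agg(n) OVER (ORDER BY n
                          ROWS BETWEEN 3 PRECEDING AND CURRENT ROW) AS frame_incl_current
FROM generate_series(1, 6) AS s(n)
ORDER BY n;
```

For n = 6 the first frame holds 2 through 5 and the second holds 3 through 6. That is why the query above tests the current month separately with NOT asleep.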