2017-10-16 51 views
2

Running sum with gaps

I have a table with all of the columns shown below except the yellow one.

[image: sample table, not reproduced here]

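Basically, the table has the customer's ID, the date on which the sale occurred, and the total amount the customer spent that day (sales). Now I have to calculate, for each customer and each day, the cumulative sales within a time frame, including that day's sales. For example, with the time frame set to 3 days, customer 2233 bought twice (nothing on the 14th), so their cumulative sales on the 15th are 26, while on the 13th they are 25.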

I can't create new tables, so I tried the following approach, but it is quite slow:

SELECT t.dt, 

Count(CASE WHEN t.running_sale < 1.99 THEN 1 ELSE NULL END) as "Low spender", 
Count(CASE WHEN t.running_sale BETWEEN 1.99 and 4.99 THEN 1 ELSE NULL END) as "Medium spender", 
Count(CASE WHEN t.running_sale > 4.99 THEN 1 ELSE NULL END) as "High spender" 

FROM (SELECT dt, channel, id, (
    SELECT SUM(revenue) 
    FROM myTable rd 
    WHERE CAST(rd.dt AS DATE) 
      BETWEEN (CAST(r.dt AS DATE) - INTERVAL '3' DAY) AND CAST(r.dt AS DATE) AND 
      rd.id = r.id 
) running_sale from myTable r) t 

WHERE channel = 'retail' 
AND dt BETWEEN '2017-06-01' AND '2017-06-15' 

GROUP BY dt 
limit 100 

Use analytic functions? 'SUM(sales) OVER (PARTITION BY id ORDER BY date ASC ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS RunningSales' – xQbert


That doesn't work, because on the 12th, for ID 2233, it would pick up the 11th and the 6th, and the 6th is more than 3 days back. –


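I sort of get it, but I don't understand why 2233 has 26 on the 15th. If the range is 3 days back including the 15th (15, 14, 13), that would give 22, not 26. Or should the 12th be included, so the range is 15, 14, 13, 12? – xQbert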
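To make the point of these comments concrete, here is a minimal sketch in Python comparing a row-based window (what 'ROWS 2 PRECEDING' does) with a date-based window (what the question's 'BETWEEN dt - INTERVAL '3' DAY AND dt' does). The rows for customer 2233 are the sample values posted in a later answer in this thread; the helper names `row_window_sum` and `date_window_sum` are made up for illustration.

```python
from datetime import date, timedelta

# (date, sales) rows for customer 2233, sample values from this thread.
rows = [
    (date(2017, 6, 3), 20),
    (date(2017, 6, 5), 4),
    (date(2017, 6, 6), 1),
    (date(2017, 6, 11), 8),
    (date(2017, 6, 12), 4),
    (date(2017, 6, 13), 21),
    (date(2017, 6, 15), 1),
]


def row_window_sum(rows, i, n_rows=3):
    """Sum row i and the n_rows - 1 rows before it, ignoring dates entirely."""
    return sum(s for _, s in rows[max(0, i - n_rows + 1): i + 1])


def date_window_sum(rows, i, days_back=3):
    """Sum every row dated within [rows[i].date - days_back, rows[i].date]."""
    end = rows[i][0]
    start = end - timedelta(days=days_back)
    return sum(s for d, s in rows if start <= d <= end)


# Print date, sales, row-based sum, date-based sum for each row.
for i, (d, s) in enumerate(rows):
    print(d, s, row_window_sum(rows, i), date_window_sum(rows, i))
```

On the 12th the row-based sum reaches back to the 6th, while the date-based sum only covers the 9th through the 12th. With `days_back=3` the window for the 15th covers the 12th through the 15th, which is where the total of 26 comes from.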

Answers

2

I would use a subquery for this:

select *, 
    (
    select sum(sales) 
    from your_table dd 
    where cast(dd.dates as date) 
      between cast(your_table.dates as date) - interval '3' day and 
        cast(your_table.dates as date) and 
      dd.id = your_table.id 
) running_sales 
from your_table 

demo

The query above can also be rewritten as a more efficient equivalent that uses a self-join and group by:

select dd.id, dd.dates, dd.sales, sum(d.sales) running_sales 
from your_table dd 
join your_table d on cast(d.dates as date) 
     between (cast(dd.dates as date) - interval '3' day) and cast(dd.dates as date) and 
     dd.id = d.id 
group by dd.id, dd.dates, dd.sales 

group by demo
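As a rough way to try the self-join locally, the sketch below runs the same query shape against an in-memory SQLite database from Python. Several things here are assumptions rather than part of the answer: SQLite stands in for the real database, dates are stored as ISO text, and `date(dd.dates, '-3 day')` replaces `cast(...) - interval '3' day`. The sample rows are the values posted elsewhere in this thread.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE your_table (id INTEGER, dates TEXT, sales REAL)")
conn.executemany(
    "INSERT INTO your_table VALUES (?, ?, ?)",
    [
        (1234, "2017-06-15", 9),
        (2233, "2017-06-03", 20),
        (2233, "2017-06-05", 4),
        (2233, "2017-06-06", 1),
        (2233, "2017-06-11", 8),
        (2233, "2017-06-12", 4),
        (2233, "2017-06-13", 21),
        (2233, "2017-06-15", 1),
        (2544, "2017-06-13", 9),
        (2443, "2017-06-05", 3.5),
    ],
)

# Each row dd is joined to every row d of the same customer whose date
# falls in the window [dd.dates - 3 days, dd.dates]; the sum over d is
# the running total. ISO date strings compare correctly as text.
query = """
SELECT dd.id, dd.dates, dd.sales, SUM(d.sales) AS running_sales
FROM your_table dd
JOIN your_table d
  ON d.id = dd.id
 AND d.dates BETWEEN date(dd.dates, '-3 day') AND dd.dates
GROUP BY dd.id, dd.dates, dd.sales
ORDER BY dd.id, dd.dates
"""
result = conn.execute(query).fetchall()
running = {(cid, day): total for cid, day, _sales, total in result}

for row in result:
    print(row)
```

Putting `id` first in the join condition mirrors the suggested index order (id, dates, sales), so a database can narrow to one customer before scanning the date range.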

You may want to create the following index to support the queries above:

create index ix_your_table on your_table(id, dates, sales) 

I updated my question with the full query I have to run. It looks like this approach is slow; the server times out. –


@PasqualeSada OK, I've rewritten it as a 'group by' version. Please test it now and let me know. –

0
With CTE as (
    SELECT 1234 id, '2017-06-15' idate,9 sales from dual UNION ALL 
    SELECT 2233 id, '2017-06-03' idate,20 sales from dual UNION ALL 
    SELECT 2233 id , '2017-06-05' idate,4 sales from dual UNION ALL 
    SELECT 2233 id , '2017-06-06' idate,1 sales from dual UNION ALL 
    SELECT 2233 id , '2017-06-11' idate,8 sales from dual UNION ALL 
    SELECT 2233 id , '2017-06-12' idate,4 sales from dual UNION ALL 
    SELECT 2233 id, '2017-06-13' idate,21 sales from dual UNION ALL 
    SELECT 2233 id, '2017-06-15' idate,1 sales from dual UNION ALL 
    SELECT 2544 id , '2017-06-13' idate,9 sales from dual UNION ALL 
    SELECT 2443 id, '2017-06-05' idate,3.5 sales from dual) 

,cte2 as (
select cte.*, to_number(replace(idate,'-')) datekey from cte 
) 
--select * from cte2 
--SELECT cte.*, sum(cte.Sales) OVER (PARTITION by ID ORDER BY cte.iDate asc ROWS 2 PRECEDING) as RunningSales FROM CTE 

--select rownum rn from dual connect by prior 
,pp as (
SELECT to_number(dd+20170600) dkey 
FROM (SELECT rownum dd 
     FROM dual 
     CONNECT BY LEVEL <= 31 
     ) 
) 
--select * from pp 
,cc as (


select cte2.* ,pp.dkey 
from pp left join cte2 
on(cte2.datekey=pp.dkey) 
) 
select cc.* ,sum(cc.Sales) OVER (PARTITION by cc.ID ORDER BY cc.dkey asc ROWS 2 PRECEDING) as RunningSales 
from cc order by dkey asc ,id asc 

Tested on Oracle 12c. The approach borrows from the way dimensions are built for a data lake. – Mookayama
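The idea behind this answer, as I read it, is to fill in the missing calendar days so that a row-based window ('ROWS 2 PRECEDING') lines up with a 3-day date window. A minimal Python sketch of that idea follows. It assumes the calendar is filled in separately for each customer, with 0 sales on days without a purchase; the helper names `densify` and `rolling_rows` are hypothetical, and the values are the 2233 rows from the CTE above.

```python
from datetime import date, timedelta

# (id, day) -> sales, for customer 2233 only.
sales = {
    (2233, date(2017, 6, 3)): 20,
    (2233, date(2017, 6, 5)): 4,
    (2233, date(2017, 6, 6)): 1,
    (2233, date(2017, 6, 11)): 8,
    (2233, date(2017, 6, 12)): 4,
    (2233, date(2017, 6, 13)): 21,
    (2233, date(2017, 6, 15)): 1,
}


def densify(sales, cust_id, first, last):
    """Return [(day, amount)] for every day in [first, last], 0 where no sale."""
    out = []
    day = first
    while day <= last:
        out.append((day, sales.get((cust_id, day), 0)))
        day += timedelta(days=1)
    return out


def rolling_rows(series, n_rows):
    """Row-based rolling sum over the current row and the n_rows - 1 before it."""
    return [
        (d, sum(a for _, a in series[max(0, i - n_rows + 1): i + 1]))
        for i, (d, _) in enumerate(series)
    ]


dense = densify(sales, 2233, date(2017, 6, 1), date(2017, 6, 30))
rolled = dict(rolling_rows(dense, 3))  # 3 rows = current day plus 2 preceding

print(rolled[date(2017, 6, 12)], rolled[date(2017, 6, 15)])
```

Because every day is present after densifying, three consecutive rows always span exactly three consecutive days, so the 6th no longer leaks into the total for the 12th.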

0

If there is at most one sale per day, then the most efficient method is probably to apply lag() repeatedly:

select rd.*, 
     (sales + 
     (case when prev_date >= date - interval '2 day' then prev_sales else 0 end) + 
     (case when prev_date2 >= date - interval '2 day' then prev_sales2 else 0 end) 
     ) as sales_3day 
from (select rd.*, 
      lag(date, 1) over (partition by id order by date) as prev_date, 
      lag(date, 2) over (partition by id order by date) as prev_date2, 
      lag(sales, 1) over (partition by id order by date) as prev_sales, 
      lag(sales, 2) over (partition by id order by date) as prev_sales2 
     from mytable rd 
) rd; 

Once you have this value, the rest is just conditional logic on the result.

If you have multiple sales on one date, you can easily handle that by aggregating in the innermost query.
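A sketch of how the aggregation and the lag() logic might fit together, run against an in-memory SQLite database from Python. The assumptions: SQLite 3.25 or later (for window functions) stands in for the real database, the date column is named `dates` and stored as ISO text, `date(dates, '-2 day')` replaces `date - interval '2 day'`, and the sample rows (including the two sales on the 12th) are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER, dates TEXT, sales REAL)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [
        (2233, "2017-06-11", 8),
        (2233, "2017-06-12", 3),   # two sales on the same day
        (2233, "2017-06-12", 1),
        (2233, "2017-06-13", 21),
        (2233, "2017-06-15", 1),
        (1234, "2017-06-15", 9),
    ],
)

query = """
WITH daily AS (
    -- Innermost step: collapse to one row per customer per day.
    SELECT id, dates, SUM(sales) AS sales
    FROM mytable
    GROUP BY id, dates
),
lagged AS (
    SELECT daily.*,
           LAG(dates, 1) OVER w AS prev_date,
           LAG(dates, 2) OVER w AS prev_date2,
           LAG(sales, 1) OVER w AS prev_sales,
           LAG(sales, 2) OVER w AS prev_sales2
    FROM daily
    WINDOW w AS (PARTITION BY id ORDER BY dates)
)
-- Add a previous row only if it falls within the last 2 days.
SELECT id, dates, sales,
       sales
       + CASE WHEN prev_date  >= date(dates, '-2 day') THEN prev_sales  ELSE 0 END
       + CASE WHEN prev_date2 >= date(dates, '-2 day') THEN prev_sales2 ELSE 0 END
       AS sales_3day
FROM lagged
ORDER BY id, dates
"""
result = conn.execute(query).fetchall()
sales_3day = {(cid, day): total for cid, day, _sales, total in result}

for row in result:
    print(row)
```

Two lags are enough here only because the window is the current day plus the 2 days before it and `daily` has at most one row per day; a longer window would need one more lag per extra day.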