2017-01-31

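Our data warehouse has two tables, c_customers and h_customers, which hold current and historical customer records. I need to count historical records at key dates.

Both tables have 'DWH_FROM' and 'DWH_TO' columns; all records in c_customers have 'DWH_TO' = NULL.

The PK of c_customers is CUST_NR; for h_customers it is CUST_NR, DWH_FROM and DWH_TO.

When customer data changes, a new record is inserted into c_customers with an empty DWH_TO, and the old record is moved to h_customers with DWH_TO set to the date on which the change occurred.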
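The change-tracking scheme described above can be modelled in a few lines. This is a minimal in-memory sketch, not the warehouse's actual load process. Each table is assumed to be a list of dicts. The new current row is assumed to start on the change date. The helper name `apply_change` is hypothetical.

```python
from datetime import date

def apply_change(current, history, cust_nr, new_status, change_date):
    """Hypothetical helper: close a customer's current row and insert its replacement."""
    # Find the customer's current row (DWH_TO is always None in the current table).
    old = next(r for r in current if r["cust_nr"] == cust_nr)
    current.remove(old)
    # Move the old version to the history table, stamped with the change date.
    history.append({**old, "dwh_to": change_date})
    # Insert the new version with an open-ended (None) DWH_TO.
    # Assumption: the new version's DWH_FROM is the change date.
    current.append({"cust_nr": cust_nr, "status": new_status,
                    "dwh_from": change_date, "dwh_to": None})

current = [{"cust_nr": 1, "status": "active", "dwh_from": date(2015, 3, 1), "dwh_to": None}]
history = []
apply_change(current, history, 1, "inactive", date(2016, 5, 10))
print(current)
print(history)
```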

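How can I get the number of customers (distinct CUST_NR) with STATUS = 'active' as of the first of each month in 2016, or for an arbitrary list of dates in 2016?

The ideal output would look like this: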

Date  | Count 
-----------+------ 
01.01.2016 | 22385 
01.02.2016 | 23187 
...  | 
01.12.2016 | 25109 

This is what I have so far for building the combined data set:

SELECT * 
FROM (SELECT CUST_NR, 
      STATUS, 
      DWH_FROM, 
      DWH_TO 
     FROM C_CUSTOMER C 
     UNION ALL 
     SELECT CUST_NR, 
      STATUS, 
      DWH_FROM, 
      DWH_TO 
     FROM H_CUSTOMER H 
    ); 

...but I really don't know how to count the customers as of a given date, or for several dates.
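Counting customers at a given date means keeping every row, from either table, whose validity interval contains that date. Below is a minimal Python sketch over rows of the UNION ALL result, under these assumptions:

- DWH_FROM is inclusive.
- DWH_TO is exclusive, so the change date belongs to the new version.
- A NULL (None) DWH_TO means the row is still valid.

The sample rows and the 'active' literal are hypothetical.

```python
from datetime import date

# Hypothetical rows as (cust_nr, status, dwh_from, dwh_to): the UNION ALL of both tables.
rows = [
    (1, "active",   date(2015, 3, 1),  date(2016, 5, 10)),  # history row
    (1, "inactive", date(2016, 5, 10), None),               # current row
    (2, "active",   date(2016, 2, 15), None),               # current row
]

def active_count(rows, day):
    """Distinct customers whose 'active' version was valid on `day`.

    Assumes half-open intervals [dwh_from, dwh_to); None means open-ended.
    """
    return len({
        cust_nr
        for cust_nr, status, dwh_from, dwh_to in rows
        if status == "active"
        and dwh_from <= day
        and (dwh_to is None or day < dwh_to)
    })

for month in range(1, 13):
    day = date(2016, month, 1)
    print(day.isoformat(), active_count(rows, day))
```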

Would you share what you have done so far, sample data for the input tables, etc.? –

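Is it fair to assume that a given customer will not have *overlapping intervals* in the history table? They may appear more than once, but never more than once *for the same date*? (And no overlap with their current record, so they would not be in both tables on the same day.) – mathguy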

That is correct. – igneous
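This assumption matters for the count. If no customer has two rows valid on the same date, a plain COUNT(*) over the matching rows already equals COUNT(DISTINCT CUST_NR). If intervals could overlap, the same customer would be counted twice.

Below is a small illustration with hypothetical rows. It uses closed intervals, like the BETWEEN conditions in the answers.

```python
from datetime import date

def matching(rows, day):
    # Rows (cust_nr, dwh_from, dwh_to) whose closed interval contains `day`;
    # a None dwh_to stands for the open-ended current row.
    return [r for r in rows if r[1] <= day and (r[2] is None or day <= r[2])]

non_overlapping = [(7, date(2015, 1, 1), date(2016, 3, 1)),
                   (7, date(2016, 3, 2), None)]
overlapping     = [(7, date(2015, 1, 1), date(2016, 3, 1)),
                   (7, date(2016, 2, 1), None)]   # overlaps the row above

day = date(2016, 2, 15)
for name, rows in [("non-overlapping", non_overlapping), ("overlapping", overlapping)]:
    hits = matching(rows, day)
    print(name, "count(*) =", len(hits), "count(distinct) =", len({r[0] for r in hits}))
```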

Answers

Here is a brute-force approach. You can do it for one date like this:

select c.cnt + h.cnt 
from (select count(*) as cnt 
     from c_customer c 
     where c.dwh_from <= date '2016-01-01'   -- current rows: dwh_to is always NULL 
    ) c cross join 
    (select count(*) as cnt 
     from h_customer h 
     where date '2016-01-01' between h.dwh_from and h.dwh_to 
    ) h; 
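To try the single-date query without an Oracle instance, here is a sketch using Python's built-in sqlite3 module. The table layouts and sample rows are hypothetical.

SQLite has no `date '...'` literal, so ISO date strings stand in for DATE values. These compare correctly as text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample data; dates are ISO strings, which sort like dates.
conn.executescript("""
    CREATE TABLE c_customer (cust_nr INTEGER PRIMARY KEY, dwh_from TEXT, dwh_to TEXT);
    CREATE TABLE h_customer (cust_nr INTEGER, dwh_from TEXT, dwh_to TEXT,
                             PRIMARY KEY (cust_nr, dwh_from, dwh_to));
    INSERT INTO c_customer VALUES (1, '2015-06-01', NULL), (2, '2016-03-01', NULL);
    INSERT INTO h_customer VALUES (2, '2015-01-01', '2016-02-29'),
                                  (3, '2014-01-01', '2015-12-31');
""")

# Current rows count if they started on or before the date (their DWH_TO is NULL);
# history rows count if the date falls inside [DWH_FROM, DWH_TO].
(total,) = conn.execute("""
    SELECT c.cnt + h.cnt
    FROM (SELECT count(*) AS cnt FROM c_customer
          WHERE dwh_from <= :d) c
    CROSS JOIN
         (SELECT count(*) AS cnt FROM h_customer
          WHERE :d BETWEEN dwh_from AND dwh_to) h
""", {"d": "2016-01-01"}).fetchone()
print(total)
```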

The single-date query can be adapted to several dates using correlated subqueries:

select d.dte, 
     ((select count(*) as cnt 
      from c_customer c 
      where c.dwh_from <= d.dte 
     ) + 
     (select count(*) as cnt 
      from h_customer h 
      where d.dte between h.dwh_from and h.dwh_to 
     ) 
     ) as cnt 
from (select date '2016-01-01' as dte from dual union all 
     select date '2016-02-01' as dte from dual union all 
     select date '2016-03-01' as dte from dual union all 
     . . . 
    ) d; 
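Here is the same correlated-subquery idea as a runnable sketch in Python's sqlite3 module. The sample rows are hypothetical. A `VALUES` list in a WITH clause stands in for the `union all ... from dual` date list, and only three dates are used.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample data with ISO-string dates.
conn.executescript("""
    CREATE TABLE c_customer (cust_nr INTEGER, dwh_from TEXT, dwh_to TEXT);
    CREATE TABLE h_customer (cust_nr INTEGER, dwh_from TEXT, dwh_to TEXT);
    INSERT INTO c_customer VALUES (1, '2015-06-01', NULL), (2, '2016-03-01', NULL);
    INSERT INTO h_customer VALUES (2, '2015-01-01', '2016-02-29'),
                                  (3, '2014-01-01', '2016-01-15');
""")

# For each report date, one scalar subquery per table, summed.
rows = conn.execute("""
    WITH d(dte) AS (VALUES ('2016-01-01'), ('2016-02-01'), ('2016-03-01'))
    SELECT d.dte,
           (SELECT count(*) FROM c_customer c WHERE c.dwh_from <= d.dte)
         + (SELECT count(*) FROM h_customer h
            WHERE d.dte BETWEEN h.dwh_from AND h.dwh_to) AS cnt
    FROM d
    ORDER BY d.dte
""").fetchall()
for dte, cnt in rows:
    print(dte, cnt)
```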

This is not the only way to solve the problem, but for a small number of dates it should perform fine.

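Performance is indeed the real issue here. If you have a dates table, you can cross join it with the customer rows and use a query like the following (written in SQL Server syntax, e.g. ISNULL and SELECT without FROM):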

WITH dates AS 
(SELECT '2016-01-01' AS dateid 
UNION ALL SELECT '2016-02-01' 
UNION ALL SELECT '2016-03-01' 
UNION ALL SELECT '2016-04-01' 
UNION ALL SELECT '2016-05-01' 
UNION ALL SELECT '2016-06-01' 
UNION ALL SELECT '2016-07-01' 
UNION ALL SELECT '2016-08-01' 
UNION ALL SELECT '2016-09-01' 
UNION ALL SELECT '2016-10-01' 
UNION ALL SELECT '2016-11-01' 
UNION ALL SELECT '2016-12-01' 
) 

,c_cust AS 
    (SELECT 1 AS CustNr, 'a' AS name, '2014-01-01' AS DWH_FROM, NULL AS DWH_TO 
    UNION ALL SELECT 2,'b', '2015-01-01', NULL 
    UNION ALL SELECT 3,'c', '2016-01-01', NULL 
    UNION ALL SELECT 5,'e', '2016-04-01', NULL 
    UNION ALL SELECT 6,'f', '2016-06-01', NULL 
    ) 

, h_cust AS 
    (SELECT 10 AS CustNr, 'j' AS name, '2010-01-01' AS DWH_FROM, '2010-12-31' AS DWH_TO 
    UNION ALL SELECT 12,'k', '2015-01-01', '2016-12-31' 
    UNION ALL SELECT 15,'m', '2016-01-01', '2016-06-31' 
    UNION ALL SELECT 20,'p', '2014-01-01', '2016-03-31' 
    UNION ALL SELECT 26,'r', '2015-01-01', '2015-12-31' 
    ) 
,all_cust AS 
(
    SELECT * FROM c_cust c 
    UNION ALL SELECT * FROM h_cust h 
) 

SELECT d.dateid, COUNT(*) AS ActiveUsers 
FROM all_cust c 
,dates d 
WHERE d.dateid > c.DWH_FROM AND d.dateid < ISNULL(c.DWH_TO, '9999-12-31') 
GROUP BY d.dateid 

The result you get is:

dateid ActiveUsers 
2016-01-01 4 
2016-02-01 6 
2016-03-01 6 
2016-04-01 5 
2016-05-01 6 
2016-06-01 6 
2016-07-01 6 
2016-08-01 6 
2016-09-01 6 
2016-10-01 6 
2016-11-01 6 
2016-12-01 6 
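To check the result table above without SQL Server, here is a sketch that runs the same logic through Python's sqlite3 module with the answer's sample data.

SQLite has no ISNULL function, so IFNULL is used instead. That is an adaptation, not the answer's original code. The dates remain strings and compare as text.

```python
import sqlite3

query = """
WITH dates(dateid) AS (
    VALUES ('2016-01-01'), ('2016-02-01'), ('2016-03-01'), ('2016-04-01'),
           ('2016-05-01'), ('2016-06-01'), ('2016-07-01'), ('2016-08-01'),
           ('2016-09-01'), ('2016-10-01'), ('2016-11-01'), ('2016-12-01')
),
c_cust(CustNr, name, DWH_FROM, DWH_TO) AS (
    VALUES (1, 'a', '2014-01-01', NULL), (2, 'b', '2015-01-01', NULL),
           (3, 'c', '2016-01-01', NULL), (5, 'e', '2016-04-01', NULL),
           (6, 'f', '2016-06-01', NULL)
),
h_cust(CustNr, name, DWH_FROM, DWH_TO) AS (
    VALUES (10, 'j', '2010-01-01', '2010-12-31'), (12, 'k', '2015-01-01', '2016-12-31'),
           (15, 'm', '2016-01-01', '2016-06-31'), (20, 'p', '2014-01-01', '2016-03-31'),
           (26, 'r', '2015-01-01', '2015-12-31')
),
all_cust AS (SELECT * FROM c_cust UNION ALL SELECT * FROM h_cust)
SELECT d.dateid, COUNT(*) AS ActiveUsers
FROM all_cust c, dates d
-- Strict comparisons, as in the answer: a row starting ON a report date
-- is not counted for that date.
WHERE d.dateid > c.DWH_FROM AND d.dateid < IFNULL(c.DWH_TO, '9999-12-31')
GROUP BY d.dateid
ORDER BY d.dateid
"""
result = sqlite3.connect(":memory:").execute(query).fetchall()
for dateid, active in result:
    print(dateid, active)
```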

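Here is an efficient way to solve this problem.

Somewhere you need to generate all the dates the report needs (the first of each month in 2016). In my solution I do this in a hierarchical (sub)query named mth.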
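SQLite has no CONNECT BY, so the sketch below uses a recursive CTE in Python's sqlite3 module to produce the same month list. This is a swapped-in technique, not the answer's Oracle query.

`date(dt, '+1 month')` plays the role of `add_months` here. Starting from the first of a month, both simply advance to the first of the next month.

```python
import sqlite3

# Recursive CTE: start at 2016-01-01 and keep adding one month until December.
months = [row[0] for row in sqlite3.connect(":memory:").execute("""
    WITH RECURSIVE mth(dt) AS (
        SELECT '2016-01-01'
        UNION ALL
        SELECT date(dt, '+1 month') FROM mth WHERE dt < '2016-12-01'
    )
    SELECT dt FROM mth ORDER BY dt
""")]
print(months)
```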

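In the code below I create test data in the WITH clause; that data is not part of the solution and should be removed before the query is used against your actual tables. I did not use your table names, and I only created the columns relevant to this exercise.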

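Putting the column names in the subquery declaration, as I did in the WITH clause, is a feature new in Oracle 11.2. If you are on an older version, you need to move the column names into each subquery definition (as column aliases); that is a trivial change if needed.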
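The two ways of naming subquery columns can be compared side by side. This sketch uses SQLite via Python's sqlite3 module rather than Oracle, so `from dual` is omitted. The sample row is taken from the answer's test data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Form 1: column names declared on the subquery name (the answer's style).
with_list = conn.execute("""
    WITH curr_cust(custnr, dwh_from) AS (SELECT 101, '2013-10-15')
    SELECT custnr, dwh_from FROM curr_cust
""").fetchall()
# Form 2: column names given as aliases inside the subquery definition,
# the rewrite the answer suggests for Oracle versions before 11.2.
with_aliases = conn.execute("""
    WITH curr_cust AS (SELECT 101 AS custnr, '2013-10-15' AS dwh_from)
    SELECT custnr, dwh_from FROM curr_cust
""").fetchall()
print(with_list == with_aliases, with_list)
```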

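The strategy is to join the "months" (or "calendar") table, which holds the first day of each of the 12 months, to each of the "current" and "history" customer tables, using the appropriate join condition for each. Collect the results with UNION ALL. This works because all we need to keep from each join is the calendar date (the first of the month), once for every matching row in a customer table. Then it is simple to group by date and count.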

with 
    curr_cust (custnr, dwh_from) as ( 
    select 101, date '2013-10-15' from dual union all 
    select 102, date '2016-03-11' from dual union all 
    select 105, date '2015-04-02' from dual union all 
    select 113, date '2016-12-15' from dual 
    ), 
    hist_cust (custnr, dwh_from, dwh_to) as (
     select 100, date '2014-12-01', date '2015-12-20' from dual union all 
     select 102, date '2015-11-15', date '2016-02-08' from dual union all 
     select 108, date '2016-03-01', date '2016-08-03' from dual union all 
     select 108, date '2016-10-15', date '2016-12-15' from dual 
    ), 
    mth (dt) as (
     select add_months(date '2016-01-01', level - 1) from dual 
     connect by level <= 12 
    ) 
select to_char(dt, 'yyyy-mm-dd') as dt, count(*) as cust_count 
from  (select dt 
      from mth m join curr_cust c on m.dt >= c.dwh_from 
      union all 
      select dt 
      from mth m join hist_cust h on m.dt between h.dwh_from and h.dwh_to 
     ) 
group by dt 
order by dt -- if needed 
; 

Output (with the test data included in the query):

DT   CUST_COUNT 
---------- ---------- 
2016-01-01   3 
2016-02-01   3 
2016-03-01   3 
2016-04-01   4 
2016-05-01   4 
2016-06-01   4 
2016-07-01   4 
2016-08-01   4 
2016-09-01   3 
2016-10-01   3 
2016-11-01   4 
2016-12-01   4 

12 rows selected.
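To reproduce the output above without Oracle, here is a sketch of the same strategy in Python's sqlite3 module, using the answer's test data. It differs from the answer in two ways:

- A recursive CTE replaces CONNECT BY.
- ISO date strings replace DATE literals.

Both are adaptations, not the answer's code.

```python
import sqlite3

query = """
WITH RECURSIVE
  curr_cust(custnr, dwh_from) AS (
    VALUES (101, '2013-10-15'), (102, '2016-03-11'),
           (105, '2015-04-02'), (113, '2016-12-15')
  ),
  hist_cust(custnr, dwh_from, dwh_to) AS (
    VALUES (100, '2014-12-01', '2015-12-20'), (102, '2015-11-15', '2016-02-08'),
           (108, '2016-03-01', '2016-08-03'), (108, '2016-10-15', '2016-12-15')
  ),
  -- Calendar of report dates (stands in for Oracle's CONNECT BY LEVEL).
  mth(dt) AS (
    SELECT '2016-01-01'
    UNION ALL
    SELECT date(dt, '+1 month') FROM mth WHERE dt < '2016-12-01'
  )
SELECT dt, count(*) AS cust_count
FROM (
  -- One row per (report date, current customer already valid on that date).
  SELECT m.dt AS dt FROM mth m JOIN curr_cust c ON m.dt >= c.dwh_from
  UNION ALL
  -- One row per (report date, history version whose interval contains it).
  SELECT m.dt AS dt FROM mth m JOIN hist_cust h ON m.dt BETWEEN h.dwh_from AND h.dwh_to
)
GROUP BY dt
ORDER BY dt
"""
counts = dict(sqlite3.connect(":memory:").execute(query).fetchall())
for dt, n in counts.items():
    print(dt, n)
```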