2017-01-16 51 views

Rolling average when a month is missing, in Teradata

I have data in this format:

student_id,month1,fees 
A1,201612,22 
A1,201611,33 
A1,201610,44 
A1,201609,55 
A1,201608,66 
A1,201607,77 
A1,201606,88 
A2,201612,12 
A2,201610,24 
A2,201609,36 
A2,201607,48 

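For every student I want the average fee over the last three months. For student A1 and month 201612, for example, the result should be sum(22,33,44)/3. So I used this query: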

(select student_id, month1, fees,
        (sum(fees) over (partition by student_id
                         order by student_id, month1 asc
                         rows between 2 preceding and current row)) / 3 as avg1
 from table
 where month1 > (select trim(Add_Months(cast(trim(maxrepmonth) as DATE Format 'YYYYMM'), -5) (format 'YYYYMM'))
                 from (select max(month1) as maxrepmonth from table) z2)
 group by 1,2,3)

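It works fine for student A1, who has data for every month. But for student A2 and month 201612, it takes the fees of months 201612, 201610 and 201609, which is wrong. It should only use 201612 and 201610, because 201611 is missing. Please help.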
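The difference between the two behaviours can be shown outside SQL. The Python below is only an illustration and not part of the original query; the helper names `month_index`, `rows_window_avg` and `calendar_window_avg` are made up for this sketch. It assumes `month1` is a YYYYMM integer, as in the sample data, and compares a window over the last three rows with a window over the last three calendar months. Unlike the query in the question, which always divides by 3, both helpers divide by the number of values actually found.

```python
# Sample rows for student A2 from the question: (month1 as YYYYMM, fees).
A2 = [(201607, 48), (201609, 36), (201610, 24), (201612, 12)]


def month_index(yyyymm):
    """Map a YYYYMM integer to a running month number, so that
    consecutive calendar months differ by exactly 1."""
    return (yyyymm // 100) * 12 + (yyyymm % 100)


def rows_window_avg(rows, n=3):
    """Average over the current row and the n-1 rows before it,
    in the spirit of ROWS BETWEEN 2 PRECEDING AND CURRENT ROW.
    Gaps between months are not noticed."""
    rows = sorted(rows)
    out = {}
    for i, (month, _) in enumerate(rows):
        window = [fee for _, fee in rows[max(0, i - n + 1): i + 1]]
        out[month] = sum(window) / len(window)
    return out


def calendar_window_avg(rows, n=3):
    """Average over the rows whose month lies within the current month
    and the n-1 calendar months before it. Missing months are skipped."""
    rows = sorted(rows)
    out = {}
    for month, _ in rows:
        cur = month_index(month)
        window = [fee for m, fee in rows if cur - n < month_index(m) <= cur]
        out[month] = sum(window) / len(window)
    return out


print(rows_window_avg(A2)[201612])      # 24.0: uses 201612, 201610, 201609
print(calendar_window_avg(A2)[201612])  # 18.0: uses 201612 and 201610 only
```

The second number is the one asked for: 201609 lies three months before 201612 and therefore falls outside a three-month window.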

Thanks

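Good solution. That Teradata does not support the RANGE window clause from the ANSI-99 standard came as a surprise to me. I should have read the documentation first... but I posted a possible workaround a minute ago. – marcothesane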

Answers


OLAP functions are your friend here, especially the RANGE window clause.

Try this. I ran it on Vertica, but since it is ANSI standard, it should work for you on Teradata, too:

WITH foo(student_id,month1,fees) AS (
         SELECT 'A1',DATE '2016-12-01',22 
    UNION ALL SELECT 'A1',DATE '2016-11-01',33 
    UNION ALL SELECT 'A1',DATE '2016-10-01',44 
    UNION ALL SELECT 'A1',DATE '2016-09-01',55 
    UNION ALL SELECT 'A1',DATE '2016-08-01',66 
    UNION ALL SELECT 'A1',DATE '2016-07-01',77 
    UNION ALL SELECT 'A1',DATE '2016-06-01',88 
    UNION ALL SELECT 'A2',DATE '2016-12-01',12 
    UNION ALL SELECT 'A2',DATE '2016-10-01',24 
    UNION ALL SELECT 'A2',DATE '2016-09-01',36 
    UNION ALL SELECT 'A2',DATE '2016-07-01',48 
    ) 
    SELECT 
     * 
    , AVG(fees) OVER (
      PARTITION BY student_id ORDER BY month1 
      RANGE BETWEEN INTERVAL '3 MONTHS' PRECEDING AND CURRENT ROW 
     ) AS rolling_avg_3_months 
    FROM foo; 

student_id|month1    |fees|rolling_avg_3_months
A1        |2016-06-01|  88|                  88
A1        |2016-07-01|  77|                82.5
A1        |2016-08-01|  66|                  77
A1        |2016-09-01|  55|                  66
A1        |2016-10-01|  44|                  55
A1        |2016-11-01|  33|                  44
A1        |2016-12-01|  22|                  33
A2        |2016-07-01|  48|                  48
A2        |2016-09-01|  36|                  42
A2        |2016-10-01|  24|                  30
A2        |2016-12-01|  12|                  18

Teradata does not support RANGE. –

Hi, it fails with the error "expecting the word RESET or ')' after ORDER BY". Teradata does not support RANGE with an interval. –

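Thanks dudu, thanks Amit.

So Teradata does not support RANGE...

Now it gets trickier.

I found a working solution, but it needs some explanation.

I hope the comments are enough.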

WITH 
-- the input data 
    foo(student_id,month1,fees) AS (
      SELECT 'A1',DATE '2016-12-01',22 
UNION ALL SELECT 'A1',DATE '2016-11-01',33 
UNION ALL SELECT 'A1',DATE '2016-10-01',44 
UNION ALL SELECT 'A1',DATE '2016-09-01',55 
UNION ALL SELECT 'A1',DATE '2016-08-01',66 
UNION ALL SELECT 'A1',DATE '2016-07-01',77 
UNION ALL SELECT 'A1',DATE '2016-06-01',88 
UNION ALL SELECT 'A2',DATE '2016-12-01',12 
UNION ALL SELECT 'A2',DATE '2016-10-01',24 
UNION ALL SELECT 'A2',DATE '2016-09-01',36 
UNION ALL SELECT 'A2',DATE '2016-07-01',48 
) 
-- add two columns fees_1before and fees_2before, that can be null, 
-- containing the fees of the two previous rows if the 'month1' 
-- value of those rows is less than 3 months back 
, foo_3_months AS (
SELECT 
    student_id 
, month1 
, fees AS fees_now 
, CASE 
    WHEN 
     MONTHS_BETWEEN(month1,LAG(month1) OVER (PARTITION BY student_id ORDER BY month1)) 
    < 3 
    THEN LAG(fees) OVER (PARTITION BY student_id ORDER BY month1) 
    END AS fees_1before 
, CASE 
    WHEN MONTHS_BETWEEN(month1,LAG(month1,2) OVER (PARTITION BY student_id ORDER BY month1)) 
    < 3 
    THEN LAG(fees,2) OVER (PARTITION BY student_id ORDER BY month1) 
    END AS fees_2before 
    FROM foo 
) 
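-- finally, build a hard-wired average formula that takes care of 
-- the fact that two of the three values can be NULL 
-- I'm keeping the two additional columns for debugging purposes. 
-- They can be removed in the end. 
-- Caution: if fees is an INTEGER column, the division below is an integer 
-- division on Teradata and drops the fraction; cast to DECIMAL or FLOAT if needed. 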
SELECT 
    * 
, (fees_now+NVL(fees_1before,0)+NVL(fees_2before,0)) 
/(
    1 
    + (CASE WHEN fees_1before IS NOT NULL THEN 1 ELSE 0 END) 
    + (CASE WHEN fees_2before IS NOT NULL THEN 1 ELSE 0 END) 
) 
AS rolling_avg_3months 
FROM foo_3_months 
; 

Here is the result (a dash stands for NULL):

student_id|month1    |fees_now|fees_1before|fees_2before|rolling_avg_3months
A1        |2016-06-01|      88|-           |-           |88.000000000000000000
A1        |2016-07-01|      77|          88|-           |82.500000000000000000
A1        |2016-08-01|      66|          77|          88|77.000000000000000000
A1        |2016-09-01|      55|          66|          77|66.000000000000000000
A1        |2016-10-01|      44|          55|          66|55.000000000000000000
A1        |2016-11-01|      33|          44|          55|44.000000000000000000
A1        |2016-12-01|      22|          33|          44|33.000000000000000000
A2        |2016-07-01|      48|-           |-           |48.000000000000000000
A2        |2016-09-01|      36|          48|-           |42.000000000000000000
A2        |2016-10-01|      24|          36|-           |30.000000000000000000
A2        |2016-12-01|      12|          24|-           |18.000000000000000000
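
For readers who want to check the logic without a database, here is a rough Python restatement of the same idea. It is a sketch, not a translation of the SQL: the names `months_between` and `lag_rolling_avg` are made up for this example, and the month difference is computed as a whole number, which is only valid because every date in the sample is the first of a month.

```python
from datetime import date


def months_between(later, earlier):
    """Whole-month difference; exact here because every date is the 1st."""
    return (later.year - earlier.year) * 12 + (later.month - earlier.month)


def lag_rolling_avg(rows):
    """rows: list of (month_start, fees) for ONE student.
    Looks at the previous two rows and keeps each one only if it is
    less than 3 months before the current row, then averages what is kept."""
    rows = sorted(rows)
    result = []
    for i, (month, fees_now) in enumerate(rows):
        kept = [fees_now]
        for lag in (1, 2):
            if i - lag >= 0:
                prev_month, prev_fees = rows[i - lag]
                if months_between(month, prev_month) < 3:
                    kept.append(prev_fees)
        # Divide by the number of values actually kept, not by a fixed 3.
        result.append((month, sum(kept) / len(kept)))
    return result


a2 = [
    (date(2016, 12, 1), 12),
    (date(2016, 10, 1), 24),
    (date(2016, 9, 1), 36),
    (date(2016, 7, 1), 48),
]
for month, avg in lag_rolling_avg(a2):
    print(month, avg)
```

For student A2 this prints 48.0, 42.0, 30.0 and 18.0, the same averages as in the result table.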

Not an easy task. Maybe worth an enhancement request to Teradata?

Have fun, Marco the Sane


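I have just finished working out a second way to do what you need. It took me two hours, and I had to find the time for it. But I was curious myself, so the time was not wasted.

The advantage of this approach is that it is more flexible. If you need 4, 5 or 6 months instead of 3, it is easier to change, and you do not have to think about the possibility that components are NULL, because you can use a normal AVG() OVER().

The disadvantage is a more complex data preparation phase: you have to fill the gaps with rows whose measure is NULL, and for that you have to create a list of all possible month starts between the smallest and the largest month1 value in the base table. For this, I mimic Vertica's TIMESERIES clause.

The solution contains many techniques that I think belong in the survival kit of anyone who digs deeper into SQL, such as creating an in-line table of consecutive integers, and building a time series from it. That is also why I create a series of 100 integers when 7 would have been enough. It also shows that a CROSS JOIN is not always a disaster.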
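
The integer-series and time-series steps can be tried out in isolation. The following Python sketch only mirrors the idea; `add_months` is written by hand for month-start dates, and all variable names are chosen for this example rather than taken from any library.

```python
from datetime import date
from itertools import product

# Ten digits, combined with themselves to get 0..99: the same idea as
# cross joining a ten-row table with itself.
ten_ints = range(10)
idx_series = sorted(tens * 10 + units for tens, units in product(ten_ints, ten_ints))


def add_months(d, n):
    """Shift a month-start date by n months (the day is assumed to be 1)."""
    total = d.year * 12 + (d.month - 1) + n
    return date(total // 12, total % 12 + 1, 1)


# First and last month of the sample data.
start_month, end_month = date(2016, 6, 1), date(2016, 12, 1)
monthcount = (end_month.year - start_month.year) * 12 + (end_month.month - start_month.month)

# Keep only as many offsets as there are months in the range.
month_list = [add_months(start_month, idx) for idx in idx_series if idx <= monthcount]
print(month_list[0], month_list[-1], len(month_list))
```

With the sample data only the offsets 0 to 6 survive the filter, so the list runs from 2016-06-01 to 2016-12-01. The remaining 93 integers are unused, which is the price of keeping the generator generic.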

I have tried to comment what I am doing here thoroughly, and I hope it is enough.

-- WITHOUT RANGE BETWEEN - vertical version 
WITH 
-- the input data 
    foo(student_id,month1,fees) AS (
      SELECT 'A1',DATE '2016-12-01',22 
UNION ALL SELECT 'A1',DATE '2016-11-01',33 
UNION ALL SELECT 'A1',DATE '2016-10-01',44 
UNION ALL SELECT 'A1',DATE '2016-09-01',55 
UNION ALL SELECT 'A1',DATE '2016-08-01',66 
UNION ALL SELECT 'A1',DATE '2016-07-01',77 
UNION ALL SELECT 'A1',DATE '2016-06-01',88 
UNION ALL SELECT 'A2',DATE '2016-12-01',12 
UNION ALL SELECT 'A2',DATE '2016-10-01',24 
UNION ALL SELECT 'A2',DATE '2016-09-01',36 
UNION ALL SELECT 'A2',DATE '2016-07-01',48 
) 
-- 1. Mimic Vertica's TIMESERIES clause: 
-- Prepare a series of month-start dates 
-- from the first month to the last month 
-- of the time series. Assuming it's more than 
-- 10 months: 
-- 1.a A series of 100 ints starting from 0 
-- 1.a.1 start with 10 ints 
, ten_ints(idx) AS (
      SELECT 0 
UNION ALL SELECT 1 
UNION ALL SELECT 2 
UNION ALL SELECT 3 
UNION ALL SELECT 4 
UNION ALL SELECT 5 
UNION ALL SELECT 6 
UNION ALL SELECT 7 
UNION ALL SELECT 8 
UNION ALL SELECT 9 
) 
-- 1.a.2 make 100 out of 10 
, idx_series AS (
SELECT 
    tens.idx * 10 + units.idx AS idx 
FROM ten_ints units 
CROSS JOIN ten_ints tens 
) 
-- 1.b get limit dates and total month count 
, month_limits AS (
SELECT 
    MIN(month1) AS start_month 
, MAX(month1) AS end_month 
, MONTHS_BETWEEN(MAX(month1), MIN(month1)) AS monthcount 
FROM foo 
) 
-- 1.c create an artificial list of all student_id 
--  and all possible months to fill gaps 
--  This is the end of the TIMESERIES mimic. 
, student_month_list AS (
SELECT 
    student_id 
, ADD_MONTHS(start_month,idx) AS month1 
FROM month_limits 
JOIN idx_series 
    ON idx <= monthcount 
CROSS 
    JOIN (
    SELECT DISTINCT student_id FROM foo 
) bar 
) 
-- This returns: 
-- student_id|month1 
-- A1  |2016-06-01 
-- A1  |2016-07-01 
-- A1  |2016-08-01 
-- A1  |2016-09-01 
-- A1  |2016-10-01 
-- A1  |2016-11-01 
-- A1  |2016-12-01 
-- A2  |2016-06-01 
-- A2  |2016-07-01 
-- A2  |2016-08-01 
-- A2  |2016-09-01 
-- A2  |2016-10-01 
-- A2  |2016-11-01 
-- A2  |2016-12-01 

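-- Main query: 
-- left join student_month_list to the base table, 
-- compute the window average over the gap-filled rows, 
-- and only then, in the outer query, filter out the rows 
-- whose measure is NULL. A WHERE in the same query block would 
-- remove the filler rows before the window function is evaluated. 
SELECT 
    student_id 
, month1 
, running_avg_3months 
FROM (
    SELECT 
     mth.student_id 
    , mth.month1 
    , foo.fees 
    , AVG(foo.fees) OVER (
      PARTITION BY mth.student_id ORDER BY mth.month1 
      ROWS BETWEEN 2 PRECEDING AND CURRENT ROW 
     ) AS running_avg_3months 
    FROM student_month_list mth 
    -- explicit ON instead of USING, for portability 
    LEFT JOIN foo 
     ON foo.student_id = mth.student_id 
     AND foo.month1 = mth.month1 
) filled 
WHERE fees IS NOT NULL 
ORDER BY 1,2 
; 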
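
One detail matters in any variant of this approach: the filler rows must still be present while the window is evaluated, and can only be dropped afterwards. A minimal Python sketch of that order of operations follows. It assumes month-start dates, and `gap_filled_rolling_avg` is a name invented for this illustration.

```python
from datetime import date


def gap_filled_rolling_avg(rows, all_months, n=3):
    """rows: {month_start: fees} for ONE student.
    all_months: every month start in the range, in ascending order.
    Missing months get None, the window runs over the padded list,
    and only then are the padded rows dropped."""
    padded = [(m, rows.get(m)) for m in all_months]
    result = []
    for i, (month, fees) in enumerate(padded):
        # Window of n consecutive calendar months; None values are ignored,
        # as AVG() ignores NULLs.
        window = [f for _, f in padded[max(0, i - n + 1): i + 1] if f is not None]
        if fees is not None:  # filter AFTER the window was built
            result.append((month, sum(window) / len(window)))
    return result


all_months = [date(2016, m, 1) for m in range(6, 13)]
a2 = {
    date(2016, 7, 1): 48,
    date(2016, 9, 1): 36,
    date(2016, 10, 1): 24,
    date(2016, 12, 1): 12,
}
for month, avg in gap_filled_rolling_avg(a2, all_months):
    print(month, avg)
```

For student A2 this yields 48.0, 42.0, 30.0 and 18.0. Dropping the None rows before building the window would bring back the three-row behaviour from the question.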