Fill missing dates with the previous value

2017-08-09 53 views
3

Let's say I have the following data.table:

library(data.table) 
set.seed(123) 
df <- as.data.table(data.frame(date = c("2017-01-01", "2017-01-05", "2017-01-08", "2017-01-01", "2017-01-05", "2017-01-08"), 
       value = rnorm(6), 
       mygroup = rep(LETTERS[1:2], each = 3))) 

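I want to fill in the missing dates with the "last" (previous) value, by group. The closest thing I have found is this question, which shows how to do it without grouping.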

all_dates <- seq(from = as.Date("2017-01-01"), 
        to = as.Date("2017-01-08"), 
        by = "days") 

df[J(all_dates), roll=Inf] 

However, I need to do this by group, and using by results in an error:

Error in `[.data.table`(df, J(all_dates), roll = Inf, by = mygroup) : 'by' or 'keyby' is supplied but not j

+0

Please do 'df[, date := as.Date(date)]' rather than requiring as.Date to be typed countless times. Anyway, I think 'df[df[, .(date = seq(first(date), last(date), by = "day")), by = mygroup], on = .(mygroup, date), roll = -Inf]' could do it..? – Frank

+0

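Does each group have a different date range, or is the range the same for every group (Jan 1-8 in this example)? In the latter case, there are some near-duplicates that use CJ, such as https://stackoverflow.com/a/10473931/ – Frank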

+0

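@Frank Each group may have a different range. Your initial suggestion currently results in the error ''on' argument should be a named atomic vector oc column names indicating which columns in 'i' should be joined with which columns in 'x'.' – cdeterman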

Answers

3

We can add mygroup as another column in the rolling join:

df[, date := as.Date(date)] 

df[ 
    df[, .(date = seq(first(date), last(date), by="day")), by=mygroup], 
    on=.(mygroup, date), 
    roll=TRUE] 

          date       value mygroup
 1: 2017-01-01 -0.56047565       A
 2: 2017-01-02 -0.56047565       A
 3: 2017-01-03 -0.56047565       A
 4: 2017-01-04 -0.56047565       A
 5: 2017-01-05 -0.23017749       A
 6: 2017-01-06 -0.23017749       A
 7: 2017-01-07 -0.23017749       A
 8: 2017-01-08  1.55870831       A
 9: 2017-01-01  0.07050839       B
10: 2017-01-02  0.07050839       B
11: 2017-01-03  0.07050839       B
12: 2017-01-04  0.07050839       B
13: 2017-01-05  0.12928774       B
14: 2017-01-06  0.12928774       B
15: 2017-01-07  0.12928774       B
16: 2017-01-08  1.71506499       B

The "rolling" always takes place on the last column in on=; the columns before it are matched exactly.
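
To make the point about the last on= column concrete, here is a minimal sketch on a made-up table (dt, grid, locf and nocb are hypothetical names, not from the question) in which the two groups cover different date ranges. mygroup is matched exactly and only date is rolled, so values never leak from one group into the other. It also contrasts roll = TRUE (carry the last observation forward) with roll = -Inf (carry the next observation backward), which is the difference discussed in the comments.

```r
library(data.table)

# Toy data (hypothetical): two groups with different date ranges
dt <- data.table(
  mygroup = c("A", "A", "B", "B"),
  date    = as.Date(c("2017-01-01", "2017-01-04", "2017-01-02", "2017-01-03")),
  value   = c(1, 2, 10, 20)
)

# One row per group per calendar day, spanning each group's own range
grid <- dt[, .(date = seq(min(date), max(date), by = "day")), by = mygroup]

# mygroup is matched exactly; date, the last column in on=, is rolled
locf <- dt[grid, on = .(mygroup, date), roll = TRUE]  # previous value carried forward
nocb <- dt[grid, on = .(mygroup, date), roll = -Inf]  # next value carried backward

print(locf)
print(nocb)
```

Group A gets four rows (Jan 1 to Jan 4) and group B two rows (Jan 2 to Jan 3). With roll = TRUE the gap in A is filled with the Jan 1 value, while with roll = -Inf it is filled with the Jan 4 value.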


If the table has several columns and we only want to roll-fill some of them...

# extend example 
set.seed(1) 
df[, y := rpois(.N, 1)] 

# build new table 
newDT = df[, .(date = seq(first(date), last(date), by="day")), by=mygroup] 

roll_cols = "value" 
newDT[, (roll_cols) := 
    df[newDT, on=.(mygroup, date), roll=TRUE, mget(paste0("x.", roll_cols))]] 

noroll_cols = "y" 
newDT[df, on=.(mygroup, date), (noroll_cols) := mget(paste0("i.", noroll_cols)) ] 

    mygroup       date       value  y
 1:       A 2017-01-01 -0.56047565  0
 2:       A 2017-01-02 -0.56047565 NA
 3:       A 2017-01-03 -0.56047565 NA
 4:       A 2017-01-04 -0.56047565 NA
 5:       A 2017-01-05 -0.23017749  1
 6:       A 2017-01-06 -0.23017749 NA
 7:       A 2017-01-07 -0.23017749 NA
 8:       A 2017-01-08  1.55870831  1
 9:       B 2017-01-01  0.07050839  2
10:       B 2017-01-02  0.07050839 NA
11:       B 2017-01-03  0.07050839 NA
12:       B 2017-01-04  0.07050839 NA
13:       B 2017-01-05  0.12928774  0
14:       B 2017-01-06  0.12928774 NA
15:       B 2017-01-07  0.12928774 NA
16:       B 2017-01-08  1.71506499  2
+0

This is very close, but I want to fill from the previous value (e.g. the 'value' for '2017-01-02' should be '-0.56047565'). – cdeterman

+0

Ahh, I misread. Switching to 'roll = TRUE' should do it. I'll fix that. It helps us to see the desired output, fyi. – Frank