2015-09-18
4

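Calculating the sum of the previous 3 rows in an R data.table (by grid square)

I would like to calculate the rainfall that has fallen over the last three days for each grid square, and add it as a new column to my data.table. To be clear, I want to sum the rainfall for the current day and the two (2) previous days, for each meteorological grid square.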

library (zoo) 
library (data.table) 


# making the data.table 
rain   <- c(NA, NA, NA, 0, 0, 5, 1, 0, 3, 10) # rainfall values to work with 
square   <- c(1,1,1,1,1,1,1,1,1,2)    # the geographic grid square for the rainfall measurement 
desired_result <- c(NA, NA, NA, NA, NA, 5, 6, 6, 4, NA) # this is the result I'm looking for (the last NA as we are now on to the first day of the second grid square) 
weather <- data.table(rain, square, desired_result) # making the data.table 

I tried the approach from an earlier answer. This line used to work, but no longer does:

weather[, rain_3 := filter(rain, rep(1, 2), sides = 1), by = list(square)] 

So I tried another approach:

# this next line gets the numbers right, but sums the following values, not the preceding ones. 
weather$rain_3 <- rollapply(zoo(weather$rain), list(seq(-2,0)), sum) 

# here I add grouping by weather$square, but still no success 
weather$rain_3 <- rollapply(zoo(weather$rain), list(seq(-2,0)), sum, by= list(weather$square)) 

I would appreciate any insights or suggestions you may have.

Many thanks!

Answers

2
weather[, rain_3 := filter(rain, rep(1, 3), sides = 1), by = list(square)] 
#Error in filter(rain, rep(1, 3), sides = 1) : 
# 'filter' is longer than time series 
weather[, rain_3 := if(.N > 2) filter(rain, rep(1, 3), sides = 1) else NA_real_, 
     by = square] 
# rain square desired_result rain_3 
# 1: NA  1    NA  NA 
# 2: NA  1    NA  NA 
# 3: NA  1    NA  NA 
# 4: 0  1    NA  NA 
# 5: 0  1    NA  NA 
# 6: 5  1    5  5 
# 7: 1  1    6  6 
# 8: 0  1    6  6 
# 9: 3  1    4  4 
#10: 10  2    NA  NA 

Make sure dplyr is not loaded, because it masks filter. If you need dplyr, you can call stats::filter explicitly.
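
A minimal sketch of the explicit call, assuming dplyr may be attached in the same session. The as.numeric() wrapper and the NA_real_ fallback layout are additions for illustration: stats::filter returns a ts object, and converting it keeps the new column a plain numeric vector.

```r
library(data.table)

# Same example data as in the question
weather <- data.table(
  rain   = c(NA, NA, NA, 0, 0, 5, 1, 0, 3, 10),
  square = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 2)
)

# Qualify filter with its namespace so dplyr::filter cannot mask it.
# Groups with fewer than 3 rows get NA, because stats::filter errors
# when the filter is longer than the series.
weather[, rain_3 := if (.N > 2) {
  as.numeric(stats::filter(rain, rep(1, 3), sides = 1))
} else {
  NA_real_
}, by = square]

weather$rain_3
```

Written this way, the line behaves the same whether or not dplyr is attached, so there is no need to detach the package.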

+1

You are spot on that dplyr was causing the problem. After running `detach("package:dplyr", unload = TRUE)` my original code started working again. I really appreciate your insight. – threeisles

17

Here is a quick and efficient solution using the latest version of data.table (v 1.9.6+):

weather[, rain_3 := Reduce(`+`, shift(rain, 0:2)), by = square] 
weather 
#  rain square desired_result rain_3 
# 1: NA  1    NA  NA 
# 2: NA  1    NA  NA 
# 3: NA  1    NA  NA 
# 4: 0  1    NA  NA 
# 5: 0  1    NA  NA 
# 6: 5  1    5  5 
# 7: 1  1    6  6 
# 8: 0  1    6  6 
# 9: 3  1    4  4 
# 10: 10  2    NA  NA 

The basic idea here is to shift the rain column twice and then sum across the rows.
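
To make the shift-then-sum idea concrete, here is a small sketch on a toy vector (the vector x and the names lags and rolling are made up for illustration and are not part of the answer):

```r
library(data.table)  # shift() needs data.table 1.9.6 or later

x <- c(1, 2, 3, 4)   # toy vector, not from the question

# shift() with a vector of lags returns a list with one lagged copy per lag:
# lags[[1]] is x itself, lags[[2]] is x lagged by 1, lags[[3]] by 2
lags <- shift(x, 0:2)

# Reduce() adds the copies element-wise. The NAs that shift() pads in
# at the front propagate, so the first two positions stay NA.
rolling <- Reduce(`+`, lags)
rolling
# [1] NA NA  6  9
```

Inside `weather[, ..., by = square]` the same steps run once per grid square, so the lags never reach across square boundaries.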

+1

Reduce here is <3, +1 –

2

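You almost had the answer yourself. rollsum (or rollapply in your case) gives you a vector of length N-2, so you just need to pad the missing cells with NAs. It can be done as simply as this: roll <- c(NA, NA, rollsum(yourvector, k = 3))

Here is how I would do it. I use roll_sum from the {RcppRoll} package because it is faster and handles NAs more easily. The by argument of data.table lets you group the result by square.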
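
A short sketch of the padding step on a toy vector. The vector x is made up for illustration and deliberately contains no NAs, since zoo's documentation notes that the default rollsum method is not meant for inputs containing NAs. The fill/align variant at the end is an assumed equivalent that lets zoo do the padding, not something the answer itself uses.

```r
library(zoo)

x <- c(0, 0, 5, 1, 0, 3)   # toy vector without NAs

# rollsum() returns length(x) - 2 values for a window of 3
short <- rollsum(x, k = 3)

# Pad the front so the result lines up with the original rows
roll <- c(NA, NA, short)
roll

# Assumed equivalent: right-align the window and fill the gaps with NA
roll2 <- rollsum(x, k = 3, fill = NA, align = "right")
```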

library(RcppRoll) 
# NA_real_ keeps the column numeric even if a short group comes first
weather[, rain_3 := if (.N > 2) c(NA, NA, roll_sum(rain, n = 3)) else NA_real_, by = square] 
weather 

    rain square desired_result rain_3 
1: NA  1    NA  NA 
2: NA  1    NA  NA 
3: NA  1    NA  NA 
4: 0  1    NA  NA 
5: 0  1    NA  NA 
6: 5  1    5  5 
7: 1  1    6  6 
8: 0  1    6  6 
9: 3  1    4  4 
10: 10  2    NA  NA 
0

A dplyr solution:

library(dplyr) 
weather %>% 
    group_by(square) %>% 
    mutate(rain_3 = rain + lag(rain) + lag(rain, n = 2L)) 

Result:

Source: local data table [10 x 4] 

    rain square desired_result rain_3 
    (dbl) (dbl)   (dbl) (dbl) 
1  NA  1    NA NA 
2  NA  1    NA NA 
3  NA  1    NA NA 
4  0  1    NA NA 
5  0  1    NA NA 
6  5  1    5  5 
7  1  1    6  6 
8  0  1    6  6 
9  3  1    4  4 
10 10  2    NA NA 

If you want to assign rain_3 back to the dataset, you can use the %<>% operator from magrittr in your pipe:

library(magrittr) 
weather %<>% 
    group_by...... 
3

A rollapply solution would look like this:

weather[, rain_3 := rollapplyr(c(NA, NA, rain), 3, sum), by = square] 

which gives:

rain square desired_result rain_3 
1: NA  1    NA  NA 
2: NA  1    NA  NA 
3: NA  1    NA  NA 
4: 0  1    NA  NA 
5: 0  1    NA  NA 
6: 5  1    5  5 
7: 1  1    6  6 
8: 0  1    6  6 
9: 3  1    4  4 
10: 10  2    NA  NA