2016-02-05 44 views

Optimizing a custom rollapplyr function

I have the following data:

library(data.table)
library(zoo)  # provides rollapplyr

y <- data.table(cbind(week = rep(1:61, 5352),
                      ID = rep(1:5352, each = 61), w = runif(326472), v = runif(326472)))
y$v[sample(1:326472, 10000, replace = FALSE)] <- NA

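On this data I run the code below to create a rolling mean of the variable v that ignores outliers and NAs. The code works, but performance is poor. I'm sure there is a more efficient way to run it, using apply or something similar, but I have not managed to write a faster version. Can anyone shed some light on how to make it more efficient?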

IDs <- unique(y$ID) 
y$vol_m12 <- 0 

for (i in 1:length(IDs)) { 
    x <- y[ID==IDs[i]] 

    outlier <- 0.2 
    w_outlier <- quantile(x$w, c(outlier), na.rm = T) 
    v_outlier <-quantile(x$v, c(1 - outlier), na.rm = T) 

# Ignore outliers  
    x$v_temp <- x$v 
    x$v_temp[((x$v_temp >= v_outlier) 
       & (x$w <= w_outlier))] <- NA 

# Creating rolling mean 
    y$vol_m12[y$ID==IDs[i]] <- x[, rollapplyr(v_temp, 12, (mean), fill = NA, na.rm=T)] 
} 
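
To make the rule inside the loop concrete, here is a minimal base-R sketch on made-up values for a single ID. The toy vectors, the window of 3, and the helper `roll_mean_right` are hypothetical and for illustration only; the helper is a plain loop standing in for `rollapplyr`, not a replacement for it.

```r
# Toy data for one ID (hypothetical values)
v <- c(0.1, 0.9, 0.5, NA, 0.95, 0.3)
w <- c(0.8, 0.1, 0.6, 0.4, 0.05, 0.7)

outlier <- 0.2
w_cut <- quantile(w, outlier, na.rm = TRUE)      # 20th percentile of w
v_cut <- quantile(v, 1 - outlier, na.rm = TRUE)  # 80th percentile of v

# Mask an observation only when v is high AND w is low
v_temp <- v
v_temp[which(v >= v_cut & w <= w_cut)] <- NA

# Right-aligned rolling mean that skips NAs inside each window
roll_mean_right <- function(x, k) {
  out <- rep(NA_real_, length(x))
  if (length(x) < k) return(out)
  for (i in k:length(x)) out[i] <- mean(x[(i - k + 1):i], na.rm = TRUE)
  out
}

res <- roll_mean_right(v_temp, 3)
print(res)  # NA NA 0.5 0.7 0.5 0.3
```

Only the fifth value is masked here: it is the only one at or above the v cutoff while its w is at or below the w cutoff.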

This might be helpful: https://stackoverflow.com/questions/34754786/how-to-create-a-matrix-by-averaging-the-elements-of-another-matrix-in-r/34755233#34755233 – user5249203


See: http://stackoverflow.com/questions/29851637/efficiently-perform-row-wise-distribution-test. It looks quite likely that a minor modification of that Rcpp code would do the job. –

Answers


Thanks for the replies. Following 42's advice, I came up with the following code:

library(RcppRoll)
outlier <- 0.2  # same cutoff as in the question's loop
# Ignore outliers
y[, w_out := quantile(w, c(outlier), na.rm = TRUE), by = ID]
y[, v_out := quantile(v, c(1 - outlier), na.rm = TRUE), by = ID]
# Keep v unless the row is both high-v and low-w, as in the question's loop
y[!((v >= v_out) & (w <= w_out)), v_temp := v]
y[, w_out := NULL]
y[, v_out := NULL]

y[, v_m12 := roll_mean(as.matrix(v_temp), n = 12L, fill = NA,
        align = c("right"), normalize = TRUE, na.rm = TRUE), by = ID]

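System time is about 0.59 seconds, versus 10.36 seconds for the solution below, which uses rollapplyr (though it may still be possible to make the outlier removal more efficient).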

y[, v_m12 := rollapplyr(v_temp, 12, mean, fill = NA, na.rm = TRUE), by = ID]
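
On the remark that the outlier removal might be made more efficient: one possible approach, which is not part of the answer above and was not timed here, is to compute both cutoffs and the mask in a single grouped call, then use data.table's own `frollmean` (available in data.table 1.12.0 and later) for the rolling mean. The sketch below runs on a small simulated table; the name `toy` and its size are assumptions chosen so the example runs quickly.

```r
library(data.table)

set.seed(1)
n_id <- 50L    # hypothetical: far fewer IDs than the real data
n_week <- 61L
toy <- data.table(ID   = rep(seq_len(n_id), each = n_week),
                  week = rep(seq_len(n_week), n_id),
                  w    = runif(n_id * n_week),
                  v    = runif(n_id * n_week))
toy[sample(nrow(toy), 100L), v := NA]

outlier <- 0.2

# One grouped pass: compute both cutoffs per ID and mask high-v / low-w rows
toy[, v_temp := {
  w_cut <- quantile(w, outlier, na.rm = TRUE)
  v_cut <- quantile(v, 1 - outlier, na.rm = TRUE)
  replace(v, which(v >= v_cut & w <= w_cut), NA_real_)
}, by = ID]

# Right-aligned 12-week rolling mean per ID, skipping NAs inside each window
toy[, v_m12 := frollmean(v_temp, 12L, fill = NA, align = "right", na.rm = TRUE),
    by = ID]
```

This avoids creating and then deleting the temporary cutoff columns. Whether it is actually faster than the `roll_mean` version would need to be measured with `system.time` on the full data.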