Calculating a moving recency-weighted mean in R

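I would like to calculate a horse's moving recency-weighted mean finishing position, given the time (day) and finishing position (pos) of each race it has run. Such statistics are useful in handicapping.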

At the moment I am using a "loop-inside-a-loop" approach. Is there a faster or more elegant way to do this in R?

# 
# Test data 
# 

day <- c(0, 6, 10, 17, 21, 26, 29, 31, 34, 38, 41, 47, 48, 51, 61) 
pos <- c(3, 5, 6, 1, 1, 3, 4, 1, 2, 2, 2, 6, 4, 5, 6) 
testdata <- data.frame(id = 1, day = day, pos = pos, wt.pos = NA) 

# 
# No weight is given to observations earlier than cutoff 
# 

cutoff <- 30 

# 
# Rolling recency-weighted mean (wt.pos) 
# 

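for (i in 2:nrow(testdata)) {
    # Weight of each earlier race j: drops by one per day of age, floored at 0
    wt <- numeric(i - 1)
    for (j in 1:(i - 1))
        wt[j] <- max(0, cutoff - day[i] + day[j] + 1)
    # Leave NA when no earlier race carries any weight
    if (sum(wt) > 0)
        testdata$wt.pos[i] <- sum(pos[1:(i - 1)] * wt) / sum(wt)
}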

> testdata

   id day pos   wt.pos
1   1   0   3       NA
2   1   6   5 3.000000
3   1  10   6 4.125000
4   1  17   1 4.931034
5   1  21   1 3.520548
6   1  26   3 2.632911
7   1  29   4 2.652174
8   1  31   1 2.954128
9   1  34   2 2.436975
10  1  38   2 2.226891
11  1  41   2 2.119048
12  1  47   6 2.137615
13  1  48   4 3.030534
14  1  51   5 3.303704
15  1  61   6 4.075000
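To make the weighting concrete, the value in row 3 can be checked by hand. The snippet below is only an illustrative check using the first three races of the test data; the name `wt_pos_3` is made up for this example.

```r
cutoff <- 30
day <- c(0, 6, 10)   # first three races from the test data
pos <- c(3, 5, 6)

i <- 3
# Weights of the two earlier races as seen from race 3:
# 30 - 10 + 0 + 1 = 21 and 30 - 10 + 6 + 1 = 27
w <- pmax(0, cutoff - day[i] + day[1:(i - 1)] + 1)

# Weighted mean of the earlier finishing positions:
# (3 * 21 + 5 * 27) / (21 + 27) = 198 / 48 = 4.125
wt_pos_3 <- sum(pos[1:(i - 1)] * w) / sum(w)
```

The more recent race (day 6) gets the larger weight, so the result sits closer to 5 than to 3.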

Source

2012-09-05 gillenpj

I would go with:

# Calculate `wt` for all values of `i` in one go 
wt <- lapply(2:nrow(testdata), function(i) 
    pmax(0, cutoff - day[i] + day[1:(i-1)] + 1)) 

# Fill in the column 
testdata$wt.pos[-1] <- mapply(
    function(i, w) if(sum(w) > 0) sum(pos[1:i]*w)/sum(w) else NA, 
    1:(nrow(testdata)-1), wt)

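Note that using pmax in place of max vectorizes the inner computation: the second argument to max is evaluated for all values of j at once, which speeds things up by many orders of magnitude.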

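I found no simple way to vectorize the outer loop and the if condition, though (short of rewriting it in C, which seems like overkill), but lapply, mapply and the like are still faster than for loops.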
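For what it is worth, one possible way to vectorize the outer loop as well is to build the whole weight matrix in one step with outer(). This is a sketch that is not part of the original answer; it assumes a single horse and needs memory proportional to the square of the number of races, so it may only suit fairly short histories.

```r
day <- c(0, 6, 10, 17, 21, 26, 29, 31, 34, 38, 41, 47, 48, 51, 61)
pos <- c(3, 5, 6, 1, 1, 3, 4, 1, 2, 2, 2, 6, 4, 5, 6)
cutoff <- 30

# W[i, j] is the weight of race j as seen from race i
W <- pmax(0, cutoff + 1 - outer(day, day, "-"))
# Only strictly earlier races count, so zero the diagonal and upper triangle
W[upper.tri(W, diag = TRUE)] <- 0

den <- rowSums(W)
# NA where no earlier race carries any weight (e.g. the first race)
wt.pos <- ifelse(den > 0, drop(W %*% pos) / den, NA)
```

The matrix product replaces both loops, and the ifelse() call plays the role of the if condition.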

Source

2012-09-05 16:14:50 Backlin

You mixed up 'numeric' and 'as.numeric' in your note – Julius

Oh, you are right, thanks! – Backlin

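When applied to a data frame consisting of 5000 replicates of 'testdata', this solution is about 20% faster than the one using the loop-inside-a-loop approach. However, whether one uses mapply or keeps the outer 'for' loop seems to make almost no difference. – gillenpj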

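This version shows how to calculate the moving recency-weighted mean for one or more variables (e.g. finishing position, speed rating, etc.) and one or more subjects (horses).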

library(plyr) 

day <- c(0, 6, 10, 17, 21, 26, 29, 31, 34, 38, 41, 47, 48, 51, 61) 
pos <- c(3, 5, 6, 1, 1, 3, 4, 1, 2, 2, 2, 6, 4, 5, 6) 
dis <- 100 + 0.5 * (pos - 1) 
testdata1 <- data.frame(id = 1, day = day, pos = pos, dis = dis) 
day <- c(0, 4, 7, 14, 22, 23, 31, 38, 42, 47, 52, 59, 68, 69, 79) 
pos <- c(1, 3, 2, 6, 4, 5, 2, 1, 4, 5, 2, 1, 5, 5, 2) 
dis <- 100 + 0.5 * (pos - 1) 
testdata2 <- data.frame(id = 2, day = day, pos = pos, dis = dis) 
testdata <- rbind(testdata1, testdata2) 

# Moving recency-weighted mean of each column of obs
rollmean <- function(day, obs, cutoff = 90) {
    obs <- as.matrix(obs)
    # wt[[i]]: weights of rows 1..i as seen from row i + 1
    wt <- lapply(2:nrow(obs), function(i)
        pmax(0, cutoff - day[i] + day[1:(i-1)] + 1))
    # Weighted column means; NA when no earlier row carries any weight
    wt.obs <- lapply(1:(nrow(obs)-1), FUN =
        function(i)
            if (sum(wt[[i]]) > 0) {
                apply(obs[1:i, , drop = FALSE] * wt[[i]], 2, sum)/sum(wt[[i]])
            } else {
                rep(NA, ncol(obs))
            }
    )
    # The first row has no history, so it is always NA
    answer <- rbind(rep(NA, ncol(obs)), do.call(rbind, wt.obs))
    if (!is.null(dimnames(answer)))
        dimnames(answer)[[2]] <- paste("wt", dimnames(answer)[[2]], sep = ".")
    return(answer)
}

# Apply per horse, then stack the per-horse result matrices
x <- dlply(testdata, .(id), .fun =
    function(DF) rollmean(DF$day, DF[, c("pos", "dis"), drop = FALSE])
)
y <- do.call(rbind, x)
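If plyr is not available, the same per-horse calculation can probably be done in base R with split() and lapply(). The sketch below is an assumption-laden variant rather than the code above: the helper name `rw_mean` is hypothetical, it uses seq_len() so that a horse with a single race does not break the indexing, and it binds the weighted columns back onto the input rows.

```r
day1 <- c(0, 6, 10, 17, 21, 26, 29, 31, 34, 38, 41, 47, 48, 51, 61)
pos1 <- c(3, 5, 6, 1, 1, 3, 4, 1, 2, 2, 2, 6, 4, 5, 6)
day2 <- c(0, 4, 7, 14, 22, 23, 31, 38, 42, 47, 52, 59, 68, 69, 79)
pos2 <- c(1, 3, 2, 6, 4, 5, 2, 1, 4, 5, 2, 1, 5, 5, 2)
testdata <- data.frame(id  = rep(1:2, each = 15),
                       day = c(day1, day2),
                       pos = c(pos1, pos2))
testdata$dis <- 100 + 0.5 * (testdata$pos - 1)

# Hypothetical helper: recency-weighted mean of earlier rows, per column
rw_mean <- function(day, obs, cutoff = 90) {
    obs <- as.matrix(obs)
    k <- ncol(obs)
    vals <- vapply(seq_len(nrow(obs)), function(i) {
        prev <- seq_len(i - 1)                 # empty for the first row
        w <- pmax(0, cutoff - day[i] + day[prev] + 1)
        if (sum(w) > 0) {
            colSums(obs[prev, , drop = FALSE] * w) / sum(w)
        } else {
            rep(NA_real_, k)                   # no earlier row carries weight
        }
    }, numeric(k))
    # vapply returns one column per row of obs; reshape to one row per race
    out <- matrix(vals, ncol = k, byrow = TRUE)
    if (!is.null(colnames(obs)))
        colnames(out) <- paste("wt", colnames(obs), sep = ".")
    out
}

# Split by horse, compute, and bind the new columns back onto each piece
pieces <- lapply(split(testdata, testdata$id), function(DF)
    cbind(DF, rw_mean(DF$day, DF[, c("pos", "dis"), drop = FALSE])))
result <- do.call(rbind, pieces)
```

Here `result` has one row per race, with the original columns followed by wt.pos and wt.dis, and the first race of each horse is NA.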

Source

2012-09-08 22:57:11 gillenpj
