2012-09-05
2

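Calculating a moving recency-weighted mean in R

Given the dates (day) on which a horse raced and its finishing positions (pos), I want to calculate the horse's moving recency-weighted mean finishing position. Statistics like this are useful in handicapping.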

At the moment I am using a "loop-inside-a-loop" approach. Is there a faster or more elegant way to do this in R?

# 
# Test data 
# 

day <- c(0, 6, 10, 17, 21, 26, 29, 31, 34, 38, 41, 47, 48, 51, 61) 
pos <- c(3, 5, 6, 1, 1, 3, 4, 1, 2, 2, 2, 6, 4, 5, 6) 
testdata <- data.frame(id = 1, day = day, pos = pos, wt.pos = NA) 

# 
# No weight is given to observations earlier than cutoff 
# 

cutoff <- 30 

# 
# Rolling recency-weighted mean (wt.pos) 
# 

for (i in 2:nrow(testdata)) { 
    # Weights of the i-1 earlier races, relative to race i 
    wt <- numeric(i-1) 
    for (j in 1:(i-1)) 
        wt[j] <- max(0, cutoff - day[i] + day[j] + 1) 
    if (sum(wt) > 0) 
        testdata$wt.pos[i] <- sum(pos[1:(i-1)] * wt)/sum(wt) 
} 

> testdata 

   id day pos   wt.pos
1   1   0   3       NA
2   1   6   5 3.000000
3   1  10   6 4.125000
4   1  17   1 4.931034
5   1  21   1 3.520548
6   1  26   3 2.632911
7   1  29   4 2.652174
8   1  31   1 2.954128
9   1  34   2 2.436975
10  1  38   2 2.226891
11  1  41   2 2.119048
12  1  47   6 2.137615
13  1  48   4 3.030534
14  1  51   5 3.303704
15  1  61   6 4.075000
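For reference, the loop above appears to compute the following. The notation is mine, not the poster's: d_i is `day[i]`, p_i is `pos[i]`, and c is `cutoff`. The second line checks row 3 of the table by hand.

```latex
w_{ij} = \max\bigl(0,\; c - d_i + d_j + 1\bigr), \qquad
\bar{p}_i = \frac{\sum_{j<i} w_{ij}\, p_j}{\sum_{j<i} w_{ij}}

% Row 3 (d_3 = 10, c = 30): w_{31} = 30 - 10 + 0 + 1 = 21, \; w_{32} = 30 - 10 + 6 + 1 = 27
\bar{p}_3 = \frac{21 \cdot 3 + 27 \cdot 5}{21 + 27} = \frac{198}{48} = 4.125
```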

Answers

0

I would go with

# Calculate `wt` for all values of `i` in one go 
wt <- lapply(2:nrow(testdata), function(i) 
    pmax(0, cutoff - day[i] + day[1:(i-1)] + 1)) 

# Fill in the column 
testdata$wt.pos[-1] <- mapply(
    function(i, w) if(sum(w) > 0) sum(pos[1:i]*w)/sum(w) else NA, 
    1:(nrow(testdata)-1), wt) 

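Note that by using `pmax` we vectorize the calculation: the second argument of `max` is computed for all values of `j` at once, which speeds up that step by orders of magnitude.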

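I found no easy way to vectorize the outer loop and the `if` condition, though (short of rewriting it in C, which seems like overkill), but `lapply`, `mapply` and the like are still faster than `for` loops.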
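One possible way to vectorize the outer loop as well, offered only as a sketch, is to build the full weight matrix with `outer` and zero out everything that is not a strictly earlier race. The names `W`, `num` and `den` are mine, and the matrix needs memory proportional to the square of the number of races, so this is only reasonable per horse, not across a whole data set at once.

```r
day <- c(0, 6, 10, 17, 21, 26, 29, 31, 34, 38, 41, 47, 48, 51, 61)
pos <- c(3, 5, 6, 1, 1, 3, 4, 1, 2, 2, 2, 6, 4, 5, 6)
cutoff <- 30

# W[i, j] = weight of race j when computing the mean for race i
# (matrix goes first in pmax so the result keeps its dimensions)
W <- pmax(cutoff - outer(day, day, "-") + 1, 0)

# Only strictly earlier races (j < i) may contribute
W[upper.tri(W, diag = TRUE)] <- 0

num <- drop(W %*% pos)   # weighted sums of earlier positions
den <- rowSums(W)        # sums of weights
wt.pos <- ifelse(den > 0, num / den, NA_real_)
```

On the test data this gives `NA` for the first race and 4.125 for the third, matching the table in the question.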

+1

You mixed up `as.numeric` and `numeric` in your note. – Julius

+0

Oh, you're right, thanks! – Backlin

+0

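When applied to a data frame made up of 5000 replicates of `testdata`, this solution is about 20% faster than the loop-inside-a-loop approach. However, using `mapply` rather than keeping the outer `for` loop seems to make hardly any difference. – gillenpj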

0

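This version shows how to calculate moving recency-weighted means for one or more variables (e.g. finishing position, speed rating, etc.) and one or more subjects (horses).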

library(plyr) 

day <- c(0, 6, 10, 17, 21, 26, 29, 31, 34, 38, 41, 47, 48, 51, 61) 
pos <- c(3, 5, 6, 1, 1, 3, 4, 1, 2, 2, 2, 6, 4, 5, 6) 
dis <- 100 + 0.5 * (pos - 1) 
testdata1 <- data.frame(id = 1, day = day, pos = pos, dis = dis) 
day <- c(0, 4, 7, 14, 22, 23, 31, 38, 42, 47, 52, 59, 68, 69, 79) 
pos <- c(1, 3, 2, 6, 4, 5, 2, 1, 4, 5, 2, 1, 5, 5, 2) 
dis <- 100 + 0.5 * (pos - 1) 
testdata2 <- data.frame(id = 2, day = day, pos = pos, dis = dis) 
testdata <- rbind(testdata1, testdata2) 

# Moving recency-weighted mean 
rollmean <- function(day, obs, cutoff = 90) { 
    obs <- as.matrix(obs) 
    # Weights of all earlier rows, for each row i >= 2 
    wt <- lapply(2:nrow(obs), function(i) 
        pmax(0, cutoff - day[i] + day[1:(i-1)] + 1)) 
    # Weighted column means of rows 1..i, which belong to row i + 1 
    wt.obs <- lapply(1:(nrow(obs)-1), function(i) 
        if (sum(wt[[i]]) > 0) { 
            colSums(obs[1:i, , drop = FALSE] * wt[[i]])/sum(wt[[i]]) 
        } else { 
            rep(NA, ncol(obs)) 
        }) 
    # The first row has no earlier observations, so it is NA 
    answer <- rbind(rep(NA, ncol(obs)), do.call(rbind, wt.obs)) 
    if (!is.null(dimnames(answer))) 
        dimnames(answer)[[2]] <- paste("wt", dimnames(answer)[[2]], sep = ".") 
    return(answer) 
} 

# Apply per horse (id); the result is a list with one matrix per id 
x <- dlply(testdata, .(id), .fun = 
    function(DF) rollmean(DF$day, DF[, c("pos", "dis"), drop = FALSE]) 
) 
y <- do.call(rbind, x)
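
If you would rather not depend on plyr, a base R sketch along the same lines might look like this. The helper `prev_wmean` is hypothetical (not from the answers above), it handles a single variable, and the data are just the first four races of each horse from the test data. `split` and `unsplit` do the per-horse grouping and put the results back in the original row order.

```r
# First four races of each of the two test horses
testdata <- data.frame(
  id  = rep(1:2, each = 4),
  day = c(0, 6, 10, 17, 0, 4, 7, 14),
  pos = c(3, 5, 6, 1, 1, 3, 2, 6)
)

# Hypothetical helper: recency-weighted mean of all earlier observations
prev_wmean <- function(day, x, cutoff = 90) {
  later <- vapply(seq_along(day)[-1], function(i) {
    prev <- seq_len(i - 1)
    w <- pmax(0, cutoff - day[i] + day[prev] + 1)
    if (sum(w) > 0) sum(x[prev] * w) / sum(w) else NA_real_
  }, numeric(1))
  c(NA_real_, later)  # the first race has no history
}

# One vector per horse, then reassemble in the original row order
pieces <- lapply(split(testdata, testdata$id),
                 function(DF) prev_wmean(DF$day, DF$pos))
testdata$wt.pos <- unsplit(pieces, testdata$id)
```

The first race of each horse comes out as `NA`, since there is nothing earlier to average.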