2015-08-24
2

R newbie here. Below is a small reproducible sample of my data. What I need is a running (dynamic) count of occurrences.

TeamHome <- c("LAL", "HOU", "SAS", "LAL") 
TeamAway <- c("IND", "SAS", "LAL", "HOU") 
df <- data.frame(cbind(TeamHome, TeamAway)) 
df 

    TeamHome TeamAway 
    LAL  IND 
    HOU  SAS 
    SAS  LAL 
    LAL  HOU 

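Imagine these are the first four games of a season with thousands of games. For both the home team and the away team, I want to compute the cumulative number of games played at home, on the road, and in total. That means 3 new columns for the home team and 3 for the away team. I would like to get something like this (here I have only computed the new variables for the home team):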

TeamHome TeamAway HomeTeamGamesPlayedatHome HomeTeamGamesPlayedRoad HomeTeamTotalgames 
1  LAL  IND       1      0     1 
2  HOU  SAS       1      0     1 
3  SAS  LAL       1      1     2 
4  LAL  HOU       2      1     3 

To compute the first column (HomeTeamGamesPlayedatHome) I managed to do this:

df$HomeTeamGamesPlayedatHome <- ave(df$TeamHome==df$TeamHome, df$TeamHome, FUN=cumsum) 

But it feels overly complicated, and I cannot compute the other columns with this approach.

I also thought about using the table function to count the occurrences:

table(df$TeamHome) 

but it only gives the overall totals, whereas I want the count as of any given point in time. Thanks!

+0

Good question, upvote for the reproducible example and desired output – user2673238

Answers

2
library(dplyr) 
df <- df %>% group_by(TeamHome) %>% 
    mutate(HomeGames = seq_along(TeamHome)) 
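# For each game i, count the home team's appearances in TeamAway up to and including game i 
lst <- vector("list", nrow(df)) 
for(i in seq_len(nrow(df))) lst[[i]] <- sum(as.character(df$TeamAway[1:i]) == as.character(df$TeamHome[i])) 
df$HomeTeamGamesPlayedRoad <- unlist(lst) 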
df %>% mutate(HomeTeamTotalgames = HomeGames+HomeTeamGamesPlayedRoad) 
    TeamHome TeamAway HomeGames HomeTeamGamesPlayedRoad HomeTeamTotalgames 
1  LAL  IND   1      0   1 
2  HOU  SAS   1      0   1 
3  SAS  LAL   1      1   2 
4  LAL  HOU   2      1   3 

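HomeGames is created with seq_along, which numbers the rows within each TeamHome group. HomeTeamGamesPlayedRoad is created with a loop that checks the values in TeamAway up to and including the current game. The last line adds the two created columns to get the total.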
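The same logic can be written in base R without growing a list, for example with vapply. This is only a minimal sketch, assuming the team columns are character vectors (hence stringsAsFactors = FALSE); the column names follow the ones used above.

```r
TeamHome <- c("LAL", "HOU", "SAS", "LAL")
TeamAway <- c("IND", "SAS", "LAL", "HOU")
df <- data.frame(TeamHome, TeamAway, stringsAsFactors = FALSE)

# Running count of home games per team (1, 2, 3, ... within each TeamHome group)
df$HomeGames <- ave(seq_along(df$TeamHome), df$TeamHome, FUN = seq_along)

# For each game i, count how often the current home team appears
# in TeamAway up to and including game i
df$HomeTeamGamesPlayedRoad <- vapply(
  seq_len(nrow(df)),
  function(i) sum(df$TeamAway[seq_len(i)] == df$TeamHome[i]),
  integer(1)
)

# Total games played by the home team so far
df$HomeTeamTotalgames <- df$HomeGames + df$HomeTeamGamesPlayedRoad
df
```

Like the loop, this still compares each row against all earlier rows, so the work grows quadratically with the number of games.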

+0

It works, thanks! I was hoping for something less complicated, but it works. – Sburg13

+0

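Hi Pierre. Thanks a lot for the help. Imagine I have an additional third column with the points scored by the home team (PTS) and a fourth with the points scored by the away team. How can I extend this formula to sum the points the home team has scored at home and on the road? Thanks a lot – Sburg13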

+1

It would be best to ask a follow-up question and link to this one for reference. –

1

A loop solution:

TeamHome <- c("LAL", "HOU", "SAS", "LAL") 
TeamAway <- c("IND", "SAS", "LAL", "HOU") 
df <- data.frame(TeamHome,TeamAway,HomeTeamGamesPlayedatHome=ave(TeamHome==TeamHome, TeamHome, FUN=cumsum)) 

df$HomeTeamGamesPlayedRoad <- 0L 
for (i in seq_len(nrow(df))) { 
     curdf <- df[1:i, ] 
     # count the current home team's away games up to and including game i 
     df$HomeTeamGamesPlayedRoad[i] <- sum(as.character(curdf$TeamAway) == as.character(curdf$TeamHome[i])) 
} 
df$HomeTeamTotalgames <- df$HomeTeamGamesPlayedatHome + df$HomeTeamGamesPlayedRoad 

     TeamHome TeamAway HomeTeamGamesPlayedatHome HomeTeamGamesPlayedRoad HomeTeamTotalgames 
1  LAL  IND       1      0     1 
2  HOU  SAS       1      0     1 
3  SAS  LAL       1      1     2 
4  LAL  HOU       2      1     3
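
The question asks for the same three counts for the away team as well, which neither answer shows. One possible loop-free way to get all six columns is to reshape the games into one row per team appearance and take cumulative sums per team. This is a sketch under stated assumptions, not one of the posted answers: the helper names long, game, team, atHome, cumHome and cumRoad and the AwayTeam... column names are made up for illustration, and the rows of df are assumed to be in chronological order.

```r
TeamHome <- c("LAL", "HOU", "SAS", "LAL")
TeamAway <- c("IND", "SAS", "LAL", "HOU")
df <- data.frame(TeamHome, TeamAway, stringsAsFactors = FALSE)
n <- nrow(df)

# Long format: one row per team appearance, ordered by game
long <- data.frame(
  game   = rep(seq_len(n), times = 2),
  team   = c(df$TeamHome, df$TeamAway),
  atHome = rep(c(TRUE, FALSE), each = n),
  stringsAsFactors = FALSE
)
long <- long[order(long$game), ]

# Running counts per team, up to and including the current game
long$cumHome <- ave(as.integer(long$atHome), long$team, FUN = cumsum)
long$cumRoad <- ave(as.integer(!long$atHome), long$team, FUN = cumsum)

# Split back into the home-team and away-team rows (both are in game order)
home <- long[long$atHome, ]
away <- long[!long$atHome, ]

df$HomeTeamGamesPlayedatHome <- home$cumHome
df$HomeTeamGamesPlayedRoad   <- home$cumRoad
df$HomeTeamTotalgames        <- home$cumHome + home$cumRoad
df$AwayTeamGamesPlayedatHome <- away$cumHome
df$AwayTeamGamesPlayedRoad   <- away$cumRoad
df$AwayTeamTotalgames        <- away$cumHome + away$cumRoad
df
```

Because each team is processed once with cumsum, this avoids rescanning earlier rows for every game, which may matter with thousands of games.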