2017-03-19

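Create a new column from the following row, based on a variable

I have positional data for a number of athletes during a match. Each quarter lasts at most 30 minutes. An example of my data is: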

> df 
     StartValue Athlete Quarter Position 
    1  0.00 Paul  Q1 Bench 
    2  5.35 Paul  Q1 Defender 
    3  19.26 Paul  Q1 Bench 
    4  23.32 Paul  Q1 Defender 
    5  0.00 Paul  Q2 Bench 
    6  9.08 Paul  Q2 Defender 
    7  13.11 Paul  Q2 Defender 
    8  0.00 Paul  Q3 Defender 
    9  7.36 Paul  Q3 Defender 
    10  2.51 Paul  Q3 Bench 
    11  6.44 Paul  Q4 Bench 
    12  22.47 Paul  Q4 Bench 
    13  0.00 Paul  Q4 Defender 
    14  24.38 Paul  Q4 Defender 
    15  11.36 Paul  Q4 Defender 

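I now want to create a new column, df$EndValue, that takes the StartValue of the following row and places it in the current row. For the last entry of each quarter, 30 must be placed in df$EndValue. For example, the first few rows would be: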

 > df 
      StartValue Athlete Quarter Position EndValue 
     1  0.00 Paul  Q1 Bench 5.35 
     2  5.35 Paul  Q1 Defender 19.26 
     3  19.26 Paul  Q1 Bench 23.32 
     4  23.32 Paul  Q1 Defender 30.00 
     5  0.00 Paul  Q2 Bench 9.08 

My expected output for the data.frame would be:

Output <- data.frame(StartValue=c(0, 5.35, 19.26, 23.32, 
           0.00, 9.08, 13.11, 0, 
           2.51, 7.36, 0.0, 6.44, 
           11.36, 22.47, 24.38), 
        EndValue=c(5.35, 19.26, 23.32, 30, 
           9.08, 13.11, 30, 2.51, 
           7.36, 30, 6.44, 11.36, 
           22.47, 24.38, 30), 
        Athlete = c('Paul', 'Paul', 'Paul', 'Paul', 
           'Paul', 'Paul', 'Paul','Paul', 
           'Paul', 'Paul', 'Paul','Paul', 
           'Paul', 'Paul', 'Paul'), 
        Quarter = c('Q1', 'Q1', 'Q1', 'Q1', 
           'Q2', 'Q2', 'Q2', 'Q3', 
           'Q3', 'Q3', 'Q4', 'Q4', 
           'Q4', 'Q4', 'Q4'), 
        Position = c('Bench','Defender','Bench','Defender', 
           'Bench','Defender','Defender','Defender', 
           'Defender','Bench','Bench','Bench', 
           'Defender', 'Defender', 'Defender')) 

I have this data for many athletes, each with 30-minute quarters, so how can I add this new column quickly?

Thanks.

Answers

2

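setDT converts the data frame to a data table. Group by Athlete and Quarter, take the StartValue of the following row as EndValue, and set the last value in each group to 30.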

library('data.table') 
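
A minimal sketch of the approach described above, not necessarily the answerer's exact code. The small `dt` table is made-up sample data, and the sketch assumes rows must first be sorted by StartValue within each athlete and quarter.

```r
library(data.table)

# Made-up sample data (an assumption for illustration), deliberately unsorted
dt <- data.table(
  Athlete    = "Paul",
  Quarter    = c("Q1", "Q1", "Q1", "Q2", "Q2"),
  StartValue = c(0, 19.26, 5.35, 0, 9.08)
)

# Sort so that "the following row" is the next entry in time
setorder(dt, Athlete, Quarter, StartValue)

# Drop the first StartValue of each group and append 30 as the quarter end
dt[, EndValue := c(StartValue[-1], 30), by = .(Athlete, Quarter)]
dt
```

Using `StartValue[-1]` keeps the right-hand side the same length as the group even when a group has a single row.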

Edit:

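In your comment, you asked for the final EndValue of each quarter to be a value specific to that quarter. First assign StartValue to EndValue, then find the row index of the last row in each quarter. In the next step, update EndValue at those rows using 31 for Q1, 32 for Q2, 33 for Q3 and 34 for Q4.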

I created two players, Paul and Bob. Their data are identical except for their names.

# sample data 
setDT(df) # convert data frame to data table by reference 
df1 <- copy(df) # replicate data by copying df 
df[, Athlete := 'Bob'] # set Athlete to 'Bob' for the second player
df <- rbindlist(l = list(df1, df)) # combine df1 and df 

# sort StartValue by player and quarter 
df <- df[order(StartValue), .SD, by = .(Athlete, Quarter) ] 

# copy StartValue into EndValue; the last row per player per quarter gets its own value further down
df[, EndValue := StartValue ] # Assign StartValue to EndValue 

# remove 1st, shift values up and assign NA to last 
df[, EndValue := c(EndValue[-1], NA), by = .(Athlete, Quarter)] 

df[ i = df[, .I[.N], by = .(Quarter, Athlete)][, V1], 
    j = EndValue := rep(c(31,32,33,34), 
         length(df[, unique(Athlete) ])) ] 

df 
# Athlete Quarter StartValue Position EndValue 
# 1: Paul  Q1  0.00 Bench  5.35 
# 2: Paul  Q1  5.35 Defender 19.26 
# 3: Paul  Q1  19.26 Bench 23.32 
# 4: Paul  Q1  23.32 Defender 31.00 
# 5: Paul  Q2  0.00 Bench  9.08 
# 6: Paul  Q2  9.08 Defender 13.11 
# 7: Paul  Q2  13.11 Defender 32.00 
# 8: Paul  Q3  0.00 Defender  2.51 
# 9: Paul  Q3  2.51 Bench  7.36 
# 10: Paul  Q3  7.36 Defender 33.00 
# 11: Paul  Q4  0.00 Defender  6.44 
# 12: Paul  Q4  6.44 Bench 11.36 
# 13: Paul  Q4  11.36 Defender 22.47 
# 14: Paul  Q4  22.47 Bench 24.38 
# 15: Paul  Q4  24.38 Defender 34.00 
# 16:  Bob  Q1  0.00 Bench  5.35 
# 17:  Bob  Q1  5.35 Defender 19.26 
# 18:  Bob  Q1  19.26 Bench 23.32 
# 19:  Bob  Q1  23.32 Defender 31.00 
# 20:  Bob  Q2  0.00 Bench  9.08 
# 21:  Bob  Q2  9.08 Defender 13.11 
# 22:  Bob  Q2  13.11 Defender 32.00 
# 23:  Bob  Q3  0.00 Defender  2.51 
# 24:  Bob  Q3  2.51 Bench  7.36 
# 25:  Bob  Q3  7.36 Defender 33.00 
# 26:  Bob  Q4  0.00 Defender  6.44 
# 27:  Bob  Q4  6.44 Bench 11.36 
# 28:  Bob  Q4  11.36 Defender 22.47 
# 29:  Bob  Q4  22.47 Bench 24.38 
# 30:  Bob  Q4  24.38 Defender 34.00 
#  Athlete Quarter StartValue Position EndValue 
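
The `rep(c(31,32,33,34), ...)` step relies on every athlete having all four quarters in the same order. As a hedged alternative, the quarter end can be looked up by name instead. This is only a sketch: `quarter_end` and the small `dt` table are hypothetical names and data introduced for illustration.

```r
library(data.table)

# Hypothetical lookup of the end time of each quarter
quarter_end <- c(Q1 = 31, Q2 = 32, Q3 = 33, Q4 = 34)

# Made-up sample data: two athletes, one of the quarters missing entirely
dt <- data.table(
  Athlete    = rep(c("Paul", "Bob"), each = 3),
  Quarter    = rep(c("Q1", "Q1", "Q2"), times = 2),
  StartValue = rep(c(0, 5.35, 0), times = 2)
)
setorder(dt, Athlete, Quarter, StartValue)

# .BY$Quarter is the quarter label of the current group
dt[, EndValue := c(StartValue[-1], quarter_end[[.BY$Quarter]]),
   by = .(Athlete, Quarter)]
dt
```

Because the value is looked up per group, missing quarters or a different number of athletes do not shift the assignment.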
If the quarters vary in length, e.g. Q1 = 30 minutes and Q2 = 31 minutes, how would I add this? Thanks! – user2716568

When I run this on a wider dataset, `df <- setDT(df)[, EndValue := c(StartValue[1:(.N-1)], 30), by = .(Athlete, Quarter)]` returns the following error: '14: In `[.data.table`(setDT(df), , `:=`(EndValue, c(StartValue[1:(.N - ...): RHS 1 is length 2 (greater than the size (1) of group 92). The last 1 element(s) will be discarded.' – user2716568
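
A likely explanation for that message, shown as a base R sketch with a made-up single-row group: when a group has only one row, `1:(.N-1)` becomes `1:0`, which is `c(1, 0)` rather than an empty range, so the right-hand side ends up one element too long. Negative indexing does not have this problem.

```r
# A group with a single row (made-up value for illustration)
StartValue <- 0
n <- length(StartValue)   # plays the role of .N inside data.table

# 1:(n - 1) is 1:0, i.e. c(1, 0); index 0 is dropped, index 1 is kept
too_long <- c(StartValue[1:(n - 1)], 30)

# Dropping the first element gives an empty vector, so only 30 remains
right_length <- c(StartValue[-1], 30)

length(too_long)      # 2
length(right_length)  # 1
```

Note also that `StartValue[1:(.N-1)]` keeps the first rows instead of shifting to the following ones, so `StartValue[-1]` is the form that matches the intended result.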

Yes, I have up to 45 athletes. They all have the same number of quarters, but the number of rows in each quarter differs. – user2716568

1

Here is a solution using dplyr:

library(dplyr) 
quarter_lengths <- c(Q1 = 31, Q2 = 32, Q3 = 30, Q4 = 33) 
df %>% 
    group_by(Athlete, Quarter) %>% 
    mutate(EndValue = c(StartValue[-1], quarter_lengths[Quarter[1]])) 

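If it gets more complicated, e.g. multiple games with different quarter lengths, I would create a new data.frame of quarter lengths and inner_join it to the data.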
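
A rough sketch of that join-based idea. The `Game` and `QuarterLength` columns, and all values in both tables, are assumptions made up for illustration and do not come from the original data.

```r
library(dplyr)

# Hypothetical lookup table: one row per game and quarter
quarter_lengths <- data.frame(
  Game          = c(1, 1, 2, 2),
  Quarter       = c("Q1", "Q2", "Q1", "Q2"),
  QuarterLength = c(30, 31, 32, 30)
)

# Made-up positional data with an assumed Game column
positions <- data.frame(
  Game       = c(1, 1, 1, 2, 2),
  Athlete    = "Paul",
  Quarter    = c("Q1", "Q1", "Q2", "Q1", "Q2"),
  StartValue = c(0, 5.35, 0, 0, 0)
)

result <- positions %>%
  inner_join(quarter_lengths, by = c("Game", "Quarter")) %>%
  group_by(Game, Athlete, Quarter) %>%
  arrange(StartValue, .by_group = TRUE) %>%
  # next row's StartValue; the last row of a group gets the quarter length
  mutate(EndValue = lead(StartValue, default = QuarterLength[1])) %>%
  ungroup()

result
```

Keeping the lengths in a table rather than a named vector means new games or quarters only need new rows in `quarter_lengths`.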

Love a `dplyr` solution! This works great. – user2716568