2017-05-14

R - Get the highest value for each ID

I have the following data frame:

>> animals_df: 

animal_name age 
cat    1 
cat    1 
cat    2 
cat    3 
cat    3 
dog    1 
dog    1 
dog    3 
dog    4 
dog    4 
dog    4 
horse   1 
horse   3 
horse   5 
horse   5 
horse   5 

I want to extract only the animals with the highest age within each species. So I would like the following output:

animal_name age 
    cat   3 
    cat   3 
    dog   4 
    dog   4 
    dog   4 
    horse  5 
    horse  5 
    horse  5 

I have tried using:

animals_df = do.call(rbind,lapply(split(animals_df, animals_df$animal_name), function(x) tail(x, 1))) 

But this only gives me one instance of each animal, like this:

animal_name age 
    cat   3 
    dog   4 
    horse  5 
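
The attempt above returns one row per animal because `tail(x, 1)` keeps only the last row of each group (and it only picks the oldest animal when the rows happen to be sorted by age). A minimal sketch of the same split/lapply approach that keeps every tied row instead, assuming the data frame is rebuilt by hand as below (the names `pieces` and `oldest_by_split` are illustrative only):

```r
# Rebuild the example data from the question
animals_df <- data.frame(
  animal_name = c(rep("cat", 5), rep("dog", 6), rep("horse", 5)),
  age = c(1, 1, 2, 3, 3, 1, 1, 3, 4, 4, 4, 1, 3, 5, 5, 5),
  stringsAsFactors = FALSE
)

# Split into one data frame per animal
pieces <- split(animals_df, animals_df$animal_name)

# Within each piece, keep all rows whose age equals that piece's maximum
oldest_by_split <- do.call(
  rbind,
  lapply(pieces, function(x) x[x$age == max(x$age), ])
)

# Drop the "cat.4"-style row names that rbind() generates
rownames(oldest_by_split) <- NULL
print(oldest_by_split)
```

Comparing against `max(x$age)` rather than taking the last row does not depend on the input order.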

`dat[with(dat, age == ave(age, animal_name, FUN = max)), ]` in base R. – thelatemail
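
The base R one-liner in the comment can be unpacked into separate steps. This is only a sketch under the assumption that the data looks like the question's example; `group_max` and `oldest_by_ave` are names made up for illustration.

```r
# Rebuild the example data from the question
animals_df <- data.frame(
  animal_name = c(rep("cat", 5), rep("dog", 6), rep("horse", 5)),
  age = c(1, 1, 2, 3, 3, 1, 1, 3, 4, 4, 4, 1, 3, 5, 5, 5),
  stringsAsFactors = FALSE
)

# ave() returns a vector as long as the input: for every row,
# the maximum age within that row's animal_name group
group_max <- ave(animals_df$age, animals_df$animal_name, FUN = max)

# Keep each row whose age equals its group's maximum (ties are retained)
oldest_by_ave <- animals_df[animals_df$age == group_max, ]
print(oldest_by_ave)
```

Because `ave()` preserves the row order and length of its input, the comparison lines up row by row and the original row names are kept in the result.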

Answers


This is easy with dplyr/tidyverse:

library(tidyverse) 

# How I read your data in, ignore since you already have your data available 
df = read.table(file="clipboard", header=TRUE) 
df %>% 
    group_by(animal_name) %>% 
    filter(age == max(age)) 

# Output: 
Source: local data frame [8 x 2] 
Groups: animal_name [3] 

    animal_name age 
     <fctr> <int> 
1   cat  3 
2   cat  3 
3   dog  4 
4   dog  4 
5   dog  4 
6  horse  5 
7  horse  5 
8  horse  5 
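
As a possible variation on the filter above, and not part of the original answer: recent dplyr versions (1.0.0 and later) provide `slice_max()`, which keeps ties by default. A small sketch, assuming the example data is rebuilt by hand and using the illustrative name `oldest_by_slice`:

```r
library(dplyr)

# Rebuild the example data from the question
animals_df <- data.frame(
  animal_name = c(rep("cat", 5), rep("dog", 6), rep("horse", 5)),
  age = c(1, 1, 2, 3, 3, 1, 1, 3, 4, 4, 4, 1, 3, 5, 5, 5),
  stringsAsFactors = FALSE
)

# For each animal, keep the rows with the largest age;
# with_ties = TRUE (the default) keeps every row that shares the maximum
oldest_by_slice <- animals_df %>%
  group_by(animal_name) %>%
  slice_max(age, n = 1, with_ties = TRUE) %>%
  ungroup()

print(oldest_by_slice)
```

Setting `with_ties = FALSE` would instead return a single row per animal, which is the behaviour the question is trying to avoid.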

Another option, using data.table:

library(data.table) 
setDT(df) 
df[, .SD[which(age == max(age))], by = animal_name] 

#  animal_name age 
#1:   cat 3 
#2:   cat 3 
#3:   dog 4 
#4:   dog 4 
#5:   dog 4 
#6:  horse 5 
#7:  horse 5 
#8:  horse 5