2016-02-08 51 views

Filter duplicate rows in an R data.frame

I have a data.frame as shown below.

> df2 <- data.frame("StudentId" = c(1,1,1,2,2,3,3), "Subject" = c("Maths", "Maths", "English","Maths", "English", "Science", "Science"), "Score" = c(100,90,80,70, 60,20,10))
> df2
  StudentId Subject Score
1         1   Maths   100
2         1   Maths    90
3         1 English    80
4         2   Maths    70
5         2 English    60
6         3 Science    20
7         3 Science    10

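A few StudentIds have duplicate values in the Subject column (for example, ID 1 has two entries for "Maths"). I need to keep only the first of each set of duplicated rows. The expected data.frame is: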

  StudentId Subject Score
1         1   Maths   100
3         1 English    80
4         2   Maths    70
5         2 English    60
6         3 Science    20

I have not been able to do this. Any ideas?


See also [this](http://stackoverflow.com/questions/13967063/remove-duplicate-rows-in-r) and [this](http://stackoverflow.com/questions/13279582/select-only-the-first-rows-for-each-unique-value-of-column-in-r).

Answers


We can use unique from data.table with the by option, after converting df2 to a data.table with setDT(df2):

library(data.table)
unique(setDT(df2), by = c("StudentId", "Subject"))
#   StudentId Subject Score
#1:         1   Maths   100
#2:         1 English    80
#3:         2   Maths    70
#4:         2 English    60
#5:         3 Science    20

Or use distinct from dplyr:

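library(dplyr)
# In dplyr >= 0.5.0, .keep_all = TRUE is needed to retain the Score column;
# older versions kept all columns by default.
distinct(df2, StudentId, Subject, .keep_all = TRUE)
#  StudentId Subject Score
#1         1   Maths   100
#2         1 English    80
#3         2   Maths    70
#4         2 English    60
#5         3 Science    20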

Or use duplicated from base R:

df2[!duplicated(df2[1:2]),]
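
To see why the base R one-liner works, it can help to look at the logical vector that duplicated() produces. This is a minimal sketch, assuming the same df2 as in the question; the names is_dup and result are only illustrative, and the key columns are selected by name rather than by position.

```r
# Rebuild the data from the question.
df2 <- data.frame(
  StudentId = c(1, 1, 1, 2, 2, 3, 3),
  Subject   = c("Maths", "Maths", "English", "Maths", "English", "Science", "Science"),
  Score     = c(100, 90, 80, 70, 60, 20, 10)
)

# duplicated() flags every row whose (StudentId, Subject) pair
# has already appeared in an earlier row.
is_dup <- duplicated(df2[c("StudentId", "Subject")])
is_dup
# [1] FALSE  TRUE FALSE FALSE FALSE FALSE  TRUE

# Negating the flags keeps only the first row of each pair.
result <- df2[!is_dup, ]
```

Selecting the key columns by name behaves the same as df2[1:2] here, but it keeps working if the column order changes. The original row names (1, 3, 4, 5, 6) are preserved, which matches the expected output in the question.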

EDIT: Based on suggestions by @David Arenburg.


I think just 'unique(setDT(df2), by = c("StudentId", "Subject"))'? Or 'distinct(df2, StudentId, Subject)'?