Filtering duplicate rows in an R data.frame

2016-02-08 51 views 0 likes

I have a data.frame, shown below.

> df2 <- data.frame("StudentId" = c(1,1,1,2,2,3,3), "Subject" = c("Maths", "Maths", "English","Maths", "English", "Science", "Science"), "Score" = c(100,90,80,70, 60,20,10)) 
> df2 
  StudentId Subject Score
1         1   Maths   100
2         1   Maths    90
3         1 English    80
4         2   Maths    70
5         2 English    60
6         3 Science    20
7         3 Science    10

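A few StudentIds have duplicate values in the Subject column (e.g. ID 1 has two entries for "Maths"). I need to keep only the first of each set of duplicate rows. The expected data.frame is: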

  StudentId Subject Score
1         1   Maths   100
3         1 English    80
4         2   Maths    70
5         2 English    60
6         3 Science    20

I have not been able to do this. Any ideas?

Source

2016-02-08 sachinv

Also see [this](http://stackoverflow.com/questions/13967063/remove-duplicate-rows-in-r) and [this](http://stackoverflow.com/questions/13279582/select-only-the-first-rows-for-each-unique-value-of-column-in-r).

Answers

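We can use unique from data.table with the by option, after converting to a 'data.table' (setDT(df2)). It keeps the first row for each combination of the by columns.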

library(data.table) 
unique(setDT(df2), by = c("StudentId", "Subject")) 
#    StudentId Subject Score
# 1:         1   Maths   100
# 2:         1 English    80
# 3:         2   Maths    70
# 4:         2 English    60
# 5:         3 Science    20
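One thing to keep in mind is that setDT() converts df2 in place, so df2 is a data.table from then on. A minimal sketch, assuming you would rather leave the original data.frame untouched: copy it with as.data.table() first. The names dt and dedup_dt are illustrative, not from the answer.

```r
library(data.table)

df2 <- data.frame(
  StudentId = c(1, 1, 1, 2, 2, 3, 3),
  Subject   = c("Maths", "Maths", "English", "Maths", "English", "Science", "Science"),
  Score     = c(100, 90, 80, 70, 60, 20, 10)
)

# as.data.table() returns a copy, so df2 itself stays a plain data.frame
dt <- as.data.table(df2)

# keep the first row for each StudentId/Subject combination
dedup_dt <- unique(dt, by = c("StudentId", "Subject"))
print(dedup_dt)
```

With this variant, later base R indexing on df2 behaves as it would for any data.frame.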

Or distinct from dplyr

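library(dplyr)
# .keep_all = TRUE retains Score; dplyr >= 0.5.0 otherwise returns only the listed columns
distinct(df2, StudentId, Subject, .keep_all = TRUE)
#   StudentId Subject Score
#       (dbl)  (fctr) (dbl)
# 1         1   Maths   100
# 2         1 English    80
# 3         2   Maths    70
# 4         2 English    60
# 5         3 Science    20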
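If you want the "first row per group" intent to be explicit, one alternative (not part of the answer above) is to group and then slice. This is only a sketch: first_per_group is an illustrative name, and the rows come back ordered by the grouping keys rather than in their original order.

```r
library(dplyr)

df2 <- data.frame(
  StudentId = c(1, 1, 1, 2, 2, 3, 3),
  Subject   = c("Maths", "Maths", "English", "Maths", "English", "Science", "Science"),
  Score     = c(100, 90, 80, 70, 60, 20, 10)
)

first_per_group <- df2 %>%
  group_by(StudentId, Subject) %>%  # one group per StudentId/Subject pair
  slice(1) %>%                      # keep the first row of each group
  ungroup()

print(first_per_group)
```

Changing slice(1) to another row selection (for example the row with the highest Score) is where this form becomes more useful than distinct.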

Or duplicated from base R

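# select the key columns by name; df2[1:2] would pick rows 1:2 if df2 is now a data.table (after setDT)
df2[!duplicated(df2[, c("StudentId", "Subject")]), ]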
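To see what the base R approach is doing, here is a small sketch that splits it into two steps. The names is_dup and dedup_base are illustrative only.

```r
df2 <- data.frame(
  StudentId = c(1, 1, 1, 2, 2, 3, 3),
  Subject   = c("Maths", "Maths", "English", "Maths", "English", "Science", "Science"),
  Score     = c(100, 90, 80, 70, 60, 20, 10)
)

# TRUE for every row whose StudentId/Subject pair has already appeared above it
is_dup <- duplicated(df2[, c("StudentId", "Subject")])
print(is_dup)
# FALSE  TRUE FALSE FALSE FALSE FALSE  TRUE

# negate the flags to keep only the first occurrence of each pair
dedup_base <- df2[!is_dup, ]
print(dedup_base)
```

The surviving rows keep their original row names (1, 3, 4, 5, 6), which matches the expected output in the question.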

EDIT: based on suggestions by @David Arenburg

Source

2016-02-08 17:04:05 akrun

I think just 'unique(setDT(df2), by = c("StudentId", "Subject"))'? Or 'distinct(df2, StudentId, Subject)'?
