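I have a data set, and I want to delete the rows that have duplicate information across 4 different columns. In other words: how do I remove rows in R based on a condition that spans multiple columns?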
foo <- data.frame(g1 = c("1","0","0","1","1"), v1 = c("7","5","4","4","3"), v2 = c("a","b","x","x","e"), y1 = c("y","c","f","f","w"), y2 = c("y","y","y","f","c"), y3 = c("y","c","c","f","w"), y4 = c("y","y","f","f","c"), y5 = c("y","w","f","f","w"), y6 = c("y","c","f","f","w"))
What foo looks like:
g1 v1 v2 y1 y2 y3 y4 y5 y6
1 1 7 a y y y y y y
2 0 5 b c y c y w c
3 0 4 x f y c f f f
4 1 4 x f f f f f f
5 1 3 e w c w c w w
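Now I want to remove every row whose values in columns y1-y6 are all duplicates of each other, i.e. the same value repeated in every y column. If this is done correctly, only rows 1 and 4 will be removed, because all of their y variables are identical. So the condition spans multiple columns.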
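One way to express "all y values in the row are the same" is to compare every y column with y1 and count the mismatches per row. The sketch below is one possible approach, not the accepted answer; the helper names `ycols` and `mismatches` are illustrative, and `stringsAsFactors = FALSE` is added explicitly (it is already the default in R 4.0 and later).

```r
# Example data from the question; character columns, not factors
foo <- data.frame(g1 = c("1","0","0","1","1"), v1 = c("7","5","4","4","3"),
                  v2 = c("a","b","x","x","e"), y1 = c("y","c","f","f","w"),
                  y2 = c("y","y","y","f","c"), y3 = c("y","c","c","f","w"),
                  y4 = c("y","y","f","f","c"), y5 = c("y","w","f","f","w"),
                  y6 = c("y","c","f","f","w"), stringsAsFactors = FALSE)

ycols <- paste0("y", 1:6)

# Compare every y column with y1 (the vector is recycled column by column).
# A row whose y values are all identical has zero mismatches.
mismatches <- rowSums(foo[ycols] != foo$y1)

# Keep only rows with at least one differing y value: rows 2, 3 and 5
new <- foo[mismatches > 0, ]
print(new)
```

Because the comparison is vectorised over whole columns, this avoids a per-row loop.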
I believe I'm close, but it just doesn't work correctly.
I have tried: new = foo[!(duplicated(foo[,1:6]))]
I thought that with duplicated() it would search and find only the rows that match exactly?
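A likely source of the confusion: on a data frame, `duplicated()` compares each whole row with the rows above it; it does not compare values within a single row. (Also note that `foo[x]` without a comma selects columns; rows need `foo[x, ]`.) The small data frame `df` below is a made-up illustration, not data from the question.

```r
# duplicated() on a data frame flags rows that repeat an earlier row
df <- data.frame(a = c("x", "x", "z"), b = c("p", "p", "q"),
                 stringsAsFactors = FALSE)
dup_rows <- duplicated(df)
print(dup_rows)    # FALSE TRUE FALSE: row 2 repeats row 1

# On a plain vector (e.g. the y values of one row) it flags
# values that already appeared earlier in that vector
row_vals <- c("f", "y", "c", "f", "f", "f")
within_row <- duplicated(row_vals)
print(within_row)  # FALSE FALSE FALSE TRUE TRUE TRUE
```

In foo no two rows share the same values in columns 1:6, so `duplicated(foo[, 1:6])` flags nothing.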
I thought about using a conditional statement with &, but couldn't figure out how to do it.
new = foo[foo$y1==foo$y2|foo$y3|foo$y4|foo$y5|foo$y6]
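The `&` idea works if each comparison is written out in full: `foo$y1==foo$y2|foo$y3` is parsed as `(foo$y1==foo$y2) | foo$y3`, and `|` on a character column is an error. A row is "all the same" only when y1 equals every other y column, so the comparisons are joined with `&`, not `|`. A minimal sketch, where the variable name `all_same` is illustrative:

```r
foo <- data.frame(g1 = c("1","0","0","1","1"), v1 = c("7","5","4","4","3"),
                  v2 = c("a","b","x","x","e"), y1 = c("y","c","f","f","w"),
                  y2 = c("y","y","y","f","c"), y3 = c("y","c","c","f","w"),
                  y4 = c("y","y","f","f","c"), y5 = c("y","w","f","f","w"),
                  y6 = c("y","c","f","f","w"), stringsAsFactors = FALSE)

# TRUE for rows in which every y column equals y1
all_same <- with(foo, y1 == y2 & y1 == y3 & y1 == y4 & y1 == y5 & y1 == y6)

# The comma selects rows; negate to drop the all-identical rows
new <- foo[!all_same, ]
print(new)
```

This is explicit but has to be edited by hand whenever the set of y columns changes.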
I also thought about which(), but at this point I'm at a loss. I want foo to look like this:
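The `which()` route also works: find the indices of rows whose y values contain only one distinct value, then drop them with a negative index. A sketch assuming the same `foo`; `drop` is an illustrative name. The guard matters because `foo[-integer(0), ]` returns zero rows, not all rows, when nothing matches.

```r
foo <- data.frame(g1 = c("1","0","0","1","1"), v1 = c("7","5","4","4","3"),
                  v2 = c("a","b","x","x","e"), y1 = c("y","c","f","f","w"),
                  y2 = c("y","y","y","f","c"), y3 = c("y","c","c","f","w"),
                  y4 = c("y","y","f","f","c"), y5 = c("y","w","f","f","w"),
                  y6 = c("y","c","f","f","w"), stringsAsFactors = FALSE)

ycols <- paste0("y", 1:6)

# apply() passes each row of the y columns as a character vector;
# a row with a single distinct value has all y values identical
drop <- which(apply(foo[ycols], 1, function(r) length(unique(r)) == 1))
print(drop)  # rows 1 and 4

# Only use a negative index when there is something to drop
new <- if (length(drop) > 0) foo[-drop, ] else foo
print(new)
```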
g1 v1 v2 y1 y2 y3 y4 y5 y6
2 0 5 b c y c y w c
3 0 4 x f y c f f f
5 1 3 e w c w c w w
My favorite so far. –
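Sorry, I simplified my example, because I have more columns of information... so there are 9 columns in total, and the selection is based only on variables 1-6. I will edit the example above. – Kerry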
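When the data has extra columns, selecting the y columns by position (such as `foo[, 1:6]`) breaks easily; in this example they sit at positions 4 to 9. A sketch that picks them by name instead, assuming they are all named `y` followed by digits (an assumption about the real data), and folds `&` over the per-column comparisons with `Reduce()`:

```r
foo <- data.frame(g1 = c("1","0","0","1","1"), v1 = c("7","5","4","4","3"),
                  v2 = c("a","b","x","x","e"), y1 = c("y","c","f","f","w"),
                  y2 = c("y","y","y","f","c"), y3 = c("y","c","c","f","w"),
                  y4 = c("y","y","f","f","c"), y5 = c("y","w","f","f","w"),
                  y6 = c("y","c","f","f","w"), stringsAsFactors = FALSE)

# Select the y columns by name, wherever they are in the data frame
ycols <- grep("^y[0-9]+$", names(foo), value = TRUE)

# Compare each y column with the first one, then combine with `&`
first <- foo[[ycols[1]]]
all_same <- Reduce(`&`, lapply(foo[ycols], function(col) col == first))

new <- foo[!all_same, ]
print(new)
```

Adding or removing non-y columns does not change which rows are dropped.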
I've updated my answer. The row selection is now based on columns y1 - y6. –