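I would like to ask a follow-up question to this issue, because one more problem has come up: I found subjects (Cultural Studies) that belong to more than one category (Arts & Humanities and Social Sciences), so there is overlap that has to be taken into account. How do I correctly eliminate duplicates in overlapping strings in R?
I have a long list of categories. Here is a machine-readable example: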
AB <- c("Science","Arts & Humanities","Arts & Humanities; Social Sciences","Science","Arts & Humanities; Arts & Humanities; Social Sciences","Science","Science; Social Sciences","Social Sciences; Science")
So it looks like this:
> AB
[1] "Science" "Arts & Humanities"
[3] "Arts & Humanities; Social Sciences" "Science"
[5] "Arts & Humanities; Arts & Humanities; Social Sciences" "Science"
[7] "Science; Social Sciences" "Social Sciences; Science"
I would like to modify these entries and eliminate the duplicates to get this result:
[1] "Science" "Arts & Humanities"
[3] "Arts & Humanities; Social Sciences" "Science"
[5] "Arts & Humanities; Social Sciences" "Science"
[7] "Science; Social Sciences" "Science; Social Sciences"
So I am looking for another loop that eliminates the duplicate in #5. I tried strsplit() and unique(), but that did not work:
> unique(strsplit(AB, "; *"))
[[1]]
[1] "Science"
[[2]]
[1] "Arts & Humanities"
[[3]]
[1] "Arts & Humanities" "Social Sciences"
[[4]]
[1] "Arts & Humanities" "Arts & Humanities" "Social Sciences"
[[5]]
[1] "Social Sciences" "Science"
So let me ask once more, please: how can I get the correct output shown above? Thank you very much in advance for your consideration!
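For what it is worth, unique() applied to the result of strsplit() compares whole list elements, so it only drops list entries that are identical to an earlier one; it never looks inside an entry. A minimal base-R sketch of one way to get the desired output is to split each string, trim whitespace, de-duplicate and sort the labels within that string, and paste them back together. The helper name dedupe_categories is made up for illustration, and trimws() assumes R 3.2.0 or later.

```r
AB <- c("Science", "Arts & Humanities", "Arts & Humanities; Social Sciences",
        "Science", "Arts & Humanities; Arts & Humanities; Social Sciences",
        "Science", "Science; Social Sciences", "Social Sciences; Science")

# Hypothetical helper (the name is illustrative, not from the thread).
dedupe_categories <- function(x, sep = ";") {
  # Split every element on the separator; the result is a list of label vectors.
  parts <- strsplit(x, sep, fixed = TRUE)
  vapply(parts, function(p) {
    p <- trimws(p)        # drop the spaces left over after the ";"
    p <- sort(unique(p))  # remove repeats within this element, then order them
    paste(p, collapse = "; ")
  }, character(1))
}

result <- dedupe_categories(AB)
print(result)
```

Sorting is what turns "Social Sciences; Science" into "Science; Social Sciences" in #8; if the original order of the labels matters, leave out sort() and keep only unique().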
There is also a trim() function in the gdata package. –
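Thank you very much for your reply, @Tyler Rinker! Unfortunately, this gives me the following error: Error in unique(Trim(x)) : could not find function "Trim". Do I have to install the gdata package first? – user1496104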
Sorry, I had not defined it as a function. Try it now. –
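The "could not find function" error in the comments simply means the trimming helper had not been defined in the session before it was called. As a sketch only, the definition below is an assumed stand-in for that helper, not the answerer's actual code; it strips leading and trailing whitespace with a regular expression so that the labels compare equal before unique() is applied.

```r
# Assumed stand-in for the trimming helper mentioned in the comments;
# it must be defined before anything calls it.
Trim <- function(x) gsub("^\\s+|\\s+$", "", x)

# Element #5 from the question, split into its labels.
x <- strsplit("Arts & Humanities; Arts & Humanities; Social Sciences", ";")[[1]]

# Without trimming, " Arts & Humanities" (leading space) and
# "Arts & Humanities" would count as different strings.
cleaned <- unique(Trim(x))
print(paste(cleaned, collapse = "; "))
```

Using trim() from gdata instead would need the package installed and loaded first (install.packages("gdata"), then library(gdata)); base R's trimws() avoids the extra dependency.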