Splitting a vector of names by case
I want to split this vector of names:
names <- c("DOE John", "VAN DYKE Dick", "SMITH Mary Jane")
into two vectors:
last <- c("DOE", "VAN DYKE", "SMITH")
and
first <- c("John", "Dick", "Mary Jane")
Any help would be greatly appreciated. Thanks.
This should work:
# Define a pattern that only matches words composed entirely of capital letters
pat <- paste("^[", paste(LETTERS, collapse=""), "]*$", sep="")
# [1] "^[ABCDEFGHIJKLMNOPQRSTUVWXYZ]*$"
names <- c("DOE John", "VAN DYKE Dick", "SMITH Mary Jane")
splitNames <- strsplit(names, " ")
# LAST NAMES: (Extract and paste together words matching 'pat')
sapply(splitNames,
       function(X) paste(grep(pat, X, value=TRUE), collapse=" "))
# [1] "DOE" "VAN DYKE" "SMITH"
# FIRST NAMES: (Extract and paste together words NOT matching 'pat')
sapply(splitNames,
       function(X) paste(grep(pat, X, value=TRUE, invert=TRUE), collapse=" "))
# [1] "John" "Dick" "Mary Jane"
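To match all uppercase letters, you could alternatively use the character class [:upper:], as in:
pat <- "^[[:upper:]]*$"
although the ?regexp documentation seems to mildly warn against doing so, on the grounds of reduced portability.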
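For illustration, here is a minimal sketch of the same approach with the [:upper:] pattern, collecting the results into the two vectors the question asks for. It uses vapply() rather than sapply() only to guarantee a character vector; the variable names last and first are taken from the question, and the result assumes plain ASCII input.

```r
names <- c("DOE John", "VAN DYKE Dick", "SMITH Mary Jane")

# Pattern matching words made up entirely of uppercase letters
pat <- "^[[:upper:]]*$"

splitNames <- strsplit(names, " ")

# Last names: words matching 'pat', pasted back together
last <- vapply(splitNames,
               function(x) paste(grep(pat, x, value = TRUE), collapse = " "),
               character(1))

# First names: words NOT matching 'pat', pasted back together
first <- vapply(splitNames,
                function(x) paste(grep(pat, x, value = TRUE, invert = TRUE),
                                  collapse = " "),
                character(1))

last
# [1] "DOE"      "VAN DYKE" "SMITH"
first
# [1] "John"      "Dick"      "Mary Jane"
```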
Here's one way:
l <- strsplit(names, " ")
splitCaps <- function(x){
    ind <- x == toupper(x)
    list(upper = paste(x[ind], collapse = " "),
         lower = paste(x[!ind], collapse = " "))
}
> lapply(l,splitCaps)
[[1]]
[[1]]$upper
[1] "DOE"
[[1]]$lower
[1] "John"
[[2]]
[[2]]$upper
[1] "VAN DYKE"
[[2]]$lower
[1] "Dick"
[[3]]
[[3]]$upper
[1] "SMITH"
[[3]]$lower
[1] "Mary Jane"
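If two plain character vectors are wanted instead of a nested list, the result can be flattened afterwards. This is a minimal sketch, not part of the original answer; res is a name chosen here for illustration, and last and first are the names used in the question.

```r
names <- c("DOE John", "VAN DYKE Dick", "SMITH Mary Jane")

# Same helper as above: separate all-uppercase words from the rest
splitCaps <- function(x) {
  ind <- x == toupper(x)
  list(upper = paste(x[ind], collapse = " "),
       lower = paste(x[!ind], collapse = " "))
}

res <- lapply(strsplit(names, " "), splitCaps)

# Pull each component out of every list element
last  <- vapply(res, function(r) r$upper, character(1))
first <- vapply(res, function(r) r$lower, character(1))

last
# [1] "DOE"      "VAN DYKE" "SMITH"
first
# [1] "John"      "Dick"      "Mary Jane"
```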
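Do note, though, that this approach comes with a big caveat: using toupper to pick out all-uppercase words will be quite unreliable once you start mixing in unusual character sets, locales, symbols, etc. For very simple ASCII-type cases, it should work fine.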
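A small sketch of where the toupper comparison and a letters-only pattern can disagree, even on ASCII input. The tokens below are made up for illustration and do not come from the question's data.

```r
# Hypothetical tokens: a surname, a first name, a digit,
# a surname with an apostrophe, and a lone hyphen
tokens <- c("SMITH", "Mary", "3", "O'NEIL", "-")

# toupper() leaves digits and punctuation unchanged, so they compare equal
by_toupper <- tokens == toupper(tokens)
by_toupper
# [1]  TRUE FALSE  TRUE  TRUE  TRUE

# A pattern requiring one or more uppercase letters and nothing else
by_regex <- grepl("^[[:upper:]]+$", tokens)
by_regex
# [1]  TRUE FALSE FALSE FALSE FALSE
```

Which behaviour is preferable depends on the data: the toupper test keeps "O'NEIL" with the last name but also sweeps in digits and stray symbols, while the strict pattern rejects both.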
Can you share what you've tried so far, and why it didn't work the way you wanted? – joran 2012-01-03 19:31:51
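I tried strsplit(names, " ") to split along the spaces. The problem is that the number of words in the last and first names is not constant. The one constant is that the last name is always in all capitals. – srmulcahy 2012-01-03 19:36:02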