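Here is a simple one-liner in base R that uses strsplit and then grepl. It is fairly robust, but it breaks when consecutive matches should be counted separately, e.g. jjjjjj counted as three occurrences of jj. The pattern matching that makes this possible comes from @JoshOBrien's excellent Q&A: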
sum(grepl("jj" , unlist(strsplit(x , "(?<=.)(?=jj)" , perl = TRUE))))
# Examples....
f <- function(x) {
  sum(grepl("jj", unlist(strsplit(x, "(?<=.)(?=jj)", perl = TRUE))))
}
#3 matches here
xOP <- c("ajjss","acdjfkj","auyjyjjksjj")
f(xOP)
# [1] 3
#4 here
x1 <- c("ajjss","acdjfkj", "jj" , "auyjyjjksjj")
f(x1)
# [1] 4
#8 here
x2 <- c("jjbjj" , "ajjss","acdjfkj", "jj" , "auyjyjjksjj" , "jjbjj")
f(x2)
# [1] 8
#Doesn't work yet with multiple jjjj matches. We want this to also be 8
x3 <- c("jjjj" , "ajjss","acdjfkj", "jj" , "auyjyjjksjj" , "jjbjj")
f(x3)
# [1] 7
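For the jjjj case, one possible workaround is to count the matches directly with gregexpr instead of splitting. This is a different technique from the strsplit one-liner, not a fix to it. gregexpr returns non-overlapping matches, so "jjjj" counts as two occurrences of jj. The helper name count_jj is made up for illustration; treat this as a sketch.

```r
# Hypothetical helper: counts non-overlapping occurrences of a fixed
# substring across all elements of a character vector.
count_jj <- function(x, pattern = "jj") {
  hits <- gregexpr(pattern, x, fixed = TRUE)
  # gregexpr() returns -1 for elements with no match, so count only
  # the positive start positions.
  sum(vapply(hits, function(hit) sum(hit > 0L), integer(1)))
}

x3 <- c("jjjj", "ajjss", "acdjfkj", "jj", "auyjyjjksjj", "jjbjj")
count_jj(x3)
# [1] 8
```

Because the matches are non-overlapping, "jjj" still counts as one occurrence; if overlapping matches should count, a lookahead pattern such as "(?=jj)" with perl = TRUE (and without fixed = TRUE) would be needed instead.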
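This is very good, thank you. I noticed you calculate the length of the string - in a data.frame, can I call frequency/length? That would be very useful. Thanks. – brucezepplin 2013-03-24 16:22:23
Sorry - what I meant was, can I return, for each string, the frequency of the substring divided by the length of the string? – brucezepplin 2013-03-24 16:26:05
Sorry - I'm getting an error in nchar(df$x): 'nchar()' requires a character vector – brucezepplin 2013-03-24 16:32:12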
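Regarding the last two comments: that nchar() error is what you get when the column is a factor rather than a character vector, which is probably the case here if the data.frame was built with stringsAsFactors = TRUE. A rough sketch of a per-string frequency divided by string length follows. The data frame df and the column names count and freq_per_char are invented for illustration, and the counting uses gregexpr rather than the strsplit one-liner from the answer.

```r
# Hypothetical example data; the factor column reproduces the nchar() error.
df <- data.frame(x = c("ajjss", "acdjfkj", "auyjyjjksjj"),
                 stringsAsFactors = TRUE)

# Convert the factor to character so that nchar() accepts it.
df$x <- as.character(df$x)

# Count non-overlapping "jj" matches per row (gregexpr gives -1 for no match).
hits <- gregexpr("jj", df$x, fixed = TRUE)
df$count <- vapply(hits, function(hit) sum(hit > 0L), integer(1))

# Frequency of the substring relative to the length of each string.
df$freq_per_char <- df$count / nchar(df$x)
df
```

With this data, count comes out as 1, 0 and 2, and the lengths are 5, 7 and 11, so the ratios are 0.2, 0 and 2/11.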