How do I find a pattern within a vector in R?

I wrote a function that creates a vector of 1000 binary values. I have been able to use rle to find the longest streak of consecutive 1s.

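How can I find a specific vector (say c(1,0,0,1)) inside this larger vector? I would like to get back the number of times that vector occurs. So c(1,0,0,1,1,0,0,1) should return 2, while c(1,0,0,0,1) should return 0.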

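Most of the solutions I have found only check whether the sequence occurs at all and return TRUE or FALSE, or they give results for individual values rather than for the specific vector that was specified.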

Here is my code so far:

# Simulate 1000 people who each choose either up or down.
updown <- function() {
    n <- 1000
    X <- rep(0, n)
    Y <- rbinom(n, 1, 1/2)
    X[Y == 1] <- "up"
    X[Y == 0] <- "down"

    # Calculate the length of the longest streak of ups:
    Y1 <- rle(Y)
    streaks <- Y1$lengths[Y1$values == 1]
    max(streaks, na.rm = TRUE)
}

# Repeat this process 1000 times to find the average outcome.
longeststring <- replicate(1000, updown())
mean(longeststring)  # average longest streak across the runs
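
To make the rle step easier to follow, here is a small sketch on a hand-written vector. The vector `y` and the name `longest_ones` are made up for illustration and are not part of the simulation above.

```r
# Hypothetical short input, chosen so the runs are easy to read off.
y <- c(1, 1, 0, 1, 1, 1, 0, 0)

runs <- rle(y)
runs$lengths  # run lengths, in order: 2 1 3 2
runs$values   # the value repeated in each run: 1 0 1 0

# Keep only the runs of 1s and take the longest one.
longest_ones <- max(runs$lengths[runs$values == 1])
longest_ones  # 3
```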

Source

2016-10-24 TheCurlyManLives

Since Y consists only of 0s and 1s, we can paste it into a single string and use a regular expression, specifically gregexpr. Simplified a little:

set.seed(47) # for reproducibility 

Y <- rbinom(1000, 1, 1/2) 

count_pattern <- function(pattern, x){ 
    sum(gregexpr(paste(pattern, collapse = ''), 
       paste(x, collapse = ''))[[1]] > 0) 
} 

count_pattern(c(1, 0, 0, 1), Y) 
## [1] 59

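paste collapses both the pattern and Y into strings: here the pattern becomes "1001" and Y becomes a 1000-character string. gregexpr searches Y for all matches of the pattern and returns the indices of the matches (plus some more information, so that they can be extracted if needed). Because gregexpr returns -1 when there is no match, testing for values greater than 0 lets us simply sum the TRUE values to get the number of matches; in this case, 59.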
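
As a quick illustration of what gregexpr hands back in the matching and non-matching cases, here is a minimal sketch. The two input strings and the names `hit` and `miss` are made up for this example.

```r
# Hypothetical short strings, used only to show the shape of the result.
hit  <- gregexpr("1001", "10011001")[[1]]
miss <- gregexpr("1001", "10001")[[1]]

as.integer(hit)   # start positions of the matches: 1 5
as.integer(miss)  # -1 signals that nothing matched

sum(hit > 0)      # 2 matches
sum(miss > 0)     # 0 matches
```

Without the `> 0` test, the lone -1 would be counted as one match, which is why the function sums a logical vector instead of taking a length.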

The other sample cases mentioned in the question:

count_pattern(c(1,0,0,1), c(1,0,0,1,1,0,0,1)) 
## [1] 2 

count_pattern(c(1,0,0,1), c(1,0,0,0,1)) 
## [1] 0

Source

2016-10-24 05:19:42 alistaire

This will also work:

library(stringr)
x <- c(1,0,0,1)
pattern <- paste(x, collapse='')  # "1001"
y <- c(1,0,0,1,1,0,0,1)
length(unlist(str_match_all(paste(y, collapse=''), pattern)))
## [1] 2
y <- c(1,0,0,0,1)
length(unlist(str_match_all(paste(y, collapse=''), pattern)))
## [1] 0

If you want to match overlapping patterns:

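y <- c(1,0,0,1,0,0,1) # overlapped
# the lookahead matches at every start position, so overlapping hits are counted;
# testing > 0 discards the -1 that gregexpr returns when nothing matches
sum(gregexpr("(?=1001)", paste(y, collapse=''), perl=TRUE)[[1]] > 0)
## [1] 2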
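
The lookahead approach counts overlapping matches on the pasted string. As an alternative sketch that stays on the numeric vector, a sliding-window comparison can count the same thing. This is a different technique from the regular-expression answers above, and `count_windows` is a hypothetical helper name, not a base R function.

```r
# Count every position where `pattern` occurs in `x`, overlaps included.
# count_windows is a hypothetical helper, written only for this example.
count_windows <- function(pattern, x) {
  k <- length(pattern)
  n <- length(x)
  if (k == 0 || n < k) return(0L)
  starts <- seq_len(n - k + 1)
  # Compare each window of length k against the pattern, element by element.
  hits <- vapply(starts,
                 function(i) all(x[i:(i + k - 1)] == pattern),
                 logical(1))
  sum(hits)
}

count_windows(c(1, 0, 0, 1), c(1, 0, 0, 1, 0, 0, 1))     # 2 (overlapping)
count_windows(c(1, 0, 0, 1), c(1, 0, 0, 1, 1, 0, 0, 1))  # 2
count_windows(c(1, 0, 0, 1), c(1, 0, 0, 0, 1))           # 0
```

Because it never pastes values together, this version should also behave sensibly when the vector holds values longer than one character, at the cost of being slower than a single regular-expression pass on long inputs.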

Source

2016-10-24 06:12:02

@冯天 Actually, we need to use a lookahead assertion. I have updated the code; let me know if it doesn't work.

I see. You are right.
