2014-02-06

Lookbehind with str_extract in R

[01/29/14 16:42:55, 10.100.120.120, unknown]: spatial_monitor: Alan entered Conference Room (Zone Role contains Person role) 
[01/29/14 16:42:57, 10.100.120.120, unknown]: spatial_monitor: Alan left Conference Room (Zone Role contains Person role) 
[01/29/14 16:43:00, 10.100.120.120, unknown]: spatial_monitor: Kurt entered Conference Room (Computer desk contains Person role) 
[01/29/14 16:43:02, 10.100.120.120, unknown]: spatial_monitor: Kurt left Conference Room (Computer desk contains Person role) 
[01/29/14 16:43:03, 10.100.120.120, unknown]: spatial_monitor: Alan entered Conference Room (Zone Role contains Person role) 
[01/29/14 16:43:08, 10.100.120.120, unknown]: spatial_monitor: Alan left Conference Room (Zone Role contains Person role) 
[01/29/14 16:46:07, 10.100.120.120, unknown]: spatial_monitor: Fred entered Conference Room (Zone Role contains Person role) 
[01/29/14 16:46:08, 10.100.120.120, unknown]: spatial_monitor: Fred left Conference Room (Zone Role contains Person role) 

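I want to use str_extract (from the stringr package) in R to extract the name of the location ("Conference Room" in the example above) from the text file shown above. The logic is to pull out the part that follows the word "entered" or "left". For this, I have the following regular expression: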

(?<=entered\s)[A-Z][a-z]+\s[A-Z][a-z]+ 

This works fine in Notepad++, but when I use it in R, I get the following error:

> tt <- "[01/29/14 16:42:55, 10.100.120.120, unknown]: spatial_monitor: Alan entered Conference Room (Zone Role contains Person role)" 
> str_extract(tt, '(?<=entered\\s)[A-Z][a-z]+\\s[A-Z][a-z]+') 
Error in regexpr("(?<=entered\\s)[A-Z][a-z]+\\s[A-Z][a-z]+", "[01/29/14 16:42:55, 10.100.120.120, unknown]: spatial_monitor: Alan entered Conference Room (Zone Role contains Person role)", : 
    invalid regular expression '(?<=entered\s)[A-Z][a-z]+\s[A-Z][a-z]+', reason 'Invalid regexp' 

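Other answers told me that lookahead and lookbehind only work with Perl-style regular expressions. So the question is: how do I enable Perl regular expressions in str_extract? Or is there a better way to do this? Thanks in advance.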
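For comparison, a minimal base R sketch (no stringr) of the same extraction: `regexpr()` accepts `perl = TRUE`, which switches to the PCRE engine where lookbehind is supported, and `regmatches()` pulls out the matched text. The variable names `pattern` and `room` are just illustrative.

```r
# Sample log line from the question
tt <- "[01/29/14 16:42:55, 10.100.120.120, unknown]: spatial_monitor: Alan entered Conference Room (Zone Role contains Person role)"

# Same pattern as in the question; \\s is written with a doubled backslash inside an R string
pattern <- "(?<=entered\\s)[A-Z][a-z]+\\s[A-Z][a-z]+"

# perl = TRUE enables the PCRE engine, which understands (?<=...) lookbehind
match_pos <- regexpr(pattern, tt, perl = TRUE)

# regmatches() turns the match position into the matched substring
room <- regmatches(tt, match_pos)
print(room)
# [1] "Conference Room"
```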

This works without lookahead/lookbehind. Put the part you want to extract in parentheses, as shown: library(gsubfn); strapplyc(tt, 'entered\\s([A-Z][a-z]+\\s[A-Z][a-z]+)', simplify = TRUE)
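The capture-group idea from the comment can also be sketched in base R alone, assuming you would rather not add the gsubfn dependency. `regexec()` records the position of each parenthesized group, and `regmatches()` returns the full match followed by the groups. Because there is no lookbehind, the default (non-Perl) engine is sufficient.

```r
tt <- "[01/29/14 16:42:55, 10.100.120.120, unknown]: spatial_monitor: Alan entered Conference Room (Zone Role contains Person role)"

# The parentheses mark the part to keep; no lookbehind, so perl = TRUE is not needed
m <- regexec("entered\\s([A-Z][a-z]+\\s[A-Z][a-z]+)", tt)

# regmatches() returns a list with one character vector per input string:
# element 1 is the whole match ("entered Conference Room"), element 2 the first group
parts <- regmatches(tt, m)[[1]]
room <- parts[2]
print(room)
# [1] "Conference Room"
```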

Answers

library(stringr) 
tt <- "[01/29/14 16:42:55, 10.100.120.120, unknown]: spatial_monitor: Alan entered Conference Room (Zone Role contains Person role)" 
str_extract(tt, perl('(?<=entered\\s)[A-Z][a-z]+\\s[A-Z][a-z]+')) 
# [1] "Conference Room" 
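A hedged note for newer versions: the error in the question comes from regexpr(), i.e. the stringr of early 2014 handed patterns to base R. As far as I know, stringr 1.0.0 and later is built on stringi and the ICU regex engine, which supports (bounded-length) lookbehind out of the box, and `perl()` was deprecated in favour of `regex()`. Under that assumption, a sketch for current stringr looks like this:

```r
library(stringr)

tt <- "[01/29/14 16:42:55, 10.100.120.120, unknown]: spatial_monitor: Alan entered Conference Room (Zone Role contains Person role)"
pattern <- "(?<=entered\\s)[A-Z][a-z]+\\s[A-Z][a-z]+"

# With the ICU engine the lookbehind pattern can be passed as a plain string
room_plain <- str_extract(tt, pattern)

# regex() makes the engine choice explicit and is where options such as ignore_case go
room_regex <- str_extract(tt, regex(pattern))

print(room_plain)
# [1] "Conference Room"
```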

Awesome! Thanks for the help.

@BalaDeshpande My pleasure. The Simafore blog once helped me discover RapidMiner. :) – lukeA

Your regex is fine: it works with sub if you specify perl = TRUE. So you can also use the sub function for this task:

sub('.*(?<=entered\\s)([A-Z][a-z]+\\s[A-Z][a-z]+).*', '\\1', tt, perl = TRUE) 
# [1] "Conference Room" 

Alternatively, without perl:

sub('.*entered\\s([A-Z][a-z]+\\s[A-Z][a-z]+).*', '\\1', tt) 
# [1] "Conference Room" 
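The question mentions both "entered" and "left", while the patterns above only handle "entered". A sketch extending the non-perl sub() approach to several log lines at once, with three capture groups for person, action and room. It assumes, like the original pattern, that names are one capitalized word and rooms are two capitalized words; the column names are illustrative.

```r
log_lines <- c(
  "[01/29/14 16:42:55, 10.100.120.120, unknown]: spatial_monitor: Alan entered Conference Room (Zone Role contains Person role)",
  "[01/29/14 16:42:57, 10.100.120.120, unknown]: spatial_monitor: Alan left Conference Room (Zone Role contains Person role)",
  "[01/29/14 16:43:00, 10.100.120.120, unknown]: spatial_monitor: Kurt entered Conference Room (Computer desk contains Person role)"
)

# Group 1: person, group 2: action (entered or left), group 3: two-word room name.
# The leading and trailing .* make sub() replace the whole line with the chosen group.
pattern <- ".*spatial_monitor:\\s([A-Z][a-z]+)\\s(entered|left)\\s([A-Z][a-z]+\\s[A-Z][a-z]+).*"

# sub() is vectorised, so each call processes every line
events <- data.frame(
  person = sub(pattern, "\\1", log_lines),
  action = sub(pattern, "\\2", log_lines),
  room   = sub(pattern, "\\3", log_lines),
  stringsAsFactors = FALSE
)
print(events)
```

One caveat of this design: sub() returns a line unchanged when the pattern does not match, so lines in a different format end up verbatim in every column. Filtering first with grepl(pattern, log_lines) avoids that.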

Thanks for providing an alternative. I have upvoted it, although I prefer the other answer.