2013-11-22

Regex to match all quoted strings

I am trying to write a regular expression that, if possible, matches all of the quoted strings in a text. An example:

ABC released its full midseason schedule today, and it features premiere dates for several new shows, along with one rather surprising timeslot change.</p><p>First of all, ABC's previously reported plans for dramas 'Once Upon A Time,' 'Revenge,' 'Grey's Anatomy,' and 'Scandal' haven't changed. 


Desired matches:

's previously reported plans for dramas ' (not useful, but I can manage it) 
'Once Upon A Time,' 
' ' 
' 'Grey' 
'Grey's Anatomy,' 




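This is not obvious. What have you tried, and what language are you using? To write a regex, you need to define the logic for matching a particular set of characters. In the set of outputs you provided, some strings contain two single quotes and some contain three. Do you expect the regex to be human and detect that 'Grey's Anatomy' should be one string rather than two? This could be a small start: [`'(?!s).*?,'`](http://regex101.com/r/gX9cO8). You could also look at the problem another way: find the second `<p>`, then split on `','`. – HamZa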
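
To see what that starting pattern actually captures, here is a minimal Perl sketch run against the sentence from the question. The variable names are illustrative only, and Perl is assumed simply because the answer further down uses it.

```perl
use strict;
use warnings;

# Sample sentence taken from the question.
my $text = "ABC's previously reported plans for dramas 'Once Upon A Time,' "
         . "'Revenge,' 'Grey's Anatomy,' and 'Scandal' haven't changed.";

# The suggested starting pattern: a quote not followed by "s", then as
# little as possible up to a comma that is followed by a closing quote.
my @matches = $text =~ /'(?!s).*?,'/g;

print "$_\n" for @matches;
```

It prints 'Once Upon A Time,', 'Revenge,' and 'Grey's Anatomy,' but skips 'Scandal', because the pattern insists on a comma before the closing quote.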



Splitting on `<p>` is not an option, because this is just an example; other texts won't contain a `<p>`. – aciobanu


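To be clearer: I want a regex that, given an input string like `'text 1'text 2','text 3'`, gives me at least text 1, text 2 and text 3 (I don't mind any extra useless matches). Thanks. – aciobanu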
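
Since extra matches are acceptable, one crude way to get at least those pieces is to split on the quote character and keep any fragment that contains a word character. This is only a sketch under that assumption; `rough_pieces` is a hypothetical helper name, not something from the thread.

```perl
use strict;
use warnings;

# Hypothetical helper: split on single quotes and keep only the
# fragments that contain at least one word character.
sub rough_pieces {
    my ($input) = @_;
    return grep { /\w/ } split /'/, $input;
}

my @pieces = rough_pieces("'text 1'text 2','text 3'");
print "$_\n" for @pieces;
```

On real prose this also returns the unquoted text that sits between two quoted strings, so it only fits when such extra matches can be filtered out afterwards.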



Here is a solution in Perl that works for the given example. See the live demo.

#!/usr/bin/perl -w 

use strict; 
use warnings; 

while (<DATA>) { 

# \1/ Starting at the beginning of a string or non-word character, 
# \2/ MATCH a single-quote character followed by a character that is 
#  *not* a single quote character, 
# \3/ And continue matching one or more times: 
#  - a white space character, 
#  - a word character, 
#  - a comma, 
#  - or a single-quote that is followed by a lower-case 's' or 't'. 
# \4/ And END the match on a single quote. 
# \5/ Continue searching for additional matches. 

    my @matches = /(?:\A|\W)('[^'](?:\w|\s|,|'(?=[st]\b))+')/g; 

#                  \___1___/\__2_/\___________3__________/4/\5/

    print join("\n", @matches), "\n"; 
}

__DATA__
'At the Beginning' ABC released its full midseason schedule today, and it features premiere dates for several new shows, along with one rather surprising timeslot change.</p><p>First of all, ABC's previously reported plans for dramas 'Once Upon A Time,' 'Revenge,' 'Grey's Anatomy,' and 'Scandal' haven't changed. 

Output:

'At the Beginning' 
'Once Upon A Time,' 
'Revenge,' 
'Grey's Anatomy,' 
'Scandal' 
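
The numbered comments in the script map directly onto the pattern. As a sketch, the same regex can be written with the `/x` modifier so that each part carries its own comment. `extract_quoted` is a hypothetical wrapper name, and the test strings are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the answer's pattern, spelled out with /x.
sub extract_quoted {
    my ($text) = @_;
    return $text =~ /
        (?: \A | \W )            # 1: start of string or a non-word character
        (                        # capture the quoted expression
            ' [^']               # 2: opening quote, then a non-quote character
            (?:                  # 3: one or more of:
                \w               #    a word character
              | \s               #    whitespace
              | ,                #    a comma
              | ' (?= [st] \b )  #    a quote followed by a lone "s" or "t"
            )+
            '                    # 4: closing quote
        )
    /gx;                         # 5: keep searching for further matches
}

my @found = extract_quoted(
    "dramas 'Once Upon A Time,' 'Grey's Anatomy,' haven't changed."
);
print "$_\n" for @found;

# A hyphen is not in the allowed character set, so nothing is returned here.
my @none = extract_quoted("'Spider-Man'");
print scalar(@none), "\n";
```

The second call returns nothing because a hyphen is not one of the characters allowed in step 3, so titles with other punctuation would need that alternation extended.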

Thanks! Your regex works perfectly. I will analyze it so I can learn from it. – aciobanu


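@aciobanu - Glad to hear that my solution meets your needs. It does not exactly answer your question as asked, but I think I understood your real need: finding quoted expressions that may, in some cases, themselves contain quotes. The tricky part is distinguishing the outer quotes from the inner ones. Your question gave me the chance to learn a bit myself. :-) – DavidRR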
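
The outer-versus-inner distinction in this approach rests on one heuristic: a quote directly followed by a lone `s` or `t` is treated as an apostrophe. Here is a small sketch that labels each quote that way; the helper `classify_quotes` and the sample string are assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: label every single quote in the text as 'inner'
# (followed by a lone "s" or "t") or 'outer' (anything else).
sub classify_quotes {
    my ($text) = @_;
    my @labels;
    while ($text =~ /'/g) {
        # Text immediately after the quote that was just matched.
        my $after = substr($text, pos($text));
        push @labels, ($after =~ /\A[st]\b/ ? 'inner' : 'outer');
    }
    return @labels;
}

print join(' ', classify_quotes("'Grey's Anatomy,' hasn't")), "\n";
```

The heuristic is deliberately narrow: contractions such as 're or 'll would be labelled as outer quotes, and an opening quote in front of a standalone s or t would be labelled as inner.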
