2009-12-21 31 views
2

Python regular expression inconsistency

I am getting different results depending on whether or not I precompile a regular expression:

>>> re.compile('mr', re.IGNORECASE).sub('', 'Mr Bean') 
' Bean' 
>>> re.sub('mr', '', 'Mr Bean', re.IGNORECASE) 
'Mr Bean' 

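The Python documentation says that some of the functions are simplified versions of the full-featured methods for compiled regular expressions. However, it also claims that RegexObject.sub() is identical to the sub() function.

So what is going on here?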

Answers

12

It seems that re.sub() cannot accept re.IGNORECASE.

The documentation states:

sub(pattern, repl, string, count=0)

Return the string obtained by replacing the leftmost 
non-overlapping occurrences of the pattern in string by the 
replacement repl. repl can be either a string or a callable; 
if a string, backslash escapes in it are processed. If it is 
a callable, it's passed the match object and must return 
a replacement string to be used.

Using an inline flag in the pattern works in its place, though:

re.sub("(?i)mr", "", "Mr Bean") 
5

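The module-level sub() call does not accept modifiers at the end. That position is the "count" argument: the maximum number of pattern occurrences to be replaced.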
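To see why the call in the question silently left the string unchanged, note that re.IGNORECASE is an integer constant (its value is 2 in CPython), so in the fourth position it is read as count. The sketch below makes that explicit by passing the value through the count keyword; the variable names are illustrative only.

```python
import re

# re.IGNORECASE is an integer flag; its numeric value is 2.
flag_value = int(re.IGNORECASE)

# Equivalent to the call in the question: the pattern stays
# case-sensitive, so 'Mr' is never matched and nothing is replaced.
unchanged = re.sub('mr', '', 'Mr Bean', count=flag_value)

# With lowercase input the "flag" visibly acts as a limit of two
# replacements; the third 'mr' and the capitalised 'Mr' survive.
limited = re.sub('mr', '', 'mr mr mr Mr', count=flag_value)
```

Here unchanged is still 'Mr Bean', which matches the surprising output shown in the question.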

4
>>> help(re.sub)
Help on function sub in module re:

sub(pattern, repl, string, count=0)
    Return the string obtained by replacing the leftmost
    non-overlapping occurrences of the pattern in string by the
    replacement repl. repl can be either a string or a callable;
    if a callable, it's passed the match object and must return
    a replacement string to be used.

Unlike re.compile, re.sub has no parameter for regex flags (IGNORECASE, MULTILINE, DOTALL).

Alternatives:

>>> re.sub("[Mm]r", "", "Mr Bean")
' Bean' 

>>> re.sub("(?i)mr", "", "Mr Bean") 
' Bean' 

Edit: Python 3.1 added support for regex flags; see http://docs.python.org/3.1/whatsnew/3.1.html. As of 3.1, re.sub looks like this:

re.sub(pattern, repl, string[, count, flags]) 
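On a Python version whose re.sub has the flags parameter, the call from the question can be written as below. This is a minimal sketch; passing flags by keyword is a choice made here so that the value cannot land in the count position by accident.

```python
import re

# Name the flags argument explicitly so it is not mistaken for count.
result = re.sub('mr', '', 'Mr Bean', flags=re.IGNORECASE)

# count and flags can be combined; here only the first match is removed.
first_only = re.sub('mr', '', 'Mr mr MR', count=1, flags=re.IGNORECASE)
```

With the keyword form, result is ' Bean', the same as the precompiled version.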
2

From the Python 2.6.4 documentation:

re.sub(pattern, repl, string[, count]) 

re.sub() does not take flags for configuring the regex pattern. If you want re.IGNORECASE, you have to use re.compile().sub().
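The re.compile().sub() approach can be wrapped in a small function if it is needed in several places. The helper name sub_with_flags and its parameter layout are hypothetical, chosen for this sketch; they are not part of the re module.

```python
import re

def sub_with_flags(pattern, repl, string, count=0, flags=0):
    """Hypothetical helper: compile the pattern with flags, then substitute."""
    compiled = re.compile(pattern, flags)
    return compiled.sub(repl, string, count=count)

result = sub_with_flags('mr', '', 'Mr Bean', flags=re.IGNORECASE)
```

Because the flags are applied at compile time, this gives ' Bean' for the example from the question.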