Python regex query misses overlapping substrings

First character = 'N'
Second character = anything that is not 'P'
Third character = 'S' or 'T'
Fourth character = anything that is not 'P'

My query looks like this:

re.findall(r"N[A-OQ-Z][ST][A-OQ-Z]", text)

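This works, except in the specific case where two substrings overlap. That case involves the following 5-character substring:

'...NNTSY...'

The query captures the first 4-character substring ("NNTS") but not the second 4-character substring ("NTSY").

This is my first attempt at regular expressions, and I am clearly missing something.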

Source

2013-09-01 rwjones

From the Python 3 documentation (emphasis added; note the word "non-overlapping"):

 
$ python3 -c 'import re; help(re.findall)' 
Help on function findall in module re: 

findall(pattern, string, flags=0) 
    Return a list of all non-overlapping matches in the string. 

    If one or more capturing groups are present in the pattern, return 
    a list of groups; this will be a list of tuples if the pattern 
    has more than one group. 

    Empty matches are included in the result.

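If you want overlapping matches, call regex.search() in a loop. You have to compile the regular expression first, because the module-level functions for non-compiled patterns do not take a parameter for the starting position.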

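import re

def findall_overlapping(pattern, string, flags=0):
    """Find all matches, even ones that overlap."""
    regex = re.compile(pattern, flags)
    pos = 0
    # search() clamps pos to len(string), so stop explicitly to avoid
    # looping forever on patterns that can match the empty string.
    while pos <= len(string):
        match = regex.search(string, pos)
        if not match:
            break
        yield match
        # Resume one character past the start of this match.
        pos = match.start() + 1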
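
As a rough illustration of the difference, the sketch below runs both approaches on the string from the question. The variable names (non_overlapping, overlapping) are made up for this example, and the loop is inlined rather than calling the function above so that the snippet runs on its own.

```python
import re

text = "...NNTSY..."
regex = re.compile(r"N[A-OQ-Z][ST][A-OQ-Z]")

# findall() resumes scanning after the END of each match, so the
# second match, which starts inside the first one, is skipped.
non_overlapping = regex.findall(text)

# Restarting the search one character past the START of each match
# lets the second, overlapping match be found as well.
overlapping = []
pos = 0
while pos <= len(text):
    match = regex.search(text, pos)
    if match is None:
        break
    overlapping.append((match.start(), match.group()))
    pos = match.start() + 1

print(non_overlapping)  # ['NNTS']
print(overlapping)      # [(3, 'NNTS'), (4, 'NTSY')]
```

The pos argument is only available on compiled pattern objects, which is why the pattern is compiled first.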

Source

2013-09-01 01:58:33

You can do this with a lookahead assertion, because the re engine does not consume the characters that a lookahead matches:

import re 
text = '...NNTSY...' 
for m in re.findall(r'(?=(N[A-OQ-Z][ST][A-OQ-Z]))', text): 
    print(m)

Output:

NNTS
NTSY

Putting everything inside the assertion works, but it also feels odd. Another way is to take the N out of the assertion:

for m in re.findall(r'(N(?=([A-OQ-Z][ST][A-OQ-Z])))', text): 
    print(''.join(m))
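
If the position of each match is needed too, the same lookahead idea should also work with re.finditer, which yields match objects instead of strings. This is only a sketch under that assumption; the names pattern and hits are made up for the example.

```python
import re

text = "...NNTSY..."

# The whole 4-character pattern sits inside a lookahead, so each match
# itself is empty and the scanner advances one character at a time.
# Group 1 still records the text the lookahead saw.
pattern = re.compile(r"(?=(N[A-OQ-Z][ST][A-OQ-Z]))")

hits = [(m.start(1), m.group(1)) for m in pattern.finditer(text)]

for start, substring in hits:
    print(start, substring)
# 3 NNTS
# 4 NTSY
```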

Source

2013-09-01 02:12:13 perreal

(N[^P](?:S|T)[^P])
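
A quick check of this pattern, as a sketch: used with findall on its own it appears to behave like the original query with respect to overlaps, and [^P] is broader than [A-OQ-Z] because it also accepts characters that are not uppercase letters. Wrapping it in a lookahead, as in the previous answer, recovers the overlapping match. The variable names below are made up for the example.

```python
import re

text = "...NNTSY..."

# Used directly, the pattern still yields non-overlapping matches only.
plain = re.findall(r"(N[^P](?:S|T)[^P])", text)

# Inside a lookahead, overlapping matches are reported as well.
with_lookahead = re.findall(r"(?=(N[^P](?:S|T)[^P]))", text)

# [^P] accepts any character except 'P', including punctuation,
# whereas [A-OQ-Z] accepts only uppercase letters other than 'P'.
loose = re.findall(r"N[^P][ST][^P]", "N.S.")
strict = re.findall(r"N[A-OQ-Z][ST][A-OQ-Z]", "N.S.")

print(plain)           # ['NNTS']
print(with_lookahead)  # ['NNTS', 'NTSY']
print(loose, strict)   # ['N.S.'] []
```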


Source

2013-09-01 07:08:55 progrenhard
