Regex: search between two keywords that are the same


PAGE 1 
apple 

PAGE 2 
apple 
banana 

PAGE 3 
orange 

PAGE 4 
banana 

PAGE 5 
pear 

PAGE 6 
apple 
orange 
banana 
pea

I'd like a regex that tells me every page that has a banana on it, which here is pages 2 and 4.

Things I've tried:

PAGE.*?banana.*?PAGE

But this returns PAGE 1 and 4.

PAGE(?!.*?PAGE).*?banana

This was an attempt to use a lookahead to make sure there is no extra PAGE between PAGE and the word banana, but it returned nothing.

(?<=PAGE).*(?=banana)

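Borrowed from "Regex, get entire string between two keywords". This returns PAGE 1, matching everything between the first PAGE and the last banana.

I think lookarounds are the answer, but I can't wrap my head around how to match PAGE # and banana, but only the PAGE # that banana belongs to. How do I do that?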


2016-03-14 Mike

Do you need the number or the whole block? – andlrc

You just need a tempered greedy token solution: [`PAGE \d+\n((?:(?!\bbanana\b|\nPAGE \d+\n).)*\bbanana\b(?:(?!\ (*=\nPAGE \d+\n|$)`](https://regex101.com/r/gE1oN3/1). –

I only need the number. – Mike

Try this regex.

Regex: PAGE (\d+)\s[^ ]*(?=banana)[^ ]*\n

Flags used:

g for a global search.
s to allow . to match newlines.

Capture the page number as the first group, with \1 or $1.

Regex101 Demo


2016-03-14 20:53:08

This works, thank you very much! – Mike

I hope your data format stays the same, otherwise just **one space will ruin it all**. :D – 2016-03-14 20:58:07

That's a good point. – Mike
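
To make both the pattern and the caveat about spaces concrete, here is a small Python sketch. It assumes the sample text has no trailing spaces; the helper name `pages_with_banana` and the `green apple` variant are made up for illustration.

```python
import re

# Pattern from the answer above; [^ ]* can cross newlines but stops at any space.
PATTERN = re.compile(r'PAGE (\d+)\s[^ ]*(?=banana)[^ ]*\n')

def pages_with_banana(text):
    """Return the page numbers whose block contains 'banana' (hypothetical helper)."""
    return PATTERN.findall(text)

sample = (
    "PAGE 1\napple\n\n"
    "PAGE 2\napple\nbanana\n\n"
    "PAGE 3\norange\n\n"
    "PAGE 4\nbanana\n\n"
)

print(pages_with_banana(sample))   # ['2', '4']

# A single space inside a page block stops [^ ]* before it reaches 'banana'.
fragile = sample.replace("PAGE 2\napple", "PAGE 2\ngreen apple")
print(pages_with_banana(fragile))  # ['4']
```

Page 2 disappears from the second result even though it still contains banana, which is the fragility the comment warns about.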

Try this pattern

(?<=PAGE)(\d+)(?=(?:[^P]|\bP(?!AGE\b))*\bbanana\b)

Demo
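
A rough Python check of this lookaround approach, offered as a sketch rather than a definitive version. One assumption to flag: the sample has a space between `PAGE` and the number, so the lookbehind here is written `(?<=PAGE )` with that space included. The variable names are illustrative.

```python
import re

# Assumption: the lookbehind includes the space that follows PAGE in the sample.
# The lookahead walks forward over anything that is not the start of another
# PAGE header and requires the word 'banana' somewhere along the way.
pattern = re.compile(r'(?<=PAGE )(\d+)(?=(?:[^P]|\bP(?!AGE\b))*\bbanana\b)')

text = """PAGE 1
apple

PAGE 2
apple
banana

PAGE 3
orange

PAGE 4
banana

PAGE 5
pear
"""

pages = pattern.findall(text)
print(pages)  # ['2', '4']
```

Because only the digits are consumed and everything else sits inside lookarounds, each match is just the page number.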


2016-03-14 20:52:43

A great use for re.finditer:

import re

txt = """\
PAGE 1
apple

PAGE 2
apple
banana

PAGE 3
orange

PAGE 4
banana

PAGE 5
pear"""

tgt = 'banana'

# Split the text into per-page blocks, then search each block for the target.
for m in re.finditer(r'^PAGE\s+(\d+)\s+([\s\S]+?)(?=^PAGE|\Z)', txt, re.M):
    if re.search(r'(?i){}'.format(tgt), m.group(2)):
        print('"{}" found on Page {}'.format(tgt, m.group(1)))

Prints:

"banana" found on Page 2 
"banana" found on Page 4

The same technique can produce a mapping from each fruit to the pages it appears on:

di = {}
for m in re.finditer(r'^PAGE\s+(\d+)\s+([\s\S]+?)(?=^PAGE|\Z)', txt, re.M):
    for fruit in m.group(2).split():
        di.setdefault(fruit, []).append(m.group(1))

>>> di
{'apple': ['1', '2'], 'banana': ['2', '4'], 'orange': ['3'], 'pear': ['5']}


2016-03-14 21:05:33 dawg


This works:

PAGE(?:(?!PAGE).)*?banana

Thanks to Wiktor's comment about using a tempered greedy token solution, I Googled it and found this page: http://www.rexegg.com/regex-quantifiers.html#tempered_greed

Thank you all!
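
For reference, a minimal Python sketch of the tempered greedy token idea described on that page, set against the first attempt from the question. The capture group around the page number and the variable names are additions for illustration, and `re.S` is assumed so that `.` can cross newlines.

```python
import re

# (?:(?!PAGE).)*? advances one character at a time, lazily, and refuses to
# step onto the start of another "PAGE", so a match cannot spill into the next page.
tempered = re.compile(r'PAGE (\d+)(?:(?!PAGE).)*?banana', re.S)

# For contrast: the first attempt from the question, which lets .*? run across pages.
naive = re.compile(r'PAGE (\d+).*?banana.*?PAGE', re.S)

text = """PAGE 1
apple

PAGE 2
apple
banana

PAGE 3
orange

PAGE 4
banana

PAGE 5
pear
"""

tempered_pages = tempered.findall(text)
naive_pages = naive.findall(text)
print(tempered_pages)  # ['2', '4']
print(naive_pages)     # ['1', '4']
```

The naive pattern reports page 1 because its first `.*?` is free to run past the PAGE 2 header until it finds a banana; the negative lookahead inside the tempered version is what rules that out.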


2016-03-14 21:12:30 Mike

Just to offer another alternative, this would work as well:

^PAGE\s+(?P<page>\d+)[\n\r]  # match PAGE + whitespace + digits at the beginning of a line
(?s:                         # open a non-capturing group in single-line (dotall) mode
    (?:.(?!^$))*?            # lazily consume characters without crossing an empty line
    \bbanana\b               # look for banana with word boundaries
    (?:.(?!^$))*?
)

See a demo on regex101.com.
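
As a hedged sketch of how this pattern might be run from Python: the scoped `(?s:...)` group needs Python 3.6 or later, and the `re.X` and `re.M` flags are assumed so that the inline comments are ignored and `^`/`$` match at line boundaries. The sample text (no trailing spaces, pages separated by one empty line) and the variable names are illustrative.

```python
import re

pattern = re.compile(r"""
    ^PAGE\s+(?P<page>\d+)[\n\r]  # PAGE, whitespace, digits, then a line break
    (?s:                         # group in which . also matches newlines
        (?:.(?!^$))*?            # lazily advance, never stepping onto an empty line
        \bbanana\b               # the word banana
        (?:.(?!^$))*?
    )
""", re.X | re.M)

text = """PAGE 1
apple

PAGE 2
apple
banana

PAGE 3
orange

PAGE 4
banana

PAGE 5
pear
"""

pages = [m.group('page') for m in pattern.finditer(text)]
print(pages)  # ['2', '4']
```

This variant relies on the empty line between pages as the boundary, so it would behave differently if pages were not separated by a blank line.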


2016-03-14 21:46:33 Jan
