2015-04-17 70 views
0

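Python regex pattern not matching

I have a regex matching problem with the following pattern and string. The pattern is basically a name, followed by any number of characters, followed by a phrase (see the pattern below), followed by any number of characters, followed by the name of an institution.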

pattern = "[David Maxwell|David|Maxwell] .* [educated at|graduated from|attended|studied at|graduate of] .* Eton College" 
str = "David Maxwell was educated at Eton College, where he was a King's Scholar and Captain of Boats, and at Cambridge University where he rowed in the winning Cambridge boat in the 1971 and 1972 Boat Races." 
match = re.search(pattern, str) 

But the search method returns no match for the str above. Is my regex correct? I am new to regex. Any help is appreciated.

+0

You want to use (...|...), not [...|...] – sshashank124

+0

Also, in Python it is best to use raw strings when defining regex patterns. – bgm387

+0

I changed it. But there seems to be another problem. If I change "educated at" to "educated", it matches. Any idea why? – raghu

Answers

5

[...] means "any one character from this set of characters". If you want "any one word from this set of words", you need to use parentheses: (...|...)
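A minimal sketch of that difference, using a shortened pattern made up for illustration rather than the full one from the question:

```python
import re

# A character class matches exactly ONE character from the set.
# "[David|Maxwell]" is just the set {D, a, v, i, d, |, M, x, w, e, l}.
char_class = re.compile(r"[David|Maxwell]")

# A group with alternation matches one whole alternative.
group = re.compile(r"(David|Maxwell)")

class_match = char_class.search("Maxwell")
group_match = group.search("Maxwell")

print(class_match.group(0))  # M
print(group_match.group(0))  # Maxwell
```

Note that inside the brackets the `|` is an ordinary character, not an "or" operator.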

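Your expression has another problem: you have " .* " (space, dot, asterisk, space), which means "a space, followed by zero or more characters, followed by a space". In other words, the shortest possible match is two spaces. However, your text has only one space between "educated at" and "Eton College".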

>>> pattern = '(David Maxwell|David|Maxwell).*(educated at|graduated from|attended|studied at|graduate of).*Eton College' 
>>> str = "David Maxwell was educated at Eton College, where he was a King's Scholar and Captain of Boats, and at Cambridge University where he rowed in the winning Cambridge boat in the 1971 and 1972 Boat Races." 
>>> re.search(pattern, str) 
<_sre.SRE_Match object at 0x1006d10b8> 
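The spacing problem can be checked in isolation. The sketch below uses a shortened sentence and trimmed-down patterns (both are assumptions made for illustration, not the exact ones from the question):

```python
import re

text = "David Maxwell was educated at Eton College"

# " .* " needs a space, then zero or more characters, then another space.
strict = r"(educated at) .* Eton College"
# With the spaces around ".*" removed, a single space is enough.
relaxed = r"(educated at).*Eton College"
# With the shorter phrase, the word "at" sits between the two required spaces.
shorter = r"(educated) .* Eton College"

strict_match = re.search(strict, text)
relaxed_match = re.search(relaxed, text)
shorter_match = re.search(shorter, text)

print(strict_match)            # None
print(relaxed_match.group(0))  # educated at Eton College
print(shorter_match.group(0))  # educated at Eton College
```

This is also why changing "educated at" to "educated" makes the original spacing match: " at " supplies both of the spaces that " .* " requires.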
+0

Thanks @Bryan. It works now :) – raghu