Python regex trouble

I have the following code:

what = re.match("get|post|put|head\s+(\S+) ",data,re.IGNORECASE)

and let's say the data variable contains a line like this:

GET some-site.com HTTP/1.0 ...

If I pause the script in the debugger and inspect the what variable, I can see that it only matched GET. Why doesn't it match some-site.com?


2009-03-03 Geo

 

>>> import re
>>> re.match(r"(get|post|put|head)\s+(\S+) ", 'GET some-site.com HTTP/1.0 ...', re.IGNORECASE).groups()
('GET', 'some-site.com')


2009-03-03 12:38:52

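It works, but can you explain why my version doesn't? I only want to capture the second word. I know I can access it by calling .group(1), but I'm puzzled as to why my version doesn't work. – Geo 2009-03-03 12:42:46

"Why is 1 + 2 + 3 + 4 * 100 equal to 406 and not 1000?" http://www.amk.ca/python/howto/regex/regex.html#SECTION000510000000000000000. Read about the "|" character and its precedence. – tzot 2009-03-03 22:55:49

Operator precedence in the regular expression language makes head\s+(\S+) the 4th alternative, so your pattern is really get, or post, or put, or head\s+(\S+). The parentheses in @Mykola Kharechko's answer make head alone the 4th alternative, and \s+(\S+) is then appended to whichever alternative the group matched.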
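The precedence rule described above can be checked directly. This is a minimal sketch using the sample line from the question. The variable names are made up for illustration, and the non-capturing group `(?:...)` is a swapped-in variant that does not appear in the answers; it is shown only because the asker wants to capture just the second word.

```python
import re

line = "GET some-site.com HTTP/1.0 ..."

# Without parentheses, "|" splits the WHOLE pattern into four alternatives:
#   get   |   post   |   put   |   head\s+(\S+)<space>
# The first alternative matches "GET", so the match stops right there.
ungrouped = re.match(r"get|post|put|head\s+(\S+) ", line, re.IGNORECASE)
print(ungrouped.group(0))  # GET
print(ungrouped.group(1))  # None (the capture group sits in an unused alternative)

# Grouping the method names limits "|" to the group, so \s+(\S+) applies to
# every alternative. (?:...) groups without capturing (illustrative variant).
grouped = re.match(r"(?:get|post|put|head)\s+(\S+) ", line, re.IGNORECASE)
print(grouped.group(1))  # some-site.com
```

With the capturing parentheses from the accepted answer, the host would be in .group(2) instead, because the method name takes .group(1).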


2009-03-03 12:52:48 gimel

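+1 for Mykola's answer and gimel's explanation. Also, do you really want to use a regex for this? As you've discovered, they aren't as straightforward as they look. Here is a non-regex approach: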

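def splitandpad(s, find, limit):
    # Split at most `limit` times, then pad with '' to exactly limit+1 items
    seq = s.split(find, limit)
    return seq + [''] * (limit - len(seq) + 1)

method, path, protocol = splitandpad(data, ' ', 2)
if method.lower() not in ('get', 'head', 'post', 'put'):
    raise ValueError('unknown method: %r' % method)      # complain, unknown method
if protocol.lower() not in ('http/1.0', 'http/1.1'):
    raise ValueError('unknown protocol: %r' % protocol)  # complain, unknown protocol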
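To show why the padding matters, here is a small sketch that repeats the helper and runs it on two made-up request lines (the sample strings and the names `full` and `short` are assumptions for illustration). The padding guarantees that the three-way unpacking never fails, even when the protocol is missing.

```python
def splitandpad(s, find, limit):
    # Split at most `limit` times, then pad with '' to exactly limit+1 items
    seq = s.split(find, limit)
    return seq + [''] * (limit - len(seq) + 1)

# A complete request line yields three fields and needs no padding.
full = splitandpad("GET some-site.com HTTP/1.0", " ", 2)
print(full)   # ['GET', 'some-site.com', 'HTTP/1.0']

# A line with no protocol is padded with an empty string.
short = splitandpad("GET some-site.com", " ", 2)
print(short)  # ['GET', 'some-site.com', '']

# Unpacking is safe in both cases.
method, path, protocol = short
```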


2009-03-03 12:59:45 bobince
