Parsing an HTTP request Authorization header with Python

I need to take a header like this:

Authorization: Digest qop="chap", 
    realm="[email protected]", 
    username="Foobear", 
    response="6629fae49393a05397450978507c4ef1", 
    cnonce="5ccc069c403ebaf9f0171e9517f40e41"

and parse it into this using Python:

{'protocol':'Digest', 
    'qop':'chap', 
    'realm':'[email protected]', 
    'username':'Foobear', 
    'response':'6629fae49393a05397450978507c4ef1', 
    'cnonce':'5ccc069c403ebaf9f0171e9517f40e41'}

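Is there a library that does this, or something I could look at for inspiration?

I'm doing this on Google App Engine, and I'm not sure whether the Pyparsing library is available there, but if it is the best solution, maybe I could bundle it with my app.

Currently I'm creating my own MyHeaderParser object and using it with reduce() on the header string. It works, but it is very fragile.

Brilliant solution by Nadia below: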

import re 

reg = re.compile('(\w+)[=] ?"?(\w+)"?') 

s = """Digest 
realm="stackoverflow.com", username="kixx" 
""" 

print str(dict(reg.findall(s)))


2009-08-28 Kris Walker

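So far this solution has proven to be not only super clean but also very robust. While it is not the most "by the book" implementation of the RFC, I have yet to build a test case that returns invalid values. However, I am only using it to parse the Authorization header; none of the other headers I'm interested in need parsing, so it may not be a good solution as a general HTTP header parser. – 2009-09-04 11:35:52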

A little regex:

import re 
reg=re.compile('(\w+)[:=] ?"?(\w+)"?') 

>>> dict(reg.findall(headers))

{'username': 'Foobear', 'realm': 'testrealm', 'qop': 'chap', 'cnonce': '5ccc069c403ebaf9f0171e9517f40e41', 'response': '6629fae49393a05397450978507c4ef1', 'Authorization': 'Digest'}


2009-08-28 21:40:19

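Wow, I love Python. "Authorization:" is not actually part of the header string, so I did this instead: #!/usr/bin/env python import re def mymain(): reg = re.compile('(\w+)[=] ?"?(\w+)"?') s = """Digest realm="fireworksproject.com", username="Kristoffer" """ print str(dict(reg.findall(s))) if __name__ == '__main__': mymain() I don't get the "Digest" protocol declaration, but I don't need it anyway. Essentially 3 lines of code... Brilliant! – 2009-08-28 21:56:59

I think it would be more explicit to use a raw string or \\. – 2009-08-28 22:04:05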

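If you find this and use it, make sure to add another question mark inside '"?(\w+)"' so it becomes '"?(\w+)?"'. That way, if you pass something along as "", it returns the parameter and the value is undefined. And if you really want Digest: '/(\w+)(?:(?:[=]) ?"?(\w+)"?)?/', then check whether '=' exists in the match; if so it's a key:value pair, otherwise it's something else. – Nijikokun 2013-04-03 00:06:47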
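For anyone reading this on Python 3, here is a minimal sketch of the two suggestions from the comments above: raw-string patterns, plus the extra question mark so that an empty quoted value still yields its key. The sample header value and the names `strict`, `lenient` and `header_value` are made up for illustration.

```python
import re

# Raw strings keep the backslashes in \w from being treated as string escapes.
strict = re.compile(r'(\w+)[=] ?"?(\w+)"?')
# Extra "?" after the value group: the value may now be empty.
lenient = re.compile(r'(\w+)[=] ?"?(\w+)?"?')

# Hypothetical header value with an empty realm.
header_value = 'Digest realm="", username="kixx"'

strict_result = dict(strict.findall(header_value))
lenient_result = dict(lenient.findall(header_value))

print(strict_result)   # the realm key is silently dropped
print(lenient_result)  # the realm key is kept with an empty string
```

With `findall`, a group that did not take part in the match comes back as an empty string, which is why the lenient pattern reports `realm` as `''` rather than leaving it out.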

If those components will always be there, in that order, then a regex will do the trick:

test = '''Authorization: Digest qop="chap", realm="[email protected]", username="Foobear", response="6629fae49393a05397450978507c4ef1", cnonce="5ccc069c403ebaf9f0171e9517f40e41"''' 

import re 

re_auth = re.compile(r""" 
    Authorization:\s*(?P<protocol>[^ ]+)\s+ 
    qop="(?P<qop>[^"]+)",\s+ 
    realm="(?P<realm>[^"]+)",\s+ 
    username="(?P<username>[^"]+)",\s+ 
    response="(?P<response>[^"]+)",\s+ 
    cnonce="(?P<cnonce>[^"]+)" 
    """, re.VERBOSE) 

m = re_auth.match(test) 
print m.groupdict()

produces:

{ 'username': 'Foobear', 
    'protocol': 'Digest', 
    'qop': 'chap', 
    'cnonce': '5ccc069c403ebaf9f0171e9517f40e41', 
    'realm': '[email protected]', 
    'response': '6629fae49393a05397450978507c4ef1' 
}


2009-08-28 21:36:41

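As far as I can see, this solution produces correct results. – 2009-09-04 11:59:19

I would suggest finding a proper library for parsing HTTP headers; unfortunately I can't recall any. :(

In the meantime, check the snippet below (it should mostly work):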

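input = """
Authorization: Digest qop="chap",
    realm="[email protected]",
    username="Foob,ear",
    response="6629fae49393a05397450978507c4ef1",
    cnonce="5ccc069c403ebaf9f0171e9517f40e41"
"""

# Split the header name from its value at the first colon.
field, sep, value = input.partition(":")
if field.endswith('Authorization'):
    # The first word of the value is the scheme; the rest are parameters.
    protocol, sep, opts_str = value.strip().partition(" ")

    opts = {}
    # Relies on one parameter per line, so a comma inside a value is safe.
    for opt in opts_str.split(",\n"):
        # Split on the first "=" only, in case the value contains one.
        key, value = opt.strip().split('=', 1)
        key = key.strip(" ")
        value = value.strip(' "')
        opts[key] = value

    opts['protocol'] = protocol

    print(opts)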


2009-08-28 21:38:11

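If your response comes in a single string that never varies and has as many lines as there are expressions to match, you can split it on newlines into an array called authentication_array and use regexps: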

import re

pattern_array = ['qop', 'realm', 'username', 'response', 'cnonce']
parsed_dict = {}

# authentication_array holds the header split into lines, one parameter per line
for name, line in zip(pattern_array, authentication_array):
    pattern = "(" + name + ")" + "=(\".*\")"  # build a matching pattern
    match = re.search(pattern, line)           # make the match
    if match:
        parsed_dict[match.group(1)] = match.group(2)


2009-08-28 21:38:47 Pinochle

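Your original concept of using PyParsing would be the best approach. What you've implicitly asked for is something that requires a grammar... that is, a regular expression or simple parsing routine is always going to be brittle, and that sounds like something you're trying to avoid.

It appears that getting pyparsing onto Google App Engine is easy: How do I get PyParsing set up on the Google App Engine?

So I'd go with that, and then implement the full HTTP authentication/authorization header support from RFC 2617.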


2009-08-28 21:42:40

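I decided to take this approach and tried to implement a fully compliant parser for the Authorization header using the RFC spec. The task turned out to be much more daunting than I anticipated. Your choice of the simple regex, while not rigorously correct, is probably the best pragmatic solution. I'll report back here if I eventually get a fully functional header parser. – 2009-08-29 16:27:01

Yes, it would be nice to see something more strictly correct. – 2009-09-04 12:01:49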

You can also use urllib2, as CherryPy does.

Here is the snippet:

input= """ 
Authorization: Digest qop="chap", 
    realm="[email protected]", 
    username="Foobear", 
    response="6629fae49393a05397450978507c4ef1", 
    cnonce="5ccc069c403ebaf9f0171e9517f40e41" 
""" 
import urllib2 
field, sep, value = input.partition("Authorization: Digest ") 
if value: 
    items = urllib2.parse_http_list(value) 
    opts = urllib2.parse_keqv_list(items) 
    opts['protocol'] = 'Digest' 
    print opts

It outputs:

{'username': 'Foobear', 'protocol': 'Digest', 'qop': 'chap', 'cnonce': '5ccc069c403ebaf9f0171e9517f40e41', 'realm': '[email protected]', 'response': '6629fae49393a05397450978507c4ef1'}
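On Python 3 there is no urllib2; the same two helpers live in urllib.request. A rough sketch of the equivalent, assuming the header name has already been stripped and only the value is passed in (the function name `parse_authorization` and the sample value are mine, not from any library):

```python
from urllib.request import parse_http_list, parse_keqv_list


def parse_authorization(value):
    """Parse an Authorization header value into a dict (illustrative helper)."""
    # The first word is the scheme, e.g. "Digest"; the rest are parameters.
    scheme, _, params = value.strip().partition(" ")
    # parse_http_list splits on commas but leaves quoted commas alone;
    # parse_keqv_list turns key=value items into a dict and strips the quotes.
    opts = parse_keqv_list(parse_http_list(params))
    opts["protocol"] = scheme
    return opts


# Hypothetical sample value with a comma inside a quoted string.
sample = ('Digest qop="chap", realm="testrealm", username="Foob,ear", '
          'cnonce="5ccc069c403ebaf9f0171e9517f40e41"')
opts = parse_authorization(sample)
print(opts)
```

Unlike the regex answers, this keeps a comma inside a quoted value intact, and it takes the scheme from the header instead of hard-coding 'Digest'.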


2009-08-28 22:11:31

Here's my pyparsing attempt:

text = """Authorization: Digest qop="chap", 
    realm="[email protected]",  
    username="Foobear",  
    response="6629fae49393a05397450978507c4ef1",  
    cnonce="5ccc069c403ebaf9f0171e9517f40e41" """ 

from pyparsing import * 

AUTH = Keyword("Authorization") 
ident = Word(alphas,alphanums) 
EQ = Suppress("=") 
quotedString.setParseAction(removeQuotes) 

valueDict = Dict(delimitedList(Group(ident + EQ + quotedString))) 
authentry = AUTH + ":" + ident("protocol") + valueDict 

print authentry.parseString(text).dump()

It prints:

['Authorization', ':', 'Digest', ['qop', 'chap'], ['realm', '[email protected]'], 
['username', 'Foobear'], ['response', '6629fae49393a05397450978507c4ef1'], 
['cnonce', '5ccc069c403ebaf9f0171e9517f40e41']] 
- cnonce: 5ccc069c403ebaf9f0171e9517f40e41 
- protocol: Digest 
- qop: chap 
- realm: [email protected] 
- response: 6629fae49393a05397450978507c4ef1 
- username: Foobear

I'm not familiar with the RFC, but I hope this gets you rolling.


2009-09-04 09:40:06 PaulMcG

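This solution uses pyparsing, which is what I originally had in mind, and as far as I can tell it produces good results. – 2009-09-04 12:00:35

The HTTP Digest Authorization header field is a bit of an odd beast. Its format is similar to that of RFC 2616's Cache-Control and Content-Type header fields, but just different enough to be incompatible. If you're still looking for a library that is a little smarter and more readable than the regex, you might try removing the "Authorization: Digest" part with str.split() and parsing the rest with Werkzeug's http module. (Werkzeug can be installed on App Engine.)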


2010-05-14 00:13:46

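Thanks a lot. I may replace that regex with this. It seems more robust. – 2010-05-14 18:26:46

Nadia's regex only matches alphanumeric characters in the parameter value. That means it fails to parse at least two fields, namely uri and qop. According to RFC 2617, the uri field is a duplicate of the string in the request line (i.e. the first line of the HTTP request). And qop fails to parse correctly if the value is "auth-int", because of the non-alphanumeric '-'.

This modified regex allows the URI (or any other value) to contain anything except ' ' (space), '"' (quote) or ',' (comma). That is probably more permissive than it needs to be, but it shouldn't cause any problems with correctly formed HTTP requests.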

reg = re.compile(r'(\w+)[:=] ?"?([^" ,]+)"?')
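To make the difference concrete, here is a small comparison of the two patterns on a header value that carries a uri and qop=auth-int. The sample value and the variable names are invented for this sketch.

```python
import re

# Nadia's original pattern: values may only contain word characters.
original = re.compile(r'(\w+)[:=] ?"?(\w+)"?')
# Modified pattern: values may contain anything but quote, space or comma.
modified = re.compile(r'(\w+)[:=] ?"?([^" ,]+)"?')

# Hypothetical header value for illustration.
header_value = 'Digest username="Foobear", uri="/dir/index.html", qop=auth-int'

original_result = dict(original.findall(header_value))
modified_result = dict(modified.findall(header_value))

print(original_result)  # uri is missing and qop is cut off at the hyphen
print(modified_result)  # uri and qop come through whole
```

Note that the modified pattern still cannot handle a quoted value containing a space or a comma, so it remains a pragmatic shortcut rather than an RFC-grade parser.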

Bonus tip: from there, it's fairly straightforward to convert the example code in RFC 2617 to Python. Using Python's MD5 API, "MD5Init()" becomes "m = md5.new()", "MD5Update()" becomes "m.update()" and "MD5Final()" becomes "m.digest()".
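The md5 module is gone in Python 3, where hashlib takes its place. Below is a rough sketch of the response calculation on that basis, assuming algorithm MD5 and qop=auth only (no auth-int, no MD5-sess). The function names are my own, and the input values are the ones from the sample request in RFC 2617 section 3.5.

```python
import hashlib


def md5_hex(text):
    """Hex MD5 of a string; hexdigest() stands in for the RFC's hex conversion."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def digest_response(username, realm, password, method, uri,
                    nonce, nc, cnonce, qop="auth"):
    """Compute the request-digest for qop=auth (illustrative helper)."""
    ha1 = md5_hex("%s:%s:%s" % (username, realm, password))  # H(A1)
    ha2 = md5_hex("%s:%s" % (method, uri))                   # H(A2)
    return md5_hex(":".join([ha1, nonce, nc, cnonce, qop, ha2]))


# Values from the example exchange in RFC 2617, section 3.5.
response = digest_response(
    username="Mufasa",
    realm="testrealm@host.com",
    password="Circle Of Life",
    method="GET",
    uri="/dir/index.html",
    nonce="dcd98b7102dd2f0e8b11d0f600bfb0c093",
    nc="00000001",
    cnonce="0a4f113b",
)
print(response)  # 6629fae49393a05397450978507c4ef1
```

A server would compare this value against the response parameter parsed from the client's Authorization header.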


2011-09-13 15:09:58
