2009-10-29 82 views
1

What is the best way to parse a line of key=value pairs in Python? I have a file with lines like:

account = "TEST1" Qty=100 price = 20.11 subject="some value" values="3=this, 4=that" 

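There is no special delimiter. Each key has a value that is surrounded by double quotes if it is a string, but not if it is a number. There are no keys without values, although an empty string may appear, written as "". There is no escape character for the quotes, since none is needed.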

I would like to know a good way to parse this kind of line with Python and store the values as key-value pairs in a dictionary.

Answers

10

We'll need a regular expression for this.

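import re, decimal

line = 'account = "TEST1" Qty=100 price = 20.11 subject="some value" values="3=this, 4=that"'
r = re.compile(r'([^ =]+) *= *("[^"]*"|[^ ]*)')

d = {}
for k, v in r.findall(line):
    if v[:1] == '"':
        d[k] = v[1:-1]             # quoted value: strip the quotes, keep as a string
    else:
        d[k] = decimal.Decimal(v)  # unquoted value: a number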

>>> d 
{'account': 'TEST1', 'subject': 'some value', 'values': '3=this, 4=that', 'price': Decimal('20.11'), 'Qty': Decimal('100')}

You can use float instead of Decimal if you prefer, but that is probably a bad idea if money is involved.
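
To illustrate the point about money, here is a small sketch (not part of the original answer) that compares the two types on a simple sum. The variable names are made up for the example.

```python
from decimal import Decimal

# Binary floats cannot represent most decimal fractions exactly,
# so small errors creep into sums that look trivial.
float_total = 0.1 + 0.2
decimal_total = Decimal('0.1') + Decimal('0.2')

print(float_total == 0.3)               # False
print(decimal_total == Decimal('0.3'))  # True
```

Note that the Decimal values are built from strings, as in the answer's code; building them from floats would carry the float error along.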

+0

Could you explain this regular expression? – ash 2011-03-02 01:32:23
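
One possible reading of the pattern, offered as a sketch rather than the answerer's own explanation, is to spell it out with `re.VERBOSE` and comment each part. The name `pair_re` and the shortened sample line are made up for this example.

```python
import re

# The same pattern as in the answer, written in verbose mode.
# Whitespace outside character classes is ignored in this mode,
# so a literal space is written as [ ].
pair_re = re.compile(r'''
    ([^ =]+)        # group 1, the key: one or more characters that are not a space or an equals sign
    [ ]*=[ ]*       # the equals sign, with optional spaces on either side
    (               # group 2, the value, which is either
        "[^"]*"     #   a double-quoted string (no escapes, so it ends at the next quote)
      |             #   or
        [^ ]*       #   a bare token that runs up to the next space
    )
''', re.VERBOSE)

line = 'account = "TEST1" Qty=100 price = 20.11'
print(pair_re.findall(line))
# [('account', '"TEST1"'), ('Qty', '100'), ('price', '20.11')]
```

The quoted alternative is tried first, which is why a value such as "some value" is captured whole instead of being cut at its first space. The quotes are still part of the captured value, so the loop in the answer strips them afterwards.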

0

A recursive variation of bobince's answer, which parses values with embedded equals signs into dictionaries:

>>> import re 
>>> import pprint 
>>> 
>>> def parse_line(line): 
...  d = {} 
...  a = re.compile(r'\s*(\w+)\s*=\s*("[^"]*"|[^ ,]*),?') 
...  float_re = re.compile(r'^\d+\.\d*$') 
...  int_re = re.compile(r'^\d+$') 
...  for k,v in a.findall(line): 
...    if int_re.match(k): 
...      k = int(k) 
...    if v[:1] == '"': 
...      v = v[1:-1] 
...    if '=' in v: 
...      d[k] = parse_line(v) 
...    elif int_re.match(v): 
...      d[k] = int(v) 
...    elif float_re.match(v): 
...      d[k] = float(v) 
...    else: 
...      d[k] = v 
...  return d 
... 
>>> line = 'account = "TEST1" Qty=100 price = 20.11 subject="some value" values="3=this, 4=that"' 
>>> pprint.pprint(parse_line(line)) 
{'Qty': 100,
 'account': 'TEST1',
 'price': 20.11,
 'subject': 'some value',
 'values': {3: 'this', 4: 'that'}}
0

If you don't want to use a regular expression, another option is simply to read the string one character at a time:

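string = 'account = "TEST1" Qty=100 price = 20.11 subject="some value" values="3=this, 4=that"'

inside_quotes = False
seen_quote = False   # lets an empty "" count as a value
key = None
value = ''
result = {}          # named result to avoid shadowing the built-in dict

for c in string:
    if c == '"':
        inside_quotes = not inside_quotes
        seen_quote = True
    elif c == '=' and not inside_quotes:
        key = value
        value = ''
    elif c == ' ':
        if inside_quotes:
            value += ' '
        elif key and (value or seen_quote):
            result[key] = value
            key = None
            value = ''
            seen_quote = False
    else:
        value += c

# store the last pair, unless a trailing space already flushed it
if key:
    result[key] = value
print(result)   # all values are kept as strings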
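
Since the question is about a file of such lines, it may help to wrap the scanner in a function and call it once per line. The following is only a sketch under some assumptions: `parse_pairs` and the sample records are hypothetical, `io.StringIO` stands in for a real file object, and all values are left as strings.

```python
import io

def parse_pairs(line):
    """Scan one line character by character; return a dict of string values."""
    result = {}
    inside_quotes = False
    seen_quote = False   # lets an empty "" count as a value
    key = None
    value = ''
    # A single trailing space makes the loop flush the last pair.
    for c in line.strip() + ' ':
        if c == '"':
            inside_quotes = not inside_quotes
            seen_quote = True
        elif c == '=' and not inside_quotes:
            key, value = value, ''
        elif c == ' ' and not inside_quotes:
            if key and (value or seen_quote):
                result[key] = value
                key, value, seen_quote = None, '', False
        else:
            # Ordinary characters, plus spaces and '=' inside quotes.
            value += c
    return result

# Stand-in for open('somefile'): iterating yields one line at a time.
sample = io.StringIO(
    'account = "TEST1" Qty=100 price = 20.11\n'
    'account = "TEST2" Qty=5 subject=""\n'
)
records = [parse_pairs(line) for line in sample if line.strip()]
print(records[1])
# {'account': 'TEST2', 'Qty': '5', 'subject': ''}
```

Converting the unquoted values to numbers would need an extra step, because this scanner discards the information about which values were quoted once the line is finished.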
5

Perhaps a little easier to follow is this pyparsing rendition:

from pyparsing import * 

# define basic elements - use re's for numerics, faster and easier than
# composing from pyparsing objects
integer = Regex(r'[+-]?\d+') 
real = Regex(r'[+-]?\d+\.\d*') 
ident = Word(alphanums) 
value = real | integer | quotedString.setParseAction(removeQuotes) 

# define a key-value pair, and a configline as one or more of these 
# wrap configline in a Dict so that results are accessible by given keys 
kvpair = Group(ident + Suppress('=') + value) 
configline = Dict(OneOrMore(kvpair)) 

src = ('account = "TEST1" Qty=100 price = 20.11 subject="some value" '
       'values="3=this, 4=that"')

configitems = configline.parseString(src) 

Now you can access the pieces through the returned ParseResults object, configitems:

>>> print configitems.asList() 
[['account', 'TEST1'], ['Qty', '100'], ['price', '20.11'], 
['subject', 'some value'], ['values', '3=this, 4=that']] 

>>> print configitems.asDict() 
{'account': 'TEST1', 'Qty': '100', 'values': '3=this, 4=that', 
    'price': '20.11', 'subject': 'some value'} 

>>> print configitems.dump() 
[['account', 'TEST1'], ['Qty', '100'], ['price', '20.11'], 
['subject', 'some value'], ['values', '3=this, 4=that']] 
- Qty: 100 
- account: TEST1 
- price: 20.11 
- subject: some value 
- values: 3=this, 4=that 

>>> print configitems.keys() 
['account', 'subject', 'values', 'price', 'Qty'] 

>>> print configitems.subject 
some value