How to parse code (in Python)?

I need to parse some special data structures. They are in a somewhat C-like format that looks roughly like this:

Group("GroupName") { 
    /* C-Style comment */ 
    Group("AnotherGroupName") { 
     Entry("some","variables",0,3.141); 
     Entry("other","variables",1,2.718); 
    } 
    Entry("linebreaks", 
      "allowed", 
      3, 
      1.414 
     ); 
}

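I can think of several ways to approach this problem. I could "tokenize" the code using regular expressions. I could read the code one character at a time and use a state machine to build my data structure. I could get rid of the line breaks after commas and read the file line by line. I could write a conversion script that turns this code into executable Python code.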
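For illustration, the regular-expression "tokenize" idea could look roughly like the sketch below. It uses only the standard `re` module. The token names (`COMMENT`, `STRING`, `REAL` and so on) and the `tokenize` helper are made up for this example, and the sketch assumes strings contain no escaped quotes.

```python
import re

# Hypothetical token specification for the sample format; names are illustrative.
TOKEN_RE = re.compile(r'''
    (?P<COMMENT>/\*.*?\*/)      # C-style comment, skipped
  | (?P<WS>\s+)                 # whitespace including newlines, skipped
  | (?P<STRING>"[^"]*")         # double-quoted string (assumes no escapes)
  | (?P<REAL>[+-]?\d+\.\d*)     # real number such as 3.141
  | (?P<INT>[+-]?\d+)           # integer
  | (?P<NAME>[A-Za-z_]\w*)      # keyword such as Group or Entry
  | (?P<PUNCT>[{}();,])         # single-character punctuation
''', re.VERBOSE | re.DOTALL)


def tokenize(text):
    """Yield (kind, value) pairs, dropping comments and whitespace."""
    pos = 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at offset {pos}")
        pos = m.end()
        kind = m.lastgroup
        if kind in ("COMMENT", "WS"):
            continue
        value = m.group()
        if kind == "STRING":
            value = value[1:-1]      # strip the surrounding quotes
        elif kind == "REAL":
            value = float(value)
        elif kind == "INT":
            value = int(value)
        yield kind, value


tokens = list(tokenize('Entry("a", 1, 2.5); /* comment */'))
print(tokens[:3])  # [('NAME', 'Entry'), ('PUNCT', '('), ('STRING', 'a')]
```

A token stream like this would still need a small parser on top of it, for example the state machine mentioned above, to rebuild the nesting of groups and entries.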

Is there a nice pythonic way to parse files like this?
How would you go about parsing it?

This is more of a general question about how to parse strings than about this particular file format.

Source

2011-03-07 bastibe

[This article](http://nedbatchelder.com/text/python-parsers.html) might be of interest to you.

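Using pyparsing (Mark Tolonen, I was just about to click "Submit Post" when your post came through), this is pretty straightforward. See the comments embedded in the code below: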

data = """Group("GroupName") { 
    /* C-Style comment */ 
    Group("AnotherGroupName") { 
     Entry("some","variables",0,3.141); 
     Entry("other","variables",1,2.718); 
    } 
    Entry("linebreaks", 
      "allowed", 
      3, 
      1.414 
     ); 
} """ 

from pyparsing import * 

# define basic punctuation and data types 
LBRACE,RBRACE,LPAREN,RPAREN,SEMI = map(Suppress,"{}();") 
GROUP = Keyword("Group") 
ENTRY = Keyword("Entry") 

# use parse actions to do parse-time conversion of values 
real = Regex(r"[+-]?\d+\.\d*").setParseAction(lambda t:float(t[0])) 
integer = Regex(r"[+-]?\d+").setParseAction(lambda t:int(t[0])) 

# parses a string enclosed in quotes, but strips off the quotes at parse time 
string = QuotedString('"') 

# define structure expressions 
value = string | real | integer 
entry = Group(ENTRY + LPAREN + Group(Optional(delimitedList(value)))) + RPAREN + SEMI 

# since Groups can contain Groups, need to use a Forward to define recursive expression 
group = Forward() 
group << Group(GROUP + LPAREN + string("name") + RPAREN + 
      LBRACE + Group(ZeroOrMore(group | entry))("body") + RBRACE) 

# ignore C style comments wherever they occur 
group.ignore(cStyleComment) 

# parse the sample text 
result = group.parseString(data) 

# print out the tokens as a nice indented list using pprint 
from pprint import pprint 
pprint(result.asList())

Prints:

[['Group',
  'GroupName',
  [['Group',
    'AnotherGroupName',
    [['Entry', ['some', 'variables', 0, 3.141]],
     ['Entry', ['other', 'variables', 1, 2.718]]]],
   ['Entry', ['linebreaks', 'allowed', 3, 1.4139999999999999]]]]]

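(Unfortunately, there may be some confusion here, since pyparsing defines its own "Group" class for imparting structure to the parsed tokens. Note how the value list in each Entry ends up grouped, because the list expression is enclosed in a pyparsing Group.)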
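One way to consume the nested list printed above is to walk it recursively. The following is only a sketch: the `to_tree` helper and the dictionary keys are invented for illustration, and the list is copied in as a literal (with 1.414 written in short form) so the snippet runs without pyparsing.

```python
# The nested list printed above, copied as a literal.
parsed = [['Group', 'GroupName',
           [['Group', 'AnotherGroupName',
             [['Entry', ['some', 'variables', 0, 3.141]],
              ['Entry', ['other', 'variables', 1, 2.718]]]],
            ['Entry', ['linebreaks', 'allowed', 3, 1.414]]]]]


def to_tree(node):
    """Convert one ['Group', name, body] or ['Entry', values] list into a dict."""
    kind = node[0]
    if kind == 'Entry':
        return {'type': 'entry', 'values': list(node[1])}
    if kind == 'Group':
        _, name, body = node
        return {'type': 'group', 'name': name,
                'children': [to_tree(child) for child in body]}
    raise ValueError(f"unexpected node kind: {kind!r}")


tree = to_tree(parsed[0])
print(tree['name'])                   # GroupName
print(tree['children'][0]['name'])    # AnotherGroupName
print(tree['children'][1]['values'])  # ['linebreaks', 'allowed', 3, 1.414]
```

The leading 'Group' or 'Entry' string in each sublist tells the two node kinds apart, so the walk does not depend on how deeply the groups are nested.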

Source

2011-03-07 09:30:15 PaulMcG

You just earned $10 at the O'Reilly bookstore! – bastibe