2013-12-13

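I have a preprocessed C file and need to enumerate the members of one of its enums. pyparsing ships with a simple example (examples/cpp_enum_parser.py), but it only works when the enum values are positive integers. In real code the values can be negative, hexadecimal, or complex expressions. Is it possible to parse non-trivial C enums with pyparsing?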

I don't need the values in structured form, only the names.

enum hello { 
    minusone=-1, 
    par1 = ((0,5)), 
    par2 = sizeof("a\")bc};,"), 
    par3 = (')') 
}; 

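When parsing a value, the parser should skip everything up to one of the characters [('",}] and then handle that character. A regex or SkipTo could be useful for the skipping. For strings and character literals, QuotedString. For nested parentheses, Forward (see examples/fourFn.py).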
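The stop-character idea above can be tried without pyparsing. The sketch below is a hand-written scanner in plain Python, a swapped-in technique rather than pyparsing's API: inside a value it treats only ( ' " , } as special, skips quoted literals as a whole (honoring backslash escapes), tracks parenthesis depth, and ends the value at the first , or } found at depth 0. It assumes preprocessed, well-formed input with no comments; the names skip_quoted, skip_value and enum_names are hypothetical.

```python
import re

# Hypothetical helpers for a plain-Python scanner (not part of pyparsing).
IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")
WS = re.compile(r"\s*")
# 'enum', an optional tag name, then the opening brace
ENUM_HEAD = re.compile(r"\benum\b\s*([A-Za-z_][A-Za-z0-9_]*)?\s*\{")


def skip_quoted(text, pos):
    """Return the index just past the string/char literal starting at text[pos]."""
    quote = text[pos]
    pos += 1
    while pos < len(text):
        if text[pos] == "\\":
            pos += 2  # skip the backslash and the character it escapes
        elif text[pos] == quote:
            return pos + 1
        else:
            pos += 1
    raise ValueError("unterminated literal")


def skip_value(text, pos):
    """Return the index of the top-level ',' or '}' that ends the value at pos."""
    depth = 0
    while pos < len(text):
        ch = text[pos]
        if ch in "'\"":
            pos = skip_quoted(text, pos)  # quotes hide ( ) , } from the scan
            continue
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch in ",}" and depth == 0:
            return pos
        pos += 1
    raise ValueError("enum body not closed")


def enum_names(text):
    """Return [(tag, [member names])] for every enum definition in text."""
    result = []
    for head in ENUM_HEAD.finditer(text):
        pos, names = head.end(), []
        while True:
            pos = WS.match(text, pos).end()
            if text.startswith("}", pos):
                break  # end of the enum body (also tolerates a trailing comma)
            ident = IDENT.match(text, pos)
            if ident is None:
                raise ValueError("expected an enumerator name at %d" % pos)
            names.append(ident.group())
            pos = WS.match(text, ident.end()).end()
            if text.startswith("=", pos):
                pos = skip_value(text, pos + 1)  # the value text is discarded
            if text.startswith(",", pos):
                pos += 1
            elif not text.startswith("}", pos):
                raise ValueError("expected ',' or '}' at %d" % pos)
        result.append((head.group(1) or "", names))
    return result


sample = r"""
enum hello {
    minusone=-1,
    par1 = ((0,5)),
    par2 = sizeof("a\")bc};,"),
    par3 = (')')
};
"""

print(enum_names(sample))  # [('hello', ['minusone', 'par1', 'par2', 'par3'])]
```

Hexadecimal values or arbitrary operators need no extra cases here, because only the five stop characters are ever inspected; everything else is skipped as ordinary text.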

Answers


I modified the original example. I don't know why enum.ignore(cppStyleComment) was removed from the original script, so I put it back.

from pyparsing import * 
# sample string with enums and other stuff 
sample = ''' 
    stuff before 
    enum hello { 
     Zero, 
     One, 
     Two, 
     Three, 
     Five=5, 
     Six, 
     Ten=10, 
     minusone=-1, 
     par1 = ((0,5)), 
     par2 = sizeof("a\\")bc};,"), 
     par3 = (')') 
     }; 
    in the middle 
    enum 
     { 
     alpha, 
     beta, 
     gamma = 10 , 
     zeta = 50 
     }; 
    at the end 
    ''' 

# syntax we don't want to see in the final parse tree 
LBRACE,RBRACE,EQ,COMMA = map(Suppress,"{}=,") 


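# Parentheses are matched literally; nesting is handled by the recursive expr below
lpar = Literal("(")
rpar = Literal(")")
# Runs of ordinary text: at top level a value ends at ',' or '}', so those are excluded
anything_topl = Regex(r"[^'\"(,}]+")
# Inside parentheses ',' and '}' are ordinary; only quotes and parentheses are special
anything = Regex(r"[^'\"()]+")

# expr: text with balanced parentheses and quoted strings (recursive via Forward)
expr = Forward()
pths_or_str = quotedString | lpar + expr + rpar
expr << ZeroOrMore(pths_or_str | anything)
# expr_topl: a whole enum value, which stops at a top-level ',' or '}'
expr_topl = ZeroOrMore(pths_or_str | anything_topl)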

_enum = Suppress('enum') 
identifier = Word(alphas,alphanums+'_') 
expr_topl_text = originalTextFor(expr_topl) 
enumValue = Group(identifier('name') + Optional(EQ + expr_topl_text('value'))) 
enumList = Group(ZeroOrMore(enumValue + COMMA) + Optional(enumValue)) 
enum = _enum + Optional(identifier('enum')) + LBRACE + enumList('names') + RBRACE 
enum.ignore(cppStyleComment) 

# find instances of enums ignoring other syntax 
for item, start, stop in enum.scanString(sample):
    for entry in item.names:
        print('%s %s = %s' % (item.enum, entry.name, entry.value))

Result:

$ python examples/cpp_enum_parser.py 
hello Zero = 
hello One = 
hello Two = 
hello Three = 
hello Five = 5 
hello Six = 
hello Ten = 10 
hello minusone = -1 
hello par1 = ((0,5)) 
hello par2 = sizeof("a\")bc};,") 
hello par3 = (')') 
alpha = 
beta = 
gamma = 10 
zeta = 50 
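The two character-class patterns carry most of the weight in this grammar. Checked on their own with Python's standard re module (the same pattern strings, but outside pyparsing), they show where each one stops: at top level a run ends at a quote, "(", "," or "}", while inside parentheses only quotes and parentheses interrupt it, so 0,5 stays in one piece.

```python
import re

# Same pattern strings as anything_topl / anything in the grammar above,
# compiled with the standard library so they can be inspected on their own.
anything_topl = re.compile(r"[^'\"(,}]+")  # top level: stops at ' " ( , }
anything = re.compile(r"[^'\"()]+")        # inside (...): stops at ' " ( )

print(anything_topl.match("-1,").group())                    # -1
print(anything_topl.match('sizeof("a\\")bc};,"),').group())  # sizeof
print(anything.match("0,5)),").group())                      # 0,5
print(repr(anything_topl.match("10 ,").group()))             # '10 '
```

The last line shows that the top-level pattern also swallows trailing whitespace, so a value written as gamma = 10 , may come back with a trailing space; calling .strip() on entry.value is a cheap safeguard.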

You have to special-case the terms that may contain a comma or a closing brace that does not mark the end of an enum value.
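To see why those terms need special handling, here is what a naive split on commas does to the question's enum body. This is a plain-Python illustration, not part of the answer's code; the variable names are hypothetical.

```python
# The question's enum body as it appears in the C source (raw string keeps \" intact).
body = r"""minusone=-1,
    par1 = ((0,5)),
    par2 = sizeof("a\")bc};,"),
    par3 = (')')"""

# Naive approach: split on every comma and keep the text before '='.
naive = [part.split("=")[0].strip() for part in body.split(",")]
print(naive)  # ['minusone', 'par1', '5))', 'par2', '")', 'par3']
```

Fragments of a parenthesized argument and of a string literal leak into the result, and the } inside the string would likewise fool any approach that simply looks for the first closing brace.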

from pyparsing import * 

sample = r""" 
enum hello { 
    minusone=-1, 
    par1 = ((0,5)), 
    par2 = sizeof("a\")bc};,"), 
    par3 = (')') 
}; 
""" 

ENUM = Keyword("enum") 
LBRACE,RBRACE,COMMA,EQ = map(Suppress, "{},=") 
identifier = Word(alphas+"_", alphanums+"_") 
identifier.setName("identifier")#.setDebug() 

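# a call such as sizeof(...): an identifier followed by a parenthesized group
funcCall = identifier + nestedExpr()

# special cases first (parenthesized group, quoted string, call), else skip to the next ',' or '}'
# note: nestedExpr() skips quoted strings inside the parentheses by default
enum_value = nestedExpr() | quotedString | funcCall | SkipTo(COMMA | RBRACE)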

enum_decl = (ENUM + Optional(identifier, '')("ident") + LBRACE + 
    OneOrMore(identifier + Optional(EQ + enum_value).suppress() + Optional(COMMA))("names") + 
    RBRACE 
    ) 

for enum in enum_decl.searchString(sample):
    print(enum.ident, ','.join(enum.names))

Prints:

hello minusone,par1,par2,par3 

I hadn't noticed nestedExpr. Thanks – basin