2016-06-20 25 views
1

我在Python 3.5.0上使用pyparsing(2.1.5)。 我想更快地制作infixNotation。我发现其他人使用ParserElement.enablePackrat()来提高infixNotation的性能。但我无法做到。我的代码如下。ParserElement.enablePackrat()不会使infixNotation更快

from pyparsing import * 
ParserElement.enablePackrat() 
UNICODE_CHARS = u''.join(
    chr(c) for c in range(65538) if not chr(c).isspace() and 
    chr(c) not in '()"' 
) 
_and_ = Keyword('AND') 
_or_ = Keyword('OR') 
_not_ = Keyword('NOT') 
search_term = ~_and_ + ~_or_ + ~_not_ + Word(UNICODE_CHARS) | QuotedString(
    '"', endQuoteChar='"', unquoteResults=False 
) 
search_expr = infixNotation(
    search_term, 
    [ 
     (_not_, 1, opAssoc.RIGHT), 
     (Optional(_and_), 2, opAssoc.LEFT), 
     (_or_, 2, opAssoc.LEFT), 
    ] 
) 
parsed_query = search_expr.parseString(user_string)[0].asList() 

回答

0

此使用的infixNotation只有3个级别的运营商,所以packratting不会为你做很多。这些改进通常有5个或更多级别的运算符,例如算术运算。

如果你真的想曲柄表现出来的infixNotation,写自己的简装版:

""" 
BNF 

operand = ~and ~or ~not (A-Za-z0-9)... | quoted_string 

atom = 'not'? (operand | '(' expr ')') 
and_term = atom 'and' atom 
or_term = and_term 'or' and_term 
""" 


_and_ = CaselessKeyword('AND') 
_or_ = CaselessKeyword('OR') 
_not_ = CaselessKeyword('NOT') 
keyword = (_and_ | _or_ | _not_) 
search_term = ~keyword + Word(UNICODE_CHARS) | QuotedString('"', endQuoteChar='"', unquoteResults=False) 

# use this instead of infixNotation - this is essentially what infixNotation will 
# generate, but with fewer FollowedBy's (used to collapse degenerate levels) 
LPAR,RPAR = map(Suppress, "()") 
expr = Forward() 
atom_ = search_term | Group(LPAR + expr + RPAR) 
atom = Group(_not_ + atom_) | atom_ 
and_term = Group(atom + ZeroOrMore(_and_ + atom)) 
or_term = Group(and_term + ZeroOrMore(_or_ + and_term)) 
expr <<= or_term 

# some simple test cases 
tests = """ 
    p and not q 
    p and not q or r 
    p and not (q or r) 
""" 

print("compare with using infixNotation") 
expr.runTests(tests) 

print("compare with using infixNotation") 
search_expr = infixNotation(
    search_term, 
    [ 
     (_not_, 1, opAssoc.RIGHT), 
     (Optional(_and_), 2, opAssoc.LEFT), 
     (_or_, 2, opAssoc.LEFT), 
    ] 
) 

search_expr.runTests(tests) 

硬编码版本会给输出,如:

p and not q 
[[['p', 'AND', ['NOT', 'q']]]] 
[0]: 
    [['p', 'AND', ['NOT', 'q']]] 
    [0]: 
    ['p', 'AND', ['NOT', 'q']] 
    [0]: 
     p 
    [1]: 
     AND 
    [2]: 
     ['NOT', 'q'] 


p and not q or r 
[[['p', 'AND', ['NOT', 'q']], 'OR', ['r']]] 
[0]: 
    [['p', 'AND', ['NOT', 'q']], 'OR', ['r']] 
    [0]: 
    ['p', 'AND', ['NOT', 'q']] 
    [0]: 
     p 
    [1]: 
     AND 
    [2]: 
     ['NOT', 'q'] 
    [1]: 
    OR 
    [2]: 
    ['r'] 


p and not (q or r) 
[[['p', 'AND', ['NOT', [[['q'], 'OR', ['r']]]]]]] 
[0]: 
    [['p', 'AND', ['NOT', [[['q'], 'OR', ['r']]]]]] 
    [0]: 
    ['p', 'AND', ['NOT', [[['q'], 'OR', ['r']]]]] 
    [0]: 
     p 
    [1]: 
     AND 
    [2]: 
     ['NOT', [[['q'], 'OR', ['r']]]] 
     [0]: 
     NOT 
     [1]: 
     [[['q'], 'OR', ['r']]] 
     [0]: 
      [['q'], 'OR', ['r']] 
      [0]: 
      ['q'] 
      [1]: 
      OR 
      [2]: 
      ['r'] 

使用中缀表示法将给:

p and not q 
[['p', 'AND', ['NOT', 'q']]] 
[0]: 
    ['p', 'AND', ['NOT', 'q']] 
    [0]: 
    p 
    [1]: 
    AND 
    [2]: 
    ['NOT', 'q'] 


p and not q or r 
[[['p', 'AND', ['NOT', 'q']], 'OR', 'r']] 
[0]: 
    [['p', 'AND', ['NOT', 'q']], 'OR', 'r'] 
    [0]: 
    ['p', 'AND', ['NOT', 'q']] 
    [0]: 
     p 
    [1]: 
     AND 
    [2]: 
     ['NOT', 'q'] 
    [1]: 
    OR 
    [2]: 
    r 


p and not (q or r) 
[['p', 'AND', ['NOT', ['q', 'OR', 'r']]]] 
[0]: 
    ['p', 'AND', ['NOT', ['q', 'OR', 'r']]] 
    [0]: 
    p 
    [1]: 
    AND 
    [2]: 
    ['NOT', ['q', 'OR', 'r']] 
    [0]: 
     NOT 
    [1]: 
     ['q', 'OR', 'r'] 

FollowedBy terms that infixNotation通过在实际分组之前确保有2个或更多术语进行分组,可以添加折叠退化级别。它们还阻止了操作优先级定义中每个级别的原子的调用解析操作。

如果这些对您无关紧要,请尝试精简版。 (另外,请在UNICODE_CHARS的定义上做一点时间 - 这个字符串会产生一些费时,你可能想要预先生成这个字符串到一个单独的模块,并且只导入它。)

+0

谢谢您的建议。我可以让它快3/4。但是我需要让它快1/10。我会尝试psyco什么的.. –

+0

psyco不活着。 pypy在Python3.5上不可用。 –

+0

我将语法定义移出func作用域。性能改进enoght。它是1/20。谢谢! –