2012-10-17 65 views

I am building a formula reference graph from spreadsheet XML using Python. The formulas are Excel-style, for example:

=IF(AND(LEN(R[-2]C[-1])>0,R[-1]C),WriteCurve(OFFSET(R16C6, 0,0,R9C7,R10C7),R15C6,R10C3, R8C3),"NONE") 

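I am only interested in getting the nth argument of the WriteCurve function. What I came up with is a very C-style routine that basically counts the commas that are not inside brackets. There are a lot of nested formulas.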

def parseArguments(t, func, n):
    start = t.find(func) + len(func) + 1
    bracket = 0
    ss = t[start:]
    lastcomma = 0
    for i, a in enumerate(ss):
        if a == "(":
            bracket += 1
        elif a == ")":
            if bracket == 0:
                break
            bracket -= 1
        elif a == ",":
            if bracket == 0 and n == 0:
                break
            elif bracket == 0:
                if n - 1 == 0:
                    lastcomma = i
                n -= 1
    if lastcomma == 0:
        return ss[:i]
    else:
        return ss[lastcomma + 1:i]

Is there a Pythonic way to do this? Or is there a better, recursive way to parse the whole formula? Many thanks.

Answer

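The best Excel formula parser I know of is E. W. Bachtal's algorithm. There is a Python port by Robin Macharg; the most recent version I know of is part of the pycel project, but it can be used standalone as the tokenizer module. It has no problem parsing your formula: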

from tokenizer import shunting_yard 
rpn = shunting_yard('=IF(AND(LEN(R[-2]C[-1])>0,R[-1]C),WriteCurve(OFFSET(R16C6, 0,0,R9C7,R10C7),R15C6,R10C3, R8C3),"NONE")') 
print(rpn) 
deque([<tokenizer.RangeNode object at 0x2b7b1f5d7850>, <tokenizer.FunctionNode object at 0x2b7b1f5d7950>, <tokenizer.ASTNode object at 0x2b7b1f5d7990>, <tokenizer.ASTNode object at 0x2b7b1f5d79d0>, <tokenizer.RangeNode object at 0x2b7b1f5d7a10>, <tokenizer.FunctionNode object at 0x2b7b1f5d7a50>, <tokenizer.RangeNode object at 0x2b7b1f5d7a90>, <tokenizer.ASTNode object at 0x2b7b1f5d7ad0>, <tokenizer.ASTNode object at 0x2b7b1f5d7b10>, <tokenizer.RangeNode object at 0x2b7b1f5d7b50>, <tokenizer.RangeNode object at 0x2b7b1f5d7b90>, <tokenizer.FunctionNode object at 0x2b7b1f5d7bd0>, <tokenizer.RangeNode object at 0x2b7b1f5d7c10>, <tokenizer.RangeNode object at 0x2b7b22efc450>, <tokenizer.RangeNode object at 0x2b7b22efc510>, <tokenizer.FunctionNode object at 0x2b7b22efc410>, <tokenizer.ASTNode object at 0x2b7b22eff110>, <tokenizer.FunctionNode object at 0x2b7b22eff150>]) 

The tokenizer leaves you with an RPN stack; if you find an AST more convenient to work with, it is easy to convert one into the other:

def rpn_to_ast(rpn):
    stack = []
    for n in rpn:
        num_args = (2 if n.token.ttype == "operator-infix" else
                    1 if n.token.ttype.startswith('operator') else
                    n.num_args if n.token.ttype == 'function' else 0)
        n.args = [stack.pop() for _ in range(num_args)][::-1]
        stack.append(n)
    return stack[0]
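
To see what the conversion does without installing the tokenizer, here is a minimal self-contained sketch. The Node and Token classes and the show helper are hypothetical stand-ins, not part of the tokenizer: they only mimic the token.ttype, token.tvalue and num_args attributes that rpn_to_ast reads. The RPN list is written by hand for the expression 1 + MAX(2, 3).

```python
from collections import namedtuple

# Hypothetical stand-ins for the tokenizer's objects; only the attributes
# that rpn_to_ast reads are modelled here.
Token = namedtuple("Token", "tvalue ttype")


class Node:
    def __init__(self, tvalue, ttype, num_args=0):
        self.token = Token(tvalue, ttype)
        self.num_args = num_args
        self.args = []


def rpn_to_ast(rpn):
    stack = []
    for n in rpn:
        if n.token.ttype == "operator-infix":
            num_args = 2
        elif n.token.ttype.startswith("operator"):
            num_args = 1
        elif n.token.ttype == "function":
            num_args = n.num_args
        else:
            num_args = 0
        # Operands come off the stack last-first, so reverse them back.
        n.args = [stack.pop() for _ in range(num_args)][::-1]
        stack.append(n)
    return stack[0]


def show(node):
    """Render the tree as a nested prefix expression (illustration only)."""
    if not node.args:
        return node.token.tvalue
    inner = ", ".join(show(arg) for arg in node.args)
    return "%s(%s)" % (node.token.tvalue, inner)


# Hand-written RPN for 1 + MAX(2, 3): operands first, then MAX, then "+".
toy_rpn = [
    Node("1", "operand"),
    Node("2", "operand"),
    Node("3", "operand"),
    Node("MAX", "function", num_args=2),
    Node("+", "operator-infix"),
]
toy_ast = rpn_to_ast(toy_rpn)
print(show(toy_ast))  # +(1, MAX(2, 3))
```

Each node pops as many children as it needs from the stack and is pushed back as a subtree, so the last node in the RPN sequence ends up as the root.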

Then you can walk the AST to find the WriteCurve node and inspect its arguments:

def walk(ast):
    yield ast
    for arg in getattr(ast, 'args', []):
        for node in walk(arg):
            yield node

write_curve = next(node for node in walk(rpn_to_ast(rpn)) if node.token.ttype == 'function' and node.token.tvalue == 'WriteCurve') 
print(write_curve.args[2].token.tvalue) 
R10C3
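
If pulling in the tokenizer is not an option, the comma-counting loop from the question can be tightened up a bit. The following is only a sketch under stated assumptions, not a real parser: split_top_level_args is a hypothetical helper name, it looks at the first textual occurrence of the function name followed by "(", and it knows about nested parentheses and double-quoted string literals but nothing else in Excel's grammar.

```python
def split_top_level_args(formula, func):
    """Return the top-level argument strings of the first call to func.

    Hypothetical helper: tracks parenthesis depth and double-quoted
    strings only, so it is a sketch rather than a full formula parser.
    """
    start = formula.find(func + "(")
    if start == -1:
        raise ValueError("%s( not found" % func)
    pos = start + len(func) + 1  # first character after the opening "("
    args, current, depth, in_string = [], [], 0, False
    for ch in formula[pos:]:
        if in_string:
            # Inside a string literal nothing is structural; a doubled
            # quote ("") simply closes and reopens the string.
            current.append(ch)
            if ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
            current.append(ch)
        elif ch == "(":
            depth += 1
            current.append(ch)
        elif ch == ")":
            if depth == 0:
                # Closing parenthesis of the call itself: done.
                args.append("".join(current).strip())
                return args
            depth -= 1
            current.append(ch)
        elif ch == "," and depth == 0:
            # A comma at depth 0 separates two arguments of func.
            args.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    raise ValueError("unbalanced parentheses after %s(" % func)


formula = ('=IF(AND(LEN(R[-2]C[-1])>0,R[-1]C),WriteCurve(OFFSET(R16C6, 0,0,'
           'R9C7,R10C7),R15C6,R10C3, R8C3),"NONE")')
args = split_top_level_args(formula, "WriteCurve")
print(args[2])  # R10C3
```

Because the search is purely textual, it will also match a longer function name that ends in the same letters, or the name appearing inside a string literal, and a call with no arguments comes back as a list holding one empty string. For anything beyond quick extraction the tokenizer-based approach above is the safer choice.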