2016-09-25 43 views

Answer

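There may be a way to do this with plain string processing, but I would parse the trees and print them recursively in Newick format. A fairly minimal implementation: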

import re

class Tree(object):
    def __init__(self, label):
        self.label = label
        self.children = []

    @staticmethod
    def _tokenize(string):
        # Reversed so that tokens.pop() yields the tokens in reading order.
        return list(reversed(re.findall(r'\(|\)|[^ \n\t()]+', string)))

    @classmethod
    def from_string(cls, string):
        tokens = cls._tokenize(string)
        return cls._tree(tokens)

    @classmethod
    def _tree(cls, tokens):
        t = tokens.pop()
        if t == '(':
            # The token after '(' is the node label; what follows are its children.
            tree = cls(tokens.pop())
            for subtree in cls._trees(tokens):
                tree.children.append(subtree)
            return tree
        else:
            return cls(t)

    @classmethod
    def _trees(cls, tokens):
        # A plain return ends the generator; raising StopIteration here
        # would become a RuntimeError under PEP 479 (Python 3.7+).
        while True:
            if not tokens:
                return
            if tokens[-1] == ')':
                tokens.pop()
                return
            yield cls._tree(tokens)

    def to_newick(self):
        if len(self.children) == 1:
            # Unary nodes are collapsed into their only child.
            return self.children[0].to_newick()
        elif self.children:
            return '(' + ','.join(child.to_newick() for child in self.children) + ')'
        else:
            return self.label

Note that information is of course lost in the conversion, because only the terminal nodes are kept. Usage:

>>> s = """(ROOT (...""" 
>>> Tree.from_string(s).to_newick() 
... 
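
To make the conversion and the information loss concrete, here is a compact, self-contained sketch of the same idea. It is not the code from the answer: `parse`, `to_newick` (as a free function), and the sample sentence are made up for illustration, and trees are represented as plain `(label, children)` tuples rather than `Tree` objects.

```python
import re


def parse(string):
    """Parse a bracketed parse tree into nested (label, children) tuples."""
    # Reverse the token list so that pop() returns tokens in reading order.
    tokens = re.findall(r'\(|\)|[^\s()]+', string)[::-1]

    def tree():
        t = tokens.pop()
        if t != '(':
            return (t, [])          # a bare token is a terminal node
        label = tokens.pop()        # the token after '(' is the node label
        children = []
        while tokens and tokens[-1] != ')':
            children.append(tree())
        if tokens:
            tokens.pop()            # discard the closing ')'
        return (label, children)

    return tree()


def to_newick(node):
    """Render a node, keeping only terminals and collapsing unary nodes."""
    label, children = node
    if not children:
        return label
    inner = ','.join(to_newick(child) for child in children)
    return inner if len(children) == 1 else '(' + inner + ')'


# Hypothetical sample input, chosen only for illustration.
sample = "(ROOT (S (NP (DT the) (NN cat)) (VP (VBZ sleeps))))"
result = to_newick(parse(sample))
print(result)  # ((the,cat),sleeps)
```

The labels ROOT, S, NP, VP, DT, NN and VBZ do not appear in the output, which is the information loss mentioned above: only the words and the branching structure survive.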
Thank you very much =) –

I copied it almost verbatim from my [file of assorted NLP functions](https://github.com/L3viathan/toolib/blob/master/nlp.py) (which may be useful if you work with parse trees), only adding `to_newick`. – L3viathan

Sorry, but I don't understand you! I copied your code, but it doesn't work :( –
