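There may be a way to do this with string processing alone, but I would parse the trees and print them recursively in Newick format. A fairly minimal implementation: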
import re


class Tree(object):
    def __init__(self, label):
        self.label = label
        self.children = []

    @staticmethod
    def _tokenize(string):
        # Reversed so that list.pop() yields the tokens in reading order.
        return list(reversed(re.findall(r'\(|\)|[^ \n\t()]+', string)))

    @classmethod
    def from_string(cls, string):
        tokens = cls._tokenize(string)
        return cls._tree(tokens)

    @classmethod
    def _tree(cls, tokens):
        t = tokens.pop()
        if t == '(':
            # The token right after '(' is the node label.
            tree = cls(tokens.pop())
            for subtree in cls._trees(tokens):
                tree.children.append(subtree)
            return tree
        else:
            # A bare token is a leaf.
            return cls(t)

    @classmethod
    def _trees(cls, tokens):
        while True:
            if not tokens:
                # End the generator with a plain return: raising StopIteration
                # inside a generator is a RuntimeError since Python 3.7 (PEP 479).
                return
            if tokens[-1] == ')':
                tokens.pop()
                return
            yield cls._tree(tokens)

    def to_newick(self):
        if self.children and len(self.children) == 1:
            # A single child needs no parentheses of its own.
            return ','.join(child.to_newick() for child in self.children)
        elif self.children:
            return '(' + ','.join(child.to_newick() for child in self.children) + ')'
        else:
            return self.label
Note that information is of course lost in the conversion, because only the terminal nodes are kept. Usage:
>>> s = """(ROOT (..."""
>>> Tree.from_string(s).to_newick()
...
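To make the conversion concrete, here is a small self-contained sketch of the same idea (tokenize, recurse, keep only the leaves) written as a single function. The function name `bracketed_to_newick` and the sample sentence are my own assumptions for illustration; they are not taken from the question or from the class above.

```python
import re


def bracketed_to_newick(text):
    """Convert a bracketed parse such as "(S (NP a) (VP b))" to a Newick-style string.

    Hypothetical helper: internal labels are dropped, only the leaves survive.
    """
    tokens = re.findall(r'\(|\)|[^\s()]+', text)
    pos = 0

    def parse():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        if token != '(':
            return token              # a bare word is a leaf
        label = tokens[pos]           # node label, used only if there are no children
        pos += 1
        children = []
        while tokens[pos] != ')':
            children.append(parse())
        pos += 1                      # consume the closing ')'
        if not children:
            return label
        if len(children) == 1:
            return children[0]        # a single child needs no parentheses
        return '(' + ','.join(children) + ')'

    return parse()


# Assumed sample input, not the tree from the question.
print(bracketed_to_newick("(ROOT (S (NP (DT the) (NN cat)) (VP (VBD sat))))"))
# prints ((the,cat),sat)
```

Like `to_newick` above, this does not append the terminating `;` that many Newick readers expect, so add it yourself if your tool requires it.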
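Thank you very much =) –
I copied it almost verbatim from my [file of various NLP functions](https://github.com/L3viathan/toolib/blob/master/nlp.py) (which may be useful if you work with parse trees); I only added `to_newick`. – L3viathan
Sorry, but I don't understand you! I copied your code, but it doesn't work :( –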