2014-10-01 120 views
2

Convert CSV to Newick tree

I have a CSV file in which each row represents hierarchical data in the form: 'Phylum','Class','Order','Family','Genus','Species','Subspecies','unique_gi'

I would like to convert this into the classic Newick tree format, without distances. Either a novel approach or a Python package would be great. Thanks!

+0

Cross-posted: https://www.biostars.org/p/114387 – Pierre 2014-10-01 19:52:48

Answers

5

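You can use some simple Python to build a tree from the CSV and then write it out as a Newick tree. I'm not sure whether this is exactly what you are trying to do.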

import csv
from collections import defaultdict
from pprint import pprint

def tree():
    # Nested defaultdict: looking up a missing key creates an empty subtree.
    return defaultdict(tree)

def tree_add(t, path):
    # Walk down the path; each lookup creates the node if it is missing.
    for node in path:
        t = t[node]

def pprint_tree(tree_instance):
    # Convert to plain dicts so pprint shows the nesting clearly.
    def dicts(t): return {k: dicts(t[k]) for k in t}
    pprint(dicts(tree_instance))

def csv_to_tree(lines):
    # lines: any iterable of CSV lines (a list of strings or an open file).
    t = tree()
    for row in csv.reader(lines, quotechar='\''):
        tree_add(t, row)
    return t

def tree_to_newick(root):
    # Children are written as '(subtree)label' and joined with commas.
    items = []
    for k in root:
        s = ''
        if len(root[k]) > 0:
            sub_tree = tree_to_newick(root[k])
            if sub_tree != '':
                s += '(' + sub_tree + ')'
        s += k
        items.append(s)
    return ','.join(items)

def csv_to_weightless_newick(lines):
    t = csv_to_tree(lines)
    #pprint_tree(t)
    return tree_to_newick(t)

if __name__ == '__main__':
    # see https://docs.python.org/2/library/csv.html to read CSV file
    lines = [
        "'Phylum','Class','Order','Family','Genus','Species','Subspecies','unique_gi'",
        "'Phylum','Class','Order','example'",
        "'Another','Test'",
    ]

    # Parenthesized print works on both Python 2 and Python 3.
    print(csv_to_weightless_newick(lines))

Example output (sibling order follows dict iteration order, so it may differ between Python versions):

$ python ~/tmp/newick_tree.py 
(((example,((((unique_gi)Subspecies)Species)Genus)Family)Order)Class)Phylum,(Test)Another 
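
The script above works on a hard-coded list of lines and ignores any command-line arguments. The following is a minimal sketch, assuming Python 3, of the same idea applied to a real file. The names `rows_to_tree`, `to_newick`, and `csv_file_to_newick` are hypothetical, as is the script name in the usage comment. It also assumes two things that the original output does not do: the string is terminated with `;`, and multiple top-level labels are grouped under one unnamed root so the result is a single tree. Labels are written unquoted, so names containing spaces, commas, or parentheses would need extra handling.

```python
import csv
import sys
from collections import defaultdict


def tree():
    """Nested defaultdict: looking up a missing key creates an empty subtree."""
    return defaultdict(tree)


def rows_to_tree(rows):
    """Insert each row (a list of labels, root first) as a path in the tree."""
    root = tree()
    for row in rows:
        node = root
        for label in row:
            node = node[label]
    return root


def to_newick(node):
    """Serialize the children of node as 'a,b,...', each as '(subtree)label'."""
    parts = []
    for label, child in node.items():
        sub = to_newick(child)
        if sub:
            parts.append('(' + sub + ')' + label)
        else:
            parts.append(label)  # leaf: no parentheses
    return ','.join(parts)


def csv_file_to_newick(path):
    """Read a single-quoted CSV file and return a ';'-terminated Newick string."""
    with open(path, newline='') as handle:
        # Skip blank lines, which csv.reader yields as empty lists.
        rows = [row for row in csv.reader(handle, quotechar="'") if row]
    root = rows_to_tree(rows)
    body = to_newick(root)
    # Assumption: several top-level labels are grouped under one unnamed root.
    if len(root) > 1:
        body = '(' + body + ')'
    return body + ';'


if __name__ == '__main__' and len(sys.argv) == 2:
    # Hypothetical usage: python newick_from_file.py file.csv
    print(csv_file_to_newick(sys.argv[1]))
```

With a file holding the rows `'Phylum','Class','Order','example'` and `'Another','Test'`, this sketch should produce `((((example)Order)Class)Phylum,(Test)Another);`.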

Also, this library looks cool and lets you visualize your tree: http://biopython.org/wiki/Phylo

+0

Thanks! Works great. – 2014-10-02 06:41:22

+0

@MarkWatson, is 'python newick_tree.py file.csv' the correct command line? – user3184877 2017-05-25 15:04:26