How do I run Biopython SeqIO.convert() over multiple files in a directory?

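I am writing a Python script (version 2.7) that will convert every input file (.nexus format) in a specified directory to .fasta format. The Biopython function SeqIO.convert handles the conversion of a single specified file perfectly, but when I try to automate the process over a directory using os.walk, I cannot correctly pass the pathname of each input file to SeqIO.convert. Where am I going wrong? Do I need to use join() from the os.path module and pass the full pathname to SeqIO.convert?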

    #Import modules 
    import sys 
    import re 
    import os 
    import fileinput 

    from Bio import SeqIO 

    #Specify directory of interest 
    PSGDirectory = "/Users/InputDirectory"
    #Create a class that will run the SeqIO.convert function repeatedly 
    def process(filename): 
     count = SeqIO.convert("files", "nexus", "files.fa", "fasta", alphabet= IUPAC.ambiguous_dna) 
    #Make sure os.walk works correctly 
    for path, dirs, files in os.walk(PSGDirectory): 
     print path 
     print dirs 
     print files 

    #Now recursively do the count command on each file inside PSGDirectory 
    for files in os.walk(PSGDirectory): 
     print("Converted %i records" % count) 
     process(files)

When I run the script, I get this error message:

    Traceback (most recent call last):
      File "nexus_to_fasta.psg", line 45, in <module>
        print("Converted %i records" % count)
    NameError: name 'count' is not defined

This conversation was very helpful, but I don't know where to insert the join() function statements. Here is an example of one of my nexus files. Thanks for your help!


2014-02-13 PGilbert

There are a few things going on here.

First, your process function does not return 'count'. You probably want:

    def process(filename):
        # SeqIO.convert returns the number of records it converted
        return SeqIO.convert("files", "nexus", "files.fa", "fasta", alphabet=IUPAC.ambiguous_dna)
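To make the NameError concrete, here is a minimal sketch that uses no Biopython at all. The two function names and the value 3 are stand-ins invented for illustration; the only point is that a name assigned inside a function is not visible to the caller unless it is returned.

```python
def process_no_return(filename):
    # record_count exists only inside this call and is discarded afterwards
    record_count = 3  # stand-in for the value SeqIO.convert would return


def process_with_return(filename):
    # Returning the value is the only way the caller gets to see it
    return 3  # same stand-in value


process_no_return("example.nexus")
try:
    print("Converted %i records" % record_count)
except NameError as error:
    # Same kind of failure as in the question's traceback
    message = str(error)
    print("NameError: " + message)

converted = process_with_return("example.nexus")
print("Converted %i records" % converted)
```

The first print fails because record_count never existed at module level; the second works because the caller stores what the function returned.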

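Also, when you write for files in os.walk(PSGDirectory) you are iterating over the 3-tuples that os.walk returns, not over individual files. You want to do this (note the use of os.path.join):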

    for root, dirs, files in os.walk(PSGDirectory):
        for filename in files:
            # os.walk gives bare file names; join each one to its directory
            fullpath = os.path.join(root, filename)
            print(process(fullpath))
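If you want to see what os.walk hands back before involving Biopython, the sketch below may help. It builds a throwaway directory tree (the file names a.nexus and sub/b.nexus are made up for the demo) and collects the joined paths; list_files and build_demo_tree are hypothetical helper names, not part of any library.

```python
import os
import tempfile


def list_files(top):
    """Return the full path of every file under top, sorted for stable output."""
    paths = []
    # os.walk yields one (root, dirs, files) tuple per directory it visits
    for root, dirs, files in os.walk(top):
        for filename in files:
            # filename is a bare name, so join it to the directory holding it
            paths.append(os.path.join(root, filename))
    return sorted(paths)


def build_demo_tree():
    """Create a temporary tree with two empty demo files and return its root."""
    top = tempfile.mkdtemp()
    os.mkdir(os.path.join(top, "sub"))
    for relative in ("a.nexus", os.path.join("sub", "b.nexus")):
        with open(os.path.join(top, relative), "w") as handle:
            handle.write("")
    return top


demo_top = build_demo_tree()
demo_paths = list_files(demo_top)
for path in demo_paths:
    print(path)
```

Each printed path can be opened directly, which is what SeqIO.convert needs; the bare names in files would only work if the script happened to run from inside that directory.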

Update:

So I looked at the documentation for SeqIO.convert, and it expects to be called with:

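in_file - an input handle or filename
in_format - input file format, lowercase string
out_file - an output handle or filename
out_format - output file format, lowercase string
alphabet - optional alphabet to assume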

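in_file is the name of the file to convert; originally you were just calling SeqIO.convert with the literal string "files".

So your process function should probably look like this: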

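    # IUPAC is not imported in the question's script; in Biopython releases that
    # still ship Bio.Alphabet it needs: from Bio.Alphabet import IUPAC
    def process(filename):
        return SeqIO.convert(filename, "nexus", filename + '.fa', "fasta", alphabet=IUPAC.ambiguous_dna)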
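Putting the pieces together, a whole-directory conversion could be sketched roughly as follows. This is one possible arrangement, not the only one: nexus_to_fasta and convert_directory are hypothetical names, the ".nexus" extension filter is an assumption about how your files are named, and the alphabet argument is left out on the assumption that reading NEXUS and writing FASTA does not require it in your Biopython version. The converter is passed in as a parameter so the directory-walking logic can be tried without Biopython installed.

```python
import os


def nexus_to_fasta(in_path, out_path):
    """Convert one file with Biopython and return the number of records."""
    # Imported here so the walking logic below can be exercised without Biopython
    from Bio import SeqIO
    # Only the four positional arguments are used; the optional alphabet is
    # omitted (assumption: not needed for NEXUS input and FASTA output)
    return SeqIO.convert(in_path, "nexus", out_path, "fasta")


def convert_directory(top, convert=nexus_to_fasta, extension=".nexus"):
    """Convert every matching file under top; return {input path: record count}."""
    counts = {}
    for root, dirs, files in os.walk(top):
        for filename in files:
            base, ext = os.path.splitext(filename)
            if ext.lower() != extension:
                continue  # skip everything else, including earlier .fa output
            in_path = os.path.join(root, filename)
            # Write the FASTA file next to its input, swapping the extension
            out_path = os.path.join(root, base + ".fa")
            counts[in_path] = convert(in_path, out_path)
    return counts
```

Calling convert_directory(PSGDirectory) and looping over the returned dictionary gives one "Converted %i records" line per input file. Filtering on the extension also keeps a second run from trying to parse the .fa files written by the first.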


2014-02-13 04:08:22 celeritas

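By the way, if SeqIO.convert does not return anything useful (maybe it just converts a file), you can simply call process(fullpath), knowing that len(files) will tell you how many files you processed. – celeritas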

This is very helpful! But I still get this error: `Traceback (most recent call last): File "nexus_to_fasta.psg", line 41, in <module> print process(fullpath) File "nexus_to_fasta.psg", line 35, in process return SeqIO.convert("files", "nexus", "files.fa", "fasta", alphabet=IUPAC.ambiguous_dna) File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/Bio/SeqIO/__init__.py", line 899, in convert in_handle = open(in_file, "rU") IOError: [Errno 2] No such file or directory: 'files'` – PGilbert

That comes from the way you are calling convert; I updated the answer after looking at the documentation for SeqIO.convert. – celeritas
