阅读多个爆炸文件（biopython）

我试图读取通过多个序列提交到NCBI爆炸网站生成的XML文件的列表。从每个文件中，我想打印某些信息。我想要读取的文件全部给出后缀"_recombination.xml"。阅读多个爆炸文件（biopython）

for file in glob.glob("*_recombination.xml"): 
    result_handle= open(file) 
    blast_record=NCBIXML.read(result_handle) 
    for alignment in blast_record.alignments: 
     for hsp in alignment.hsps: 
      print "*****Alignment****" 
      print "sequence:", alignment.title 
      print "length:", alignment.length 
      print "e-value:", hsp.expect 
      print hsp.query 
      print hsp.match 
      print hsp.sbjct

脚本首先找到所有与"_recombination.xml"后缀，然后，我希望它读取每个文件和打印某些行的文件（这是几乎从BioPython直副本烹饪书），这似乎去做。但我得到以下错误：

Traceback (most recent call last): 
File "Scripts/blast_test.py", line 202, in <module> 
blast_record=NCBIXML.read(result_handle) 
File "/Library/Python/2.7/site-packages/Bio/Blast/NCBIXML.py", line 576, in read 
first = iterator.next() 
File "/Library/Python/2.7/site-packages/Bio/Blast/NCBIXML.py", line 643, in parse 
expat_parser.Parse("", True) # End of XML record 
xml.parsers.expat.ExpatError: no element found: line 3106, column 7594

我不确定问题出在哪里。我不知道这是否是回过它已经阅读 - 例如文件试图循环，关闭文件似乎帮助：

for file in glob.glob("*_recombination.xml"): 
    result_handle= open(file) 
    blast_record=NCBIXML.read(result_handle) 
    for alignment in blast_record.alignments: 
     for hsp in alignment.hsps: 
      print "*****Alignment****" 
      print "sequence:", alignment.title 
      print "length:", alignment.length 
      print "e-value:", hsp.expect 
      print hsp.query 
      print hsp.match 
      print hsp.sbjct 
    result_handle.close() 
    blast_record.close()

但是，这也给了我另一个错误：

Traceback (most recent call last): 
File "Scripts/blast_test.py", line 213, in <module> blast_record.close() 
AttributeError: 'Blast' object has no attribute 'close'

来源

2013-03-14 user2168818

删除行blast_record.close（），解析的对象没有关闭的方法（这是AttributeError试图告诉你）。 – peterjc 2013-03-14 11:28:47

ExpatError可能是由于破损的XML文件造成的，例如截断的输出。你有没有检查它的眼睛抱怨的具体文件？ – peterjc 2013-03-14 11:29:57

我通常使用的解析方法，而不是阅读，也许它可以帮助你：

for blast_record in NCBIXML.parse(open(input_xml)): 
    for alignment in blast_record.alignments: 
     for hsp in alignment.hsps: 
      print "*****Alignment****" 
      print "sequence:", alignment.title 
      print "length:", alignment.length 
      print "e-value:", hsp.expect 
      print hsp.query 
      print hsp.match 
      print hsp.sbjct

，并确保您的xml是在您的查询中生成的-outfmt 5

来源

2014-08-13 18:16:18 Biosys

我会为Biogeek的回答添加评论，但我不能（尚未有足够的声望）。在契税他是对的，你应该使用

NCBIXML.parse(open(input_xml))

代替NCBIXML.read（开放（input_xml）），因为你是“想读的XML文件的列表”你需要和XML文件列表解析而不是阅读。它解决了你的问题吗？

来源

2016-02-02 13:22:30 Cat

阅读多个爆炸文件（biopython）

回答

相关问题