2013-03-14 100 views
1

我试图读取通过多个序列提交到NCBI爆炸网站生成的XML文件的列表。从每个文件中,我想打印某些信息。 我想要读取的文件全部给出后缀"_recombination.xml"阅读多个爆炸文件(biopython)

for file in glob.glob("*_recombination.xml"): 
    result_handle= open(file) 
    blast_record=NCBIXML.read(result_handle) 
    for alignment in blast_record.alignments: 
     for hsp in alignment.hsps: 
      print "*****Alignment****" 
      print "sequence:", alignment.title 
      print "length:", alignment.length 
      print "e-value:", hsp.expect 
      print hsp.query 
      print hsp.match 
      print hsp.sbjct 

脚本首先找到所有与"_recombination.xml"后缀,然后,我希望它读取每个文件和打印某些行的文件(这是几乎从BioPython直副本烹饪书),这似乎去做。但我得到以下错误:

Traceback (most recent call last): 
File "Scripts/blast_test.py", line 202, in <module> 
blast_record=NCBIXML.read(result_handle) 
File "/Library/Python/2.7/site-packages/Bio/Blast/NCBIXML.py", line 576, in read 
first = iterator.next() 
File "/Library/Python/2.7/site-packages/Bio/Blast/NCBIXML.py", line 643, in parse 
expat_parser.Parse("", True) # End of XML record 
xml.parsers.expat.ExpatError: no element found: line 3106, column 7594 

我不确定问题出在哪里。我不知道这是否是回过它已经阅读 - 例如文件试图循环,关闭文件似乎帮助:

for file in glob.glob("*_recombination.xml"): 
    result_handle= open(file) 
    blast_record=NCBIXML.read(result_handle) 
    for alignment in blast_record.alignments: 
     for hsp in alignment.hsps: 
      print "*****Alignment****" 
      print "sequence:", alignment.title 
      print "length:", alignment.length 
      print "e-value:", hsp.expect 
      print hsp.query 
      print hsp.match 
      print hsp.sbjct 
    result_handle.close() 
    blast_record.close() 

但是,这也给了我另一个错误:

Traceback (most recent call last): 
File "Scripts/blast_test.py", line 213, in <module> blast_record.close() 
AttributeError: 'Blast' object has no attribute 'close' 
+0

删除行blast_record.close(),解析的对象没有关闭的方法(这是AttributeError试图告诉你)。 – peterjc 2013-03-14 11:28:47

+0

ExpatError可能是由于破损的XML文件造成的,例如截断的输出。你有没有检查它的眼睛抱怨的具体文件? – peterjc 2013-03-14 11:29:57

回答

2

我通常使用的解析方法,而不是阅读,也许它可以帮助你:

for blast_record in NCBIXML.parse(open(input_xml)): 
    for alignment in blast_record.alignments: 
     for hsp in alignment.hsps: 
      print "*****Alignment****" 
      print "sequence:", alignment.title 
      print "length:", alignment.length 
      print "e-value:", hsp.expect 
      print hsp.query 
      print hsp.match 
      print hsp.sbjct 

,并确保您的xml是在您的查询中生成的-outfmt 5

0

我会为Biogeek的回答添加评论,但我不能(尚未有足够的声望)。在契税他是对的,你应该使用

NCBIXML.parse(open(input_xml)) 

代替NCBIXML.read(开放(input_xml)),因为你是“想读的XML文件的列表”你需要和XML文件列表解析而不是阅读。它解决了你的问题吗?