Converting an XML file to CSV

I have an XML file that looks like this:

<Organism> 
<Name>Bacillus halodurans C-125</Name> 
    <Enzyme>M.BhaII</Enzyme> 
    <Motif>GGCC</Motif> 
    <Enzyme>M1.BhaI</Enzyme> 
    <Motif>GCATC</Motif> 
    <Enzyme>M2.BhaI</Enzyme> 
    <Motif>GCATC</Motif> 
</Organism> 
<Organism> 
<Name>Bacteroides eggerthii 1_2_48FAA</Name> 
</Organism>

I am trying to write it to a CSV file like this:

Bacillus halodurans, GGCC 
Bacillus halodurans, GCATC 
Bacillus halodurans, GCATC 
Bacteriodes,

My approach is to build a list of tuples that pairs each organism name with a motif. I tried doing this with the ElementTree module:

import xml.etree.ElementTree as ET

tree = ET.parse('file.xml')
rebase = tree.getroot()

list = []

for organisms in rebase.findall('Organism'):
    name = organisms.find('Name').text
    for each_organism in organisms.findall('Motif'):
        try:
            motif = organisms.find('Motif').text
            print(name, motif)
        except AttributeError:
            print(name)

But the output I get looks like this:

Bacillus halodurans, GGCC 
Bacillus halodurans, GGCC 
Bacillus halodurans, GGCC

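Only the first motif is recorded. This is my first time working with ElementTree, so it is a bit confusing. Any help would be appreciated.

I don't need help writing to the CSV file.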


2014-09-27 Beginner

The only thing you need to fix is to replace:

motif = organisms.find('Motif').text

with:

motif = each_organism.text

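You are already iterating over the Motif nodes inside the Organism. On each pass, the each_organism loop variable holds the current Motif element, whereas organisms.find('Motif') always returns the first Motif child.

I would also rename the variables to avoid confusion. Also, I don't see a need for try/except inside the loop over the Motif tags. If the Name tag can be missing, you can follow the "easier to ask forgiveness than permission" (EAFP) approach and catch the error: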
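To make the difference between find() and findall() concrete, here is a minimal sketch. It assumes a single Organism element built inline with ET.fromstring, mirroring the sample in the question; the variable names first_each_time and all_motifs are only illustrative.

```python
import xml.etree.ElementTree as ET

# Hypothetical inline sample mirroring the first Organism in the question.
organism = ET.fromstring(
    "<Organism>"
    "<Name>Bacillus halodurans C-125</Name>"
    "<Motif>GGCC</Motif>"
    "<Motif>GCATC</Motif>"
    "<Motif>GCATC</Motif>"
    "</Organism>"
)

# find() returns the first matching child every time it is called,
# so calling it inside the loop repeats the same value.
first_each_time = [organism.find("Motif").text for _ in organism.findall("Motif")]

# Reading .text from the loop variable yields each Motif in document order.
all_motifs = [motif.text for motif in organism.findall("Motif")]

print(first_each_time)  # ['GGCC', 'GGCC', 'GGCC']
print(all_motifs)       # ['GGCC', 'GCATC', 'GCATC']
```

The first list reproduces the repeated GGCC output from the question; the second is what the one-line fix produces.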

for organism in rebase.findall('Organism'):
    try:
        name = organism.find('Name').text
    except AttributeError:
        # No Name element: find() returned None, so skip this organism
        continue

    for motif in organism.findall('Motif'):
        print(name, motif.text)
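Since the goal in the question is a list of (name, motif) tuples, including a row for an organism that has no motifs, the loop could be extended along these lines. This is only a sketch under assumptions: the root element name Rebase, the SAMPLE string, and the helper collect_rows are hypothetical, and an organism without any Motif is given an empty string as its motif.

```python
import xml.etree.ElementTree as ET

# Assumption: the real file wraps the Organism elements in a single root
# element; the name "Rebase" is hypothetical.
SAMPLE = """<Rebase>
<Organism>
<Name>Bacillus halodurans C-125</Name>
    <Enzyme>M.BhaII</Enzyme>
    <Motif>GGCC</Motif>
    <Enzyme>M1.BhaI</Enzyme>
    <Motif>GCATC</Motif>
    <Enzyme>M2.BhaI</Enzyme>
    <Motif>GCATC</Motif>
</Organism>
<Organism>
<Name>Bacteroides eggerthii 1_2_48FAA</Name>
</Organism>
</Rebase>"""


def collect_rows(root):
    """Return (name, motif) tuples: one per Motif, or (name, '') if none."""
    rows = []
    for organism in root.findall("Organism"):
        name_node = organism.find("Name")
        if name_node is None:
            continue  # skip organisms without a Name element
        motifs = [m.text for m in organism.findall("Motif")]
        if not motifs:
            # Keep the organism in the output with an empty motif column.
            rows.append((name_node.text, ""))
        for motif in motifs:
            rows.append((name_node.text, motif))
    return rows


rows = collect_rows(ET.fromstring(SAMPLE))
for row in rows:
    print(row)
```

Checking find('Name') against None is an alternative to catching AttributeError; either works here, and the resulting list can be handed to whatever CSV-writing code you already have.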


2014-09-27 20:18:12 alecxe

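Works like a charm. However, at some points in the file there is only a 'Name' tag under the 'Organism' tag. In that case I get the error "AttributeError: 'NoneType' object has no attribute 'text'" on the line name = organisms.find('Name').text. – Beginner 2014-09-27 20:21:48

@Beginner OK, I've updated the answer. As I understand it, you want to skip organisms without a name, which is why I used 'continue' in the 'except' block. Thanks. – alecxe 2014-09-27 20:25:48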
