Parsing XML in Python

I am parsing an XML file with xml.etree.ElementTree and I have a problem: I don't know how to get the plain-text lines that sit between the tags.

<Sync time="4.496"/> 
<Background time="4.496" type="music" level="high"/> 

<Event desc="pause" type="noise" extent="instantaneous"/> 
Plain text 
<Sync time="7.186"/> 

<Event desc="b" type="noise" extent="instantaneous"/> 
Plain text 
<Sync time="10.949"/> 
Plain text

This is the code I have so far:

import xml.etree.ElementTree as etree
import os

data_file = "./file.xml"

xmlD = etree.parse(data_file)
root = xmlD.getroot()
# Iterating an Element yields its direct children
# (getchildren() was removed in Python 3.9).
sections = list(root[2])
for section in sections:
    for turn in section:
        speaker = turn.get('speaker')
        mode = turn.get('mode')

        for child in turn:
            time = child.get('time')
            opt = child.get('desc')
            if opt == 'es':
                opt = "ESP:"
            elif opt == "la":
                opt = "LATIN:"
            elif opt == "*":
                opt = "-ININT-"
            elif opt == "fs":
                opt = "-FS-"
            elif opt == "throat":
                opt = "-THROAT-"
            elif opt == "laugh":
                opt = "-LAUGH-"
            else:
                opt = ""

            # child.tail is the text after the tag; it can be None
            print(speaker, mode, time, opt + (child.tail or ""))

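I can walk the XML down to the Sync | Background | Event tags, but I cannot extract the text that follows those tags. I have posted only a fragment of the XML, not the whole file. Only the last part of the code is giving me trouble.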

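Thank you very much, @alecxe. Now I can get the information I need, but I have a new, smaller problem. I get the line through the tail attribute, but something before it produces a \n line break. What I need is this:

spk1 planned LAN: Plain text from tail

But I get this: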

spk1 planned LAN:
Plain text from tail

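I have tried many things: re.match(), and running sed after processing the XML. There seems to be no \n line break left, yet I still cannot get the plain text joined onto the same line! Thanks in advance.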

Anyone? Thanks!


2015-05-11 Sergi

This is what is called the tail of an element:

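The tail attribute can be used to hold additional data associated with the element. This attribute is usually a string but may be any application-specific object. If the element is created from an XML file, the attribute will contain any text found after the element's end tag and before the next tag.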
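
To make that definition concrete, here is a minimal Python 3 sketch. The Turn wrapper element and its attributes are assumptions made for illustration, since the question shows only the inner lines of the file.

```python
import xml.etree.ElementTree as ET

# Hypothetical wrapper element; the question only shows the inner lines.
xml_fragment = """<Turn speaker="spk1" mode="planned">
<Sync time="4.496"/>
<Event desc="pause" type="noise" extent="instantaneous"/>
Plain text
<Sync time="7.186"/>
</Turn>"""

turn = ET.fromstring(xml_fragment)
event = turn.find("Event")

# A self-closing element such as <Event .../> has no inner text...
print(repr(event.text))
# ...but the text that follows it is stored in .tail, newlines included.
print(repr(event.tail))
```

Note that the tail keeps the surrounding line breaks exactly as they appear in the file.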

Find the Event tag and get its tail, for example:

section.find("Event").tail
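
Because the tail includes the line breaks around the text, the follow-up newline problem can probably be handled by stripping the tail before printing. The following Python 3 sketch rests on assumptions: the Turn wrapper, the sample attribute values, and the helper name tail_lines are hypothetical, and the marker table is copied from the if/elif chain in the question. It leaves out the time attribute for brevity.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the Turn wrapper and attribute values are assumed.
SAMPLE = """<Turn speaker="spk1" mode="planned">
<Sync time="4.496"/>
<Background time="4.496" type="music" level="high"/>
<Event desc="la" type="language" extent="instantaneous"/>
Plain text
<Sync time="7.186"/>
Plain text
</Turn>"""

# Marker mapping copied from the question's if/elif chain.
MARKERS = {
    "es": "ESP:",
    "la": "LATIN:",
    "*": "-ININT-",
    "fs": "-FS-",
    "throat": "-THROAT-",
    "laugh": "-LAUGH-",
}


def tail_lines(turn):
    """Return one formatted line per child whose tail holds real text."""
    speaker = turn.get("speaker")
    mode = turn.get("mode")
    lines = []
    for child in turn:  # iterating an Element yields its direct children
        # tail may be None; strip() drops the leading/trailing "\n"
        text = (child.tail or "").strip()
        if not text:
            continue  # nothing but whitespace after this tag
        marker = MARKERS.get(child.get("desc"), "")
        parts = [speaker, mode, marker, text]
        lines.append(" ".join(p for p in parts if p))
    return lines


for line in tail_lines(ET.fromstring(SAMPLE)):
    print(line)
```

Skipping children whose tail is empty after stripping avoids printing blank lines for tags that are followed directly by another tag.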


2015-05-11 11:27:10 alecxe
