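Parsing XML in Python

I'm using `xml.etree.ElementTree` to parse an XML file, and I have a problem: I don't know how to get the plain-text lines that sit between the tags.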
```xml
<Sync time="4.496"/>
<Background time="4.496" type="music" level="high"/>
<Event desc="pause" type="noise" extent="instantaneous"/>
Plain text
<Sync time="7.186"/>
<Event desc="b" type="noise" extent="instantaneous"/>
Plain text
<Sync time="10.949"/>
Plain text
```
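In ElementTree the plain text in this fragment is not an element of its own: the text that follows an element, up to the next tag, is stored in that element's `.tail` attribute. A minimal sketch, assuming the fragment is wrapped in a hypothetical `<Turn speaker="spk1" mode="planned">` element so that it parses as a complete document:

```python
import xml.etree.ElementTree as etree

# Hypothetical wrapper: the fragment from the question inside a <Turn> element.
FRAGMENT = """<Turn speaker="spk1" mode="planned">
<Sync time="4.496"/>
<Background time="4.496" type="music" level="high"/>
<Event desc="pause" type="noise" extent="instantaneous"/>
Plain text
<Sync time="7.186"/>
<Event desc="b" type="noise" extent="instantaneous"/>
Plain text
<Sync time="10.949"/>
Plain text
</Turn>"""

turn = etree.fromstring(FRAGMENT)
for child in turn:
    # repr() makes the newlines stored in .tail visible.
    print(child.tag, repr(child.tail))
```

For the `<Event desc="pause" .../>` child this prints `Event '\nPlain text\n'`: the text is attached to the tag before it, together with the newlines around it from the file.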
Here is the code I have so far:
```python
import xml.etree.ElementTree as etree

data_file = "./file.xml"
xml_doc = etree.parse(data_file)
root = xml_doc.getroot()

# Map <Event desc="..."> codes to the labels printed in the transcript.
# Any other desc value (or a tag without desc) gets an empty label.
DESC_LABELS = {
    "es": "ESP:",
    "la": "LATIN:",
    "*": "-ININT-",
    "fs": "-FS-",
    "throat": "-THROAT-",
    "laugh": "-LAUGH-",
}

# getchildren() is deprecated and was removed in Python 3.9;
# index or iterate over an element directly instead.
sections = root[2]
for section in sections:
    for turn in section:
        speaker = turn.get("speaker")
        mode = turn.get("mode")
        for child in turn:
            time = child.get("time")
            opt = DESC_LABELS.get(child.get("desc"), "")
            # .tail is the text after the child tag; it is None when there is none.
            tail = child.tail or ""
            # Python 3 strings are Unicode, so no .encode('latin-1') is needed to print.
            print(speaker, mode, time, opt + tail)
```
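I can navigate the XML down to the `Sync`/`Background`/`Event` tags, but I can't extract the text that follows them. I've only posted an excerpt of the XML file, not the whole thing. Only the last part of the code is giving me trouble.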
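Because every block of plain text follows some child tag, one way to pair it with the most recent `<Sync time="...">` is to walk the children in order and remember the last time seen. A sketch under the same assumption (a hypothetical `<Turn>` wrapper); the helper name `text_by_sync` is made up for illustration:

```python
import xml.etree.ElementTree as etree

def text_by_sync(turn):
    """Return (sync_time, text) pairs for a <Turn>-like element.

    Hypothetical helper: text after any child is attributed to the most
    recent <Sync> seen so far; whitespace-only tails are skipped.
    """
    pairs = []
    current_time = None
    for child in turn:
        if child.tag == "Sync":
            current_time = child.get("time")
        text = (child.tail or "").strip()
        if text:
            pairs.append((current_time, text))
    return pairs

SAMPLE = """<Turn speaker="spk1" mode="planned">
<Sync time="4.496"/>
<Event desc="pause" type="noise" extent="instantaneous"/>
Plain text one
<Sync time="7.186"/>
Plain text two
</Turn>"""

print(text_by_sync(etree.fromstring(SAMPLE)))
# [('4.496', 'Plain text one'), ('7.186', 'Plain text two')]
```

Text that appears before the first child of an element is not a tail; it lives in that element's own `.text` attribute (here `turn.text`).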
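Thank you very much, @alecxe. I can now get the information I need, but I have a new small problem. I get the text with the `tail` attribute, but something before it produces a `\n` newline character. What I need is this:

spk1 planned LAN: Plain text from tail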
But I get this instead, with the text pushed onto its own line:

spk1 planned LAN:
Plain text from tail
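I've tried many things, such as `re.match()` from the `re` module and running `sed` on the output after processing the XML. There seems to be no `\n` newline character, yet I still can't get the plain text onto the same line as the label. Thanks in advance!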
Anyone? Thanks!
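The line break most likely comes from the text node itself rather than from `print`: `.tail` holds everything between the end of the tag and the next tag, including the newline that follows `<Event .../>` in the file. Stripping that whitespace (or collapsing it) before printing should keep everything on one line. A minimal sketch; `format_line`, the `LAN:` label and the sample tail are hypothetical stand-ins for the values in the question:

```python
def format_line(speaker, mode, label, tail):
    """Build one output line from speaker, mode, label and a raw .tail value."""
    # .tail may be None and usually starts/ends with the newline from the XML file.
    # split() with no argument drops leading/trailing whitespace and collapses inner runs.
    text = " ".join((tail or "").split())
    # Skip empty parts so an empty label does not leave a double space.
    return " ".join(part for part in (speaker, mode, label, text) if part)

raw_tail = "\nPlain text from tail\n"  # what .tail typically contains
print(repr("spk1 planned LAN: " + raw_tail))  # the newline is part of the text
print(format_line("spk1", "planned", "LAN:", raw_tail))
# spk1 planned LAN: Plain text from tail
```

In the loop from the question, the smallest change along these lines would be printing `opt + (child.tail or "").strip()` instead of the raw tail.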