When parsing XML with lxml, I get the error "reading file objects must return bytes objects". The code below fails during parsing:
from lxml import etree
from io import StringIO

def parseXML(xmlFile):
    """
    parse the xml
    """
    data = open(xmlFile)
    xml = data.read()
    data.close()
    tree = etree.parse(StringIO(xml))
    context = etree.iterparse(StringIO(xml))
    for action, elem in context:
        if not elem.text:
            text = "None"
        else:
            text = elem.text
        print(elem.tag + "=>" + text)

if __name__ == "__main__":
    parseXML("C:\\Users\\karthik\\Desktop\\xml_path\\bgm.xml")
BGM XML
<?xml version="1.0" ?>
<zAppointments reminder="15">
    <appointment>
        <begin>1181251680</begin>
        <uid>040000008200E000</uid>
        <alarmTime>1181572063</alarmTime>
        <state></state>
        <location></location>
        <duration>1800</duration>
        <subject>Bring pizza home</subject>
    </appointment>
    <appointment>
        <begin>1234360800</begin>
        <duration>1800</duration>
        <subject>Check MS Office website for updates</subject>
        <location></location>
        <uid>604f4792-eb89-478b-a14f-dd34d3cc6c21-1234360800</uid>
        <state>dismissed</state>
    </appointment>
</zAppointments>
Error:
Traceback (most recent call last):
  File "C:/Users/karthik/source/ChartAttributes/crecords", line 34, in <module>
    parseXML("C:\\Users\\karthik\\Desktop\\xml_path\\bgm.xml")
  File "C:/Users/karthik/source/ChartAttributes/crecords", line 26, in parseXML
    for action, elem in context:
  File "src\lxml\iterparse.pxi", line 208, in lxml.etree.iterparse.__next__ (src\lxml\lxml.etree.c:150010)
  File "src\lxml\iterparse.pxi", line 193, in lxml.etree.iterparse.__next__ (src\lxml\lxml.etree.c:149708)
  File "src\lxml\iterparse.pxi", line 221, in lxml.etree.iterparse._read_more_events (src\lxml\lxml.etree.c:150208)
TypeError: reading file objects must return bytes objects
Process finished with exit code 1
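The traceback points at `etree.iterparse`, which reads bytes from the file object it is given, while `StringIO.read()` returns `str`. One possible workaround, sketched below under the assumption that the XML is available as bytes (for example, read from a file opened in `"rb"` mode), is to wrap it in `io.BytesIO` instead. The helper name `iter_tags` and the shortened sample document are hypothetical and only serve the illustration.

```python
from io import BytesIO

from lxml import etree


def iter_tags(xml_bytes):
    """Yield (tag, text) pairs, substituting "None" for empty text."""
    # BytesIO.read() returns bytes, which is what iterparse expects.
    for _action, elem in etree.iterparse(BytesIO(xml_bytes)):
        text = elem.text if elem.text else "None"
        yield elem.tag, text


# Shortened, hypothetical sample based on bgm.xml from the question.
sample = b"""<?xml version="1.0" ?>
<zAppointments reminder="15">
  <appointment>
    <state></state>
    <subject>Bring pizza home</subject>
  </appointment>
</zAppointments>"""

pairs = list(iter_tags(sample))
for tag, text in pairs:
    print(tag + " => " + text)
```

With the default `end` events, child elements are reported before their parents, so `state` and `subject` appear before `appointment` and `zAppointments`.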
Any reason you don't just call 'xml = etree.parse(xmlFile)' directly, instead of reading the file into a string and then wrapping it in StringIO?
I was just following this blog: https://www.blog.pythonlibrary.org/2010/11/20/python-parsing-xml-with-lxml/ – karthik
OK... well, try passing the filename directly to 'etree.parse' and see what happens.
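A minimal sketch of what the comments suggest: both `etree.parse` and `etree.iterparse` accept a filename, so lxml can open the file itself and no `StringIO` wrapper is needed. The function name `parse_xml` and the temporary demo file are hypothetical stand-ins for the asker's `bgm.xml`; this is an illustration, not a tested fix for the original script.

```python
import os
import tempfile

from lxml import etree


def parse_xml(xml_path):
    """Return (tag, text) pairs, letting lxml open the file by name."""
    results = []
    for _action, elem in etree.iterparse(xml_path):
        results.append((elem.tag, elem.text if elem.text else "None"))
    return results


# Write a small, hypothetical demo document to a temporary file.
with tempfile.NamedTemporaryFile("wb", suffix=".xml", delete=False) as handle:
    handle.write(
        b'<?xml version="1.0" ?>'
        b'<zAppointments reminder="15">'
        b"<appointment><duration>1800</duration></appointment>"
        b"</zAppointments>"
    )
    demo_path = handle.name

try:
    # etree.parse also takes the path directly and returns an ElementTree.
    root = etree.parse(demo_path).getroot()
    pairs = parse_xml(demo_path)
finally:
    os.remove(demo_path)

print(root.tag, root.get("reminder"))
for tag, text in pairs:
    print(tag + " => " + text)
```

Passing the path also leaves encoding detection to the parser, which avoids decoding the file to `str` first.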