Error when parsing with lxml

When parsing XML with lxml, I get the error "reading file objects must return bytes objects". The code below fails when parsing with lxml:

from lxml import etree
from io import StringIO

def parseXML(xmlFile):
    """
    Parse the XML file and print each element's tag and text.
    """
    data = open(xmlFile)
    xml = data.read()
    data.close()

    tree = etree.parse(StringIO(xml))
    context = etree.iterparse(StringIO(xml))
    # Iterating fails here: StringIO yields str, but iterparse() needs bytes
    for action, elem in context:
        if not elem.text:
            text = "None"
        else:
            text = elem.text
        print(elem.tag + "=>" + text)

if __name__ == "__main__":
    parseXML("C:\\Users\\karthik\\Desktop\\xml_path\\bgm.xml")

BGM XML

<?xml version="1.0" ?> 
<zAppointments reminder="15"> 
    <appointment> 
     <begin>1181251680</begin> 
     <uid>040000008200E000</uid> 
     <alarmTime>1181572063</alarmTime> 
     <state></state> 
     <location></location> 
     <duration>1800</duration> 
     <subject>Bring pizza home</subject> 
    </appointment> 
    <appointment> 
     <begin>1234360800</begin> 
     <duration>1800</duration> 
     <subject>Check MS Office website for updates</subject> 
     <location></location> 
     <uid>604f4792-eb89-478b-a14f-dd34d3cc6c21-1234360800</uid> 
     <state>dismissed</state> 
    </appointment> 
</zAppointments>

Error:

Traceback (most recent call last): 
    File "C:/Users/karthik/source/ChartAttributes/crecords", line 34, in <module> 
    parseXML("C:\\Users\\karthik\\Desktop\\xml_path\\bgm.xml") 
    File "C:/Users/karthik/source/ChartAttributes/crecords", line 26, in parseXML 
    for action, elem in context: 
    File "src\lxml\iterparse.pxi", line 208, in lxml.etree.iterparse.__next__ (src\lxml\lxml.etree.c:150010) 
    File "src\lxml\iterparse.pxi", line 193, in lxml.etree.iterparse.__next__ (src\lxml\lxml.etree.c:149708) 
    File "src\lxml\iterparse.pxi", line 221, in lxml.etree.iterparse._read_more_events (src\lxml\lxml.etree.c:150208) 
TypeError: reading file objects must return bytes objects

Process finished with exit code 1

2017-10-11 karthik
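The failure can be reproduced without a file on disk. A minimal sketch, assuming lxml is installed; the XML string is a trimmed, illustrative copy of bgm.xml and `try_iterparse` is a hypothetical helper. It shows that `etree.iterparse()` accepts a binary file-like object such as `io.BytesIO` but raises `TypeError` when `read()` returns `str`, as `io.StringIO` does.

```python
from io import BytesIO, StringIO

from lxml import etree

# Trimmed, illustrative copy of bgm.xml from the question
XML = ('<?xml version="1.0" ?>'
       '<zAppointments reminder="15"><appointment>'
       '<subject>Bring pizza home</subject>'
       '</appointment></zAppointments>')


def try_iterparse(source):
    """Return the tags iterparse() yields from source, or the TypeError it raises."""
    try:
        return [elem.tag for _, elem in etree.iterparse(source)]
    except TypeError as exc:
        return exc


# StringIO.read() returns str, so iterparse() raises TypeError
print(try_iterparse(StringIO(XML)))
# BytesIO.read() returns bytes, so parsing succeeds
print(try_iterparse(BytesIO(XML.encode("utf-8"))))  # ['subject', 'appointment', 'zAppointments']
```

By default `iterparse()` reports only "end" events, which is why the innermost element comes first.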

Is there any reason you don't just do 'xml = etree.parse(xmlFile)' instead of reading the file into a string and then wrapping it in StringIO? –

I was just following this blog: https://www.blog.pythonlibrary.org/2010/11/20/python-parsing-xml-with-lxml/ – karthik

Well... try passing the filename directly to 'etree.parse' and see what happens. –
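Following that suggestion, here is a minimal sketch assuming lxml is installed. The temporary file and its contents are an illustrative stand-in for bgm.xml, and `parse_file` is a hypothetical helper name. Given a path, `etree.parse()` opens and reads the file itself, so no str/bytes conversion is involved.

```python
import os
import tempfile

from lxml import etree


def parse_file(path):
    """Parse an XML file given its path and return the root element."""
    # Given a filename, etree.parse() opens and reads the file itself
    return etree.parse(path).getroot()


# Illustrative stand-in for bgm.xml, written to a temporary file
sample = b'<?xml version="1.0" ?>\n<zAppointments reminder="15"><appointment/></zAppointments>\n'
with tempfile.NamedTemporaryFile("wb", suffix=".xml", delete=False) as f:
    f.write(sample)

try:
    root = parse_file(f.name)
    print(root.tag, root.get("reminder"))  # zAppointments 15
finally:
    os.remove(f.name)
```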

I think you need to give lxml the XML as bytes, not as a string.

Open the file in binary mode to get a bytes object:

data=open(xmlFile, 'rb')
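A minimal sketch of the binary-mode route, assuming lxml is installed; the function names are illustrative. A file opened with 'rb' returns bytes from `read()`, so the file object can be handed to `etree.iterparse()` directly. If the contents have already been read into a bytes variable, they need `io.BytesIO`, not `io.StringIO`, which accepts only str.

```python
from io import BytesIO

from lxml import etree


def parse_xml_binary(xml_file):
    """Return 'tag=>text' strings, reading the file in binary mode."""
    results = []
    # 'rb' = read + binary, so data.read() returns bytes
    with open(xml_file, "rb") as data:
        for action, elem in etree.iterparse(data):
            results.append(elem.tag + "=>" + (elem.text or "None"))
    return results


def parse_xml_bytes(xml_bytes):
    """Return 'tag=>text' strings for XML already held in memory as bytes."""
    # Wrap bytes in BytesIO (not StringIO) so read() yields bytes
    return [elem.tag + "=>" + (elem.text or "None")
            for action, elem in etree.iterparse(BytesIO(xml_bytes))]


sample = b"<appointment><duration>1800</duration><state/></appointment>"
print(parse_xml_bytes(sample))  # ['duration=>1800', 'state=>None', 'appointment=>None']
```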

But it's probably easier to pass the filename to lxml and let it take care of opening and reading the file:

from lxml import etree

def parseXML(xmlFile):
    # Pass the filename; lxml opens and reads the file itself
    for action, elem in etree.iterparse(xmlFile):
        text = elem.text or "None"
        print(elem.tag + "=>" + text)

2017-10-11 13:34:13

I added 'b' and got an error: Traceback (most recent call last): File "C:/Users/karthik/source/ChartAttributes/crecords", line 34, in <module> parseXML("C:\\Users\\karthik\\Desktop\\xml_path\\bgm.xml") File "C:/Users/karthik/source/ChartAttributes/crecords", line 20, in parseXML data = open(xmlFile, 'b') ValueError: Must have exactly one of create/read/write/append mode and at most one plus. Edit: after changing it to 'rb' I still get an error: – karthik

TypeError: initial_value must be str or None, not bytes – karthik
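Both errors in these comments are raised by the Python standard library, not by lxml. Below is a minimal sketch that reproduces each one alongside its fix, using only the standard library. The helper names `open_mode_error` and `stringio_bytes_error` and the sample file are illustrative. The mode string needs a read/write/create/append component alongside 'b' ('rb' to read). Bytes belong in `io.BytesIO`, because `io.StringIO` accepts only str or None.

```python
import io
import os
import tempfile


def open_mode_error(path):
    """Return the ValueError from opening with mode 'b' alone, or None."""
    try:
        with open(path, "b"):
            pass
    except ValueError as exc:
        return exc
    return None


def stringio_bytes_error(data):
    """Return the TypeError from passing data to io.StringIO, or None."""
    try:
        io.StringIO(data)
    except TypeError as exc:
        return exc
    return None


# Small illustrative file standing in for bgm.xml
with tempfile.NamedTemporaryFile("wb", suffix=".xml", delete=False) as f:
    f.write(b"<zAppointments/>")

# 'b' has no read/write/create/append component, so open() rejects it
print(open_mode_error(f.name))
# 'rb' is the valid binary read mode; read() returns bytes
with open(f.name, "rb") as fh:
    raw = fh.read()
os.remove(f.name)
# StringIO only takes str (or None), so bytes raise TypeError
print(stringio_bytes_error(raw))
# BytesIO takes bytes, and its read() returns bytes again
print(io.BytesIO(raw).read())  # b'<zAppointments/>'
```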
