2017-10-11

When parsing XML with lxml, I get the error "reading file objects must return bytes objects". The code below fails during parsing:

from lxml import etree
from io import StringIO

def parseXML(xmlFile):
    """
    parse the xml
    """
    data = open(xmlFile)
    xml = data.read()
    data.close()

    tree = etree.parse(StringIO(xml))
    context = etree.iterparse(StringIO(xml))
    for action, elem in context:
        if not elem.text:
            text = "None"
        else:
            text = elem.text
        print(elem.tag + "=>" + text)

if __name__ == "__main__":
    parseXML("C:\\Users\\karthik\\Desktop\\xml_path\\bgm.xml")

BGM XML

<?xml version="1.0" ?> 
<zAppointments reminder="15"> 
    <appointment> 
     <begin>1181251680</begin> 
     <uid>040000008200E000</uid> 
     <alarmTime>1181572063</alarmTime> 
     <state></state> 
     <location></location> 
     <duration>1800</duration> 
     <subject>Bring pizza home</subject> 
    </appointment> 
    <appointment> 
     <begin>1234360800</begin> 
     <duration>1800</duration> 
     <subject>Check MS Office website for updates</subject> 
     <location></location> 
     <uid>604f4792-eb89-478b-a14f-dd34d3cc6c21-1234360800</uid> 
     <state>dismissed</state> 
    </appointment> 
</zAppointments> 

Error:

Traceback (most recent call last): 
    File "C:/Users/karthik/source/ChartAttributes/crecords", line 34, in <module> 
    parseXML("C:\\Users\\karthik\\Desktop\\xml_path\\bgm.xml") 
    File "C:/Users/karthik/source/ChartAttributes/crecords", line 26, in parseXML 
    for action, elem in context: 
    File "src\lxml\iterparse.pxi", line 208, in lxml.etree.iterparse.__next__ (src\lxml\lxml.etree.c:150010) 
    File "src\lxml\iterparse.pxi", line 193, in lxml.etree.iterparse.__next__ (src\lxml\lxml.etree.c:149708) 
    File "src\lxml\iterparse.pxi", line 221, in lxml.etree.iterparse._read_more_events (src\lxml\lxml.etree.c:150208) 
TypeError: reading file objects must return bytes objects 

Process finished with exit code 1

Any reason you don't just call 'xml = etree.parse(xmlFile)' directly, instead of reading the file into a string and then wrapping it in StringIO?

I was just following this blog: https://www.blog.pythonlibrary.org/2010/11/20/python-parsing-xml-with-lxml/ – karthik

OK... well, try passing the filename directly to 'etree.parse' and see what happens.

Answer

I think lxml needs the XML as bytes here, not as a str.

Open the file in binary mode so that reading it returns a bytes object:

data=open(xmlFile, 'rb') 
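
As a rough sketch of how that one-line change might fit into the whole function: the binary file object is handed straight to etree.iterparse, with no StringIO wrapper in between. The name parse_xml, the SAMPLE document (a shortened copy of the XML from the question), and the temporary file standing in for bgm.xml are assumptions made for illustration only.

```python
import os
import tempfile

from lxml import etree

# Shortened stand-in for bgm.xml (illustrative only).
SAMPLE = b"""<?xml version="1.0" ?>
<zAppointments reminder="15">
    <appointment>
        <state></state>
        <subject>Bring pizza home</subject>
    </appointment>
</zAppointments>
"""


def parse_xml(xml_path):
    """Return (tag, text) pairs, reading the file in binary mode."""
    pairs = []
    # 'rb' makes read() return bytes, which is what iterparse asks for.
    with open(xml_path, "rb") as data:
        for action, elem in etree.iterparse(data):
            text = elem.text or "None"
            pairs.append((elem.tag, text))
    return pairs


# Hypothetical temporary file so the sketch runs anywhere.
with tempfile.NamedTemporaryFile(suffix=".xml", delete=False) as tmp:
    tmp.write(SAMPLE)

try:
    pairs = parse_xml(tmp.name)
    for tag, text in pairs:
        print(tag + "=>" + text)
finally:
    os.remove(tmp.name)
```

The with block also closes the file if parsing raises, which the explicit open/close pair in the question does not do.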

But it is probably easier to just pass the filename to lxml and let it take care of opening and reading the file:

from lxml import etree

def parseXML(xmlFile):
    for action, elem in etree.iterparse(xmlFile):
        text = elem.text or "None"
        print(elem.tag + "=>" + text)

I added 'b' and got an error: Traceback (most recent call last): File "C:/Users/karthik/source/ChartAttributes/crecords", line 34, in <module> parseXML("C:\\Users\\karthik\Desktop\\xml_path\\bgm.xml") File "C:/Users/karthik/source/ChartAttributes/crecords", line 20, in parseXML data = open(xmlFile, 'b') ValueError: Must have exactly one of create/read/write/append mode and at most one plus. Edit: with 'rb' it fails too. – karthik

TypeError: initial_value must be str or None, not bytes – karthik
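
That last TypeError is the message io.StringIO raises when it is handed bytes. The edited code is not shown, so this is an assumption, but the likely cause is that the file is now opened with 'rb' while the data is still wrapped in StringIO. A minimal standard-library sketch of the difference (xml_bytes is an illustrative value, not the asker's file):

```python
from io import BytesIO, StringIO

xml_bytes = b"<zAppointments><subject>Bring pizza home</subject></zAppointments>"

# StringIO only accepts str, so wrapping bytes fails.
try:
    StringIO(xml_bytes)
except TypeError as exc:
    message = str(exc)
    print(message)

# BytesIO is the bytes counterpart: read() returns bytes.
stream = BytesIO(xml_bytes)
chunk = stream.read()
print(type(chunk).__name__)
```

If that is what happened, replacing StringIO with BytesIO in the question's code, or dropping the wrapper and passing the filename as in the answer, should avoid both errors.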