2017-10-06 32 views

Python XML parser: junk after document element

I am learning Python. I have a large XML file containing data similar to this:

testData3.xml file:

<r><c>something1</c><c>something1</c><c>something1</c><c>something1</c><c>something1</c><c>something1</c><c>something1</c><c>something1</c><c></c><c></c><c>something1</c><c>something1</c></r> 
<r><c>something2</c><c>something2</c><c>something2</c><c>something2</c><c>something2</c><c>something2</c><c>something2</c><c>something2</c><c></c><c></c><c>something2</c><c>something2</c></r> 

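I copied the XML parser from one of my Python books, and it gathers the data correctly when the data file contains only a single line. As soon as I add a second line of data, the script fails when it runs. I am looking for help writing a loop so that my xmlReader.py continues through the entire file rather than stopping after one line.

Python script (xmlReader.py):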

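from xml.dom.minidom import parse, Node
xmltree = parse('testData3.xml')
# Visit every <c> element and print the text it contains
for node1 in xmltree.getElementsByTagName('c'):
    for node2 in node1.childNodes:
        # Empty <c></c> elements have no text node and are skipped
        if node2.nodeType == Node.TEXT_NODE:
            print(node2.data)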

I receive the following error when I run this script:

[email protected]:~/xxxx/xxxx> python xmlReader.py 
Traceback (most recent call last): 
    File "xmlReader.py", line 2, in <module> 
    xmltree = parse('testData3.xml') 
    File "/usr/lib64/python2.6/site-packages/_xmlplus/dom/minidom.py", line 1915, in parse 
    return expatbuilder.parse(file) 
    File "/usr/lib64/python2.6/site-packages/_xmlplus/dom/expatbuilder.py", line 926, in parse 
    result = builder.parseFile(fp) 
    File "/usr/lib64/python2.6/site-packages/_xmlplus/dom/expatbuilder.py", line 207, in parseFile 
    parser.Parse(buffer, 0) 
xml.parsers.expat.ExpatError: junk after document element: line 2, column 0 
[email protected]:~/xxxx/xxxx> 

Answer

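The problem is that your sample data is not a well-formed XML document. A well-formed XML document must have exactly one root element. That holds for the single-line file, where <r> is the root element, but not once you add a second line: each line is wrapped in its own <r> element, and the file has no single parent element enclosing them all. The parser finishes the first <r> element, then reports everything after it as "junk after document element".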
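The single-root rule can be checked without the data file. The following is a minimal sketch, not part of the original answer; the helper name is_well_formed and the two shortened sample strings are made up for illustration.

```python
from xml.dom.minidom import parseString
from xml.parsers.expat import ExpatError

# Shortened stand-ins for the one-line and two-line versions of the file
one_root = "<r><c>something1</c></r>\n"
two_roots = "<r><c>something1</c></r>\n<r><c>something2</c></r>\n"


def is_well_formed(text):
    """Return True if minidom can parse text as a single XML document."""
    try:
        parseString(text)
        return True
    except ExpatError:
        # Raised, for example, when a second root element follows the first
        return False


print(is_well_formed(one_root))   # True
print(is_well_formed(two_roots))  # False
```

The second call fails for the same reason as the script in the question: a second <r> element appears after the document element has already been closed.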

Either build a well-formed XML document, for example:

<root> 
    <r><c>something1</c><c>something1</c><c>something1</c><c>something1</c><c>something1</c><c>something1</c><c>something1</c><c>something1</c><c></c><c></c><c>something1</c><c>something1</c></r> 
    <r><c>something2</c><c>something2</c><c>something2</c><c>something2</c><c>something2</c><c>something2</c><c>something2</c><c>something2</c><c></c><c></c><c>something2</c><c>something2</c></r> 
</root> 
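If editing the data file is not practical, the same wrapping can be done in memory before parsing. This is a rough sketch under some assumptions: the file fits in memory and contains no XML declaration or DOCTYPE, and the names parse_fragments, cell_texts, and the root tag "root" are illustrative choices rather than anything from the original post.

```python
from xml.dom.minidom import parseString, Node


def parse_fragments(text, root_tag="root"):
    """Wrap sibling elements in one synthetic root element and parse the result."""
    return parseString("<{0}>{1}</{0}>".format(root_tag, text))


def cell_texts(xmltree):
    """Collect the text of every <c> element that contains a text node."""
    texts = []
    for node1 in xmltree.getElementsByTagName('c'):
        for node2 in node1.childNodes:
            if node2.nodeType == Node.TEXT_NODE:
                texts.append(node2.data)
    return texts


# Shortened sample standing in for the contents of testData3.xml
sample = "<r><c>something1</c><c></c></r>\n<r><c>something2</c></r>\n"
print(cell_texts(parse_fragments(sample)))  # ['something1', 'something2']
```

To apply this to the real file, read its contents into a string first and pass that string to parse_fragments in place of sample.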

Or parse the file line by line:

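from xml.dom.minidom import parseString

# Each line holds one complete <r>...</r> document, so parse the lines one at a time
with open('testData3.xml') as f:
    for line in f:
        if not line.strip():
            continue  # skip blank lines, which are not XML documents
        xmltree = parseString(line)
        ...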
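Combined with the loop from the question, the line-by-line approach might look like the sketch below. It assumes every <r> element sits entirely on one line, as in the sample data; the generator name iter_row_texts and the in-memory sample_lines list are hypothetical and stand in for the open file.

```python
from xml.dom.minidom import parseString, Node


def iter_row_texts(lines):
    """Yield one list per non-blank line, holding the texts of its <c> elements."""
    for line in lines:
        if not line.strip():
            continue  # a blank line is not an XML document
        xmltree = parseString(line)
        row = []
        for node1 in xmltree.getElementsByTagName('c'):
            for node2 in node1.childNodes:
                if node2.nodeType == Node.TEXT_NODE:
                    row.append(node2.data)
        yield row


# Shortened sample standing in for the lines of testData3.xml
sample_lines = [
    "<r><c>something1</c><c></c><c>something1</c></r> \n",
    "\n",
    "<r><c>something2</c></r> \n",
]
for row in iter_row_texts(sample_lines):
    print(row)
```

Because a file object iterates over its lines, an open file can be passed to iter_row_texts directly. Only one line is parsed at a time, which may help with a large file.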