Extracting values

I know how to extract values from XML in the following format:

<note> 
    <col1>Tove</col1> 
    <col2>J</col2> 
    <test2> 
     <a> a </a> 
     <b> b </b> 
     <c> c </c> 
     <d> d </d> 
    </test2> 
    <code 
     a="1" 
     b="2" 
     c="3" 
    /> 
    <heading>Reminder</heading> 
    <body>Don't forget me this weekend!</body> 
</note>

I have been extracting the values like this:

# xmls is the parsed root element
for a in xmls.iter():  # getiterator() was removed in Python 3.9
    b = a.find("col1")  # or "col2"
    if b is not None:
        print(b.text)  # this extracts the element's text value
        break

My problem is that I need to extract the values of the test2 and code nodes, but with the method above I get None as the output.
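A minimal sketch (Python 3, standard-library xml.etree.ElementTree, with the sample XML shortened) of why reading `.text` does not give the wanted values for these two nodes: `<test2>` has no text of its own, only the indentation before its first child, and the self-closing `<code/>` has no text at all, so its `.text` is None. The values actually live in the children of `<test2>` and in the attributes of `<code>`:

```python
import xml.etree.ElementTree as ET

# Shortened version of the sample document from the question
data = """<note>
    <test2>
     <a> a </a>
     <b> b </b>
    </test2>
    <code a="1" b="2" c="3" />
</note>"""

root = ET.fromstring(data)
test2 = root.find("test2")
code = root.find("code")

# <test2> only holds the whitespace before <a>, so its .text is blank
print(repr(test2.text))
# <code/> is empty, so its .text is None
print(code.text)

# The wanted values are in the child elements and in the attributes
print([child.text.strip() for child in test2])  # ['a', 'b']
print(code.attrib)                               # {'a': '1', 'b': '2', 'c': '3'}
```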

Expected output

Ideally something like the following, although getting the bare node values a, b, c, d, 1, 2, 3 directly would be best:

<a> a </a>
<b> b </b>
<c> c </c>
<d> d </d>

and

a="1"
b="2"
c="3"

Given the name of the target node, what is a general way to extract the values from these different kinds of XML nodes?
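One possible generic approach, sketched with the standard library's xml.etree.ElementTree: look the node up by name and return its own text, the stripped text of its direct children, and its attributes, so that leaf nodes (col1), container nodes (test2) and attribute-only nodes (code) are all handled the same way. The helper name `extract_values` and the shape of the returned dict are hypothetical choices for illustration, and only the first match for the tag is used:

```python
import xml.etree.ElementTree as ET

data = """<note>
    <col1>Tove</col1>
    <test2>
     <a> a </a>
     <b> b </b>
     <c> c </c>
     <d> d </d>
    </test2>
    <code a="1" b="2" c="3" />
</note>"""


def extract_values(root, tag):
    """Hypothetical helper: collect the values of the first element named `tag`."""
    elem = root.find(f".//{tag}")  # search at any depth below root
    if elem is None:
        return None
    own_text = (elem.text or "").strip()  # None for empty elements such as <code/>
    children = [(child.text or "").strip() for child in elem]
    return {"text": own_text, "children": children, "attributes": dict(elem.attrib)}


root = ET.fromstring(data)
for name in ("col1", "test2", "code"):
    print(name, extract_values(root, name))
```

For test2 this yields the children ['a', 'b', 'c', 'd'], and for code the attributes {'a': '1', 'b': '2', 'c': '3'}, which are the bare values asked for above.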

2015-12-30 NoobEditor

I would use lxml.etree, with .xpath() to select the nodes and .attrib to get the attribute values:

import lxml.etree as ET 

data = """<note> 
    <col1>Tove</col1> 
    <col2>J</col2> 
    <test2> 
     <a> a </a> 
     <b> b </b> 
     <c> c </c> 
     <d> d </d> 
    </test2> 
    <code 
     a="1" 
     b="2" 
     c="3" 
    /> 
    <heading>Reminder</heading> 
    <body>Don't forget me this weekend!</body> 
</note> 
""" 

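tree = ET.fromstring(data)  # returns the root <note> element

for note in tree.xpath("//note"):
    # text of every child element of <test2>, surrounding whitespace stripped
    test2_values = [value.strip() for value in note.xpath(".//test2/*/text()")]
    # all attributes of the <code> element as a dict-like mapping
    code_attrs = note.find("code").attrib

    print(test2_values)
    print(code_attrs)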

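Here we iterate over all note nodes (in case there are several), collect the text of every child node inside test2, and take all the attributes of the code node.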

Prints:

['a', 'b', 'c', 'd'] 
{'b': '2', 'c': '3', 'a': '1'}
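If installing lxml is not an option, a similar result can be sketched with the standard library's xml.etree.ElementTree (a swapped-in substitute, not what this answer uses). Its findall() supports only a subset of XPath and has no text(), so the text is read from each matched element's .text instead; iter("note") includes the root element itself, which mirrors the //note loop above:

```python
import xml.etree.ElementTree as ET

data = """<note>
    <col1>Tove</col1>
    <col2>J</col2>
    <test2>
     <a> a </a>
     <b> b </b>
     <c> c </c>
     <d> d </d>
    </test2>
    <code
     a="1"
     b="2"
     c="3"
    />
    <heading>Reminder</heading>
    <body>Don't forget me this weekend!</body>
</note>
"""

root = ET.fromstring(data)  # root is the <note> element itself

for note in root.iter("note"):  # iter() also yields the root element
    # ElementPath has no text(), so strip each matched child's .text
    test2_values = [(elem.text or "").strip() for elem in note.findall(".//test2/*")]
    code_attrs = dict(note.find("code").attrib)

    print(test2_values)  # ['a', 'b', 'c', 'd']
    print(code_attrs)    # {'a': '1', 'b': '2', 'c': '3'}
```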

2015-12-30 06:11:57 alecxe

Cool... makes sense.... From an iteration-time point of view, is this a heavy process, given that it parses a big XML? – NoobEditor

@NoobEditor It depends on how big the problem is (don't optimize prematurely, as you may remember). Also, if needed, you can parse the XML iteratively, see: http://stackoverflow.com/questions/9856163/using-lxml-and-iterparse-to-parse-a-big-1gb-xml-file and http://stackoverflow.com/questions/324214/what-is-the-fastest-way-to-parse-large-xml-docs-in-python. – alecxe

Thanks, captain.... once again! :) – NoobEditor
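For the large-file case raised in the comments, the linked questions describe incremental parsing. A minimal sketch with the standard library's xml.etree.ElementTree.iterparse (lxml.etree.iterparse offers a similar interface); the `<notes>` wrapper, the in-memory bytes standing in for a file on disk, and the record layout are assumptions for illustration. Each `<note>` is processed on its "end" event and then discarded by clearing the root, so the tree never holds more than one record:

```python
import io
import xml.etree.ElementTree as ET

# Stand-in for a large file: many <note> records under one root element
data = b"<notes>" + b"".join(
    b'<note><test2><a> a </a><b> b </b></test2><code a="1" b="2"/></note>'
    for _ in range(3)
) + b"</notes>"

results = []
context = ET.iterparse(io.BytesIO(data), events=("start", "end"))
_, root = next(context)  # the first event is the start of the root element

for event, elem in context:
    if event == "end" and elem.tag == "note":
        test2_values = [(child.text or "").strip() for child in elem.find("test2")]
        code_attrs = dict(elem.find("code").attrib)
        results.append((test2_values, code_attrs))
        root.clear()  # drop processed records so memory use stays roughly flat

print(len(results))  # 3
print(results[0])    # (['a', 'b'], {'a': '1', 'b': '2'})
```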
