2011-12-12 140 views
2

Python XML absolute path: how can I print/dump the "absolute path" and value of each element in an XML document using Python?

For example, given this document:

<A>
    <B>foo</B>
    <C>
        <D>On</D>
    </C>
    <E>Auto</E>
    <F>
        <G>
            <H>shoo</H>
            <I>Off</I>
        </G>
    </F>
</A>

the output should be:

/A/B, foo 
/A/C/D, On 
/A/E, Auto 
/A/F/G/H, shoo 
/A/F/G/I, Off 
+1

So you want to trawl through all the text nodes and print their ancestors and values? –

+0

Yes, that is probably a better way of putting it :) – kristus

Answers

2
from lxml import etree

# your_xml_string is a placeholder for the XML text to process
root = etree.XML(your_xml_string)

def print_path_of_elems(elem, elem_path=""):
    for child in elem:
        child_path = "%s/%s" % (elem_path, child.tag)
        if len(child) == 0 and child.text:
            # leaf node with text => print
            print("%s, %s" % (child_path, child.text))
        else:
            # node with child elements => recurse
            print_path_of_elems(child, child_path)

# start with a leading slash so the paths come out as /A/B rather than A/B
print_path_of_elems(root, "/" + root.tag)
0

Something like this should work for you:

from xml.etree.ElementTree import ElementTree

tree = ElementTree()
tree.parse('file.xml')
root = tree.getroot()

def print_abs_path(root, path=None):
    if path is None:
        path = [root.tag]

    for child in root:
        # child.text is None for an empty element, so guard before stripping
        text = (child.text or '').strip()
        new_path = path + [child.tag]
        if text:
            print('/{0}, {1}'.format('/'.join(new_path), text))
        print_abs_path(child, new_path)

print_abs_path(root)
0

Another way to do this would be something like:

from lxml import etree

XMLDoc = etree.parse('file.xml')

for Node in XMLDoc.xpath('//*'):
    # leaf elements only: no child elements, but some text
    if len(Node) == 0 and Node.text:
        print(XMLDoc.getpath(Node), Node.text)

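Depending on the structure of the document, the paths returned by getpath() may contain positional indices (for example /A/B[2] when several sibling elements share a tag), which you may need to strip.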
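One possible way to strip those indices is a small regular expression, sketched below. The helper name strip_indices and the sample paths are illustrative assumptions, not part of the answer above; the sketch only assumes that the indices appear as bracketed integers such as [2].

```python
import re

# Matches a positional predicate such as "[2]" in an XPath step.
_INDEX = re.compile(r"\[\d+\]")

def strip_indices(path):
    """Remove positional predicates from an XPath string (hypothetical helper)."""
    return _INDEX.sub("", path)

# Illustrative paths in the form used for repeated sibling tags.
for path in ["/A/B[1]", "/A/B[2]", "/A/F/G[2]/H"]:
    print(strip_indices(path))
```

In the loop above this would be used as strip_indices(XMLDoc.getpath(Node)). Once the indices are gone, different elements can share the same path, so the result is no longer a unique address for each node.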

0

A completely inefficient XPath solution:

>>> from lxml import etree
>>> tree = etree.fromstring("""
... <A>
...     <B>foo</B>
...     <C>
...         <D>On</D>
...     </C>
...     <E>Auto</E>
...     <F>
...         <G>
...             <H>shoo</H>
...             <I>Off</I>
...         </G>
...     </F>
... </A>
... """)
>>> for node in tree.xpath('//*[normalize-space(text())]'):
...     # ancestor-or-self::* yields the element and its ancestors in document order
...     path = '/'.join(a.tag for a in node.xpath('ancestor-or-self::*'))
...     print('/%s, %s' % (path, node.text))
...
/A/B, foo 
/A/C/D, On 
/A/E, Auto 
/A/F/G/H, shoo 
/A/F/G/I, Off
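
For comparison, here is a single-pass sketch that uses only the standard library. It is not from any of the answers above: the function name iter_leaf_paths and the SAMPLE constant are made up for illustration, and it assumes that only elements without child elements and with non-blank text should be reported.

```python
import io
import xml.etree.ElementTree as ET

# The document from the question, as bytes (illustrative constant).
SAMPLE = b"""<A>
    <B>foo</B>
    <C><D>On</D></C>
    <E>Auto</E>
    <F><G><H>shoo</H><I>Off</I></G></F>
</A>"""

def iter_leaf_paths(source):
    """Yield (path, text) for each childless element with non-blank text."""
    stack = []  # tags of the elements that are currently open
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            stack.append(elem.tag)
        else:
            # The element is complete at its "end" event, so its text is available.
            text = (elem.text or "").strip()
            if len(elem) == 0 and text:
                yield "/" + "/".join(stack), text
            stack.pop()

for path, text in iter_leaf_paths(io.BytesIO(SAMPLE)):
    print("%s, %s" % (path, text))
```

iterparse also accepts a file name, so iter_leaf_paths('file.xml') should work the same way. Keeping a stack of open tags means no ancestor lookup is needed for each node.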