2012-12-04 228 views
9

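Parsing CDATA with Python

I need to parse an XML file that contains some CDATA blocks, and I need to keep their contents for later plotting. The XML: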

<process id="process1">
    <log name="name1" device="device1"><![CDATA[timestamp value]]></log>
    <log name="name2" device="device2"><![CDATA[timestamp value, timestamp value, timestamp]]></log>
</process>

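I need to do this repeatedly and quickly, so I am looking for the best way to do it. I have read that ElementTree is the faster approach, but I am open to other suggestions.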

+0

xtree is a better alternative to ElementTree for your problem. – Rajendra

Answers

10

Here are two examples of how to do this:

from lxml import etree
import xml.etree.ElementTree as ElementTree

CONTENT = """
<process id="process1">
<log name="name1" device="device1"><![CDATA[timestamp value]]></log>
<log name="name2" device="device2"><![CDATA[timestamp value, timestamp value, timestamp]]></log>
</process>
"""

def parse_with_lxml():
    # lxml: select every <log> element with an XPath query
    root = etree.fromstring(CONTENT)
    for log in root.xpath("//log"):
        print(log.text)

def parse_with_stdlib():
    # Standard library: iterate over every <log> element
    root = ElementTree.fromstring(CONTENT)
    for log in root.iter('log'):
        print(log.text)

if __name__ == '__main__':
    parse_with_lxml()
    parse_with_stdlib()

Output:

timestamp value 
timestamp value, timestamp value, timestamp 
timestamp value 
timestamp value, timestamp value, timestamp 

In both cases the text attribute handles it: the content of the CDATA section comes back as an ordinary string.
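As a minimal sketch of how that text might be collected for later plotting, the snippet below uses only the standard library and gathers each log's CDATA content into a dictionary keyed by its name attribute. The function name collect_logs is hypothetical, and splitting the text on commas is an assumption based on the sample data in the question.

```python
import xml.etree.ElementTree as ElementTree

CONTENT = """
<process id="process1">
<log name="name1" device="device1"><![CDATA[timestamp value]]></log>
<log name="name2" device="device2"><![CDATA[timestamp value, timestamp value, timestamp]]></log>
</process>
"""

def collect_logs(xml_text):
    """Return {log name: list of comma-separated entries from its CDATA text}.

    Hypothetical helper: the comma-separated layout is assumed from the sample.
    """
    root = ElementTree.fromstring(xml_text)
    logs = {}
    for log in root.iter('log'):
        text = log.text or ''  # an empty <log> element has text == None
        entries = [entry.strip() for entry in text.split(',')]
        logs[log.get('name')] = [entry for entry in entries if entry]
    return logs

if __name__ == '__main__':
    print(collect_logs(CONTENT))
```

Each entry is still a raw string; converting it to numbers depends on the real timestamp and value formats, which the question does not show.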

+1

For performance, you can use 'cElementTree' (note the leading 'c'). – jfs
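The comment above applies to Python 2, where the C accelerator had to be imported explicitly. To my knowledge, since Python 3.3 a plain import of xml.etree.ElementTree already uses the C implementation when it is available, and xml.etree.cElementTree was removed in Python 3.9. A hedged sketch of an import that should work either way (log_texts is a hypothetical helper name):

```python
try:
    # Python 2 and older Python 3 releases: explicit C accelerator
    import xml.etree.cElementTree as ElementTree
except ImportError:
    # Newer Python 3: the plain module picks the C implementation itself
    import xml.etree.ElementTree as ElementTree

def log_texts(xml_text):
    """Return the text (including CDATA content) of every <log> element."""
    root = ElementTree.fromstring(xml_text)
    return [log.text for log in root.iter('log')]

if __name__ == '__main__':
    sample = '<process><log name="name1"><![CDATA[timestamp value]]></log></process>'
    print(log_texts(sample))
```

The rest of the parsing code stays the same, because both modules expose the same API.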