2012-09-10 34 views

Unescaping XML text from Python's ElementTree

I'm trying to pull an escaped node out of an XML document. The node's raw text looks like this:

<Notes>{&quot;Phase&quot;: 0, &quot;Flipper&quot;: 0, &quot;Guide&quot;: 0,  
&quot;Sample&quot;: 0, &quot;Triangle8&quot;: 0, &quot;Triangle5&quot;: 0,  
&quot;Triangle4&quot;: 0, &quot;Triangle7&quot;: 0, &quot;Triangle6&quot;: 0,  
&quot;Triangle1&quot;: 0, &quot;Triangle3&quot;: 0, &quot;Triangle2&quot;: 0}</Notes> 

I extract the text as follows:

import xml.etree.ElementTree as ET

infile = ET.parse("C:/userfiles/EXP011/SESAME_60/SESAME_60_runinfo.xml")
r = infile.getroot()
# Namespace prefix in ElementTree's {uri}tag notation
XMLNS = "{http://example.com/foo/bar/runinfo_v4_3}"
x = r.find(".//" + XMLNS + "Notes")
print(x.text)

I expected to get:

{"Phase": 0, "Flipper": 0, "Guide": 0,
"Sample": 0, "Triangle8": 0, "Triangle5": 0,  
"Triangle4": 0, "Triangle7": 0, "Triangle6": 0,  
"Triangle1": 0, "Triangle3": 0, "Triangle2": 0} 

But instead, I get:

{&quot;Phase&quot;: 0, &quot;Flipper&quot;: 0, &quot;Guide&quot;: 0,  
&quot;Sample&quot;: 0, &quot;Triangle8&quot;: 0, &quot;Triangle5&quot;: 0, 
&quot;Triangle4&quot;: 0, &quot;Triangle7&quot;: 0, &quot;Triangle6&quot;: 0, 
&quot;Triangle1&quot;: 0, &quot;Triangle3&quot;: 0, &quot;Triangle2&quot;: 0} 

How do I get the unescaped string?

ElementTree doesn't unescape &quot; because you normally don't *need* to escape " in XML. My answer was wrong for the same reason.

Answer

Use HTMLParser.HTMLParser():

In [8]: import HTMLParser  

In [11]: HTMLParser.HTMLParser().unescape('&quot;') 
Out[11]: u'"' 
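
The session above is Python 2. On Python 3 the `HTMLParser.unescape` method is no longer available (it was deprecated in 3.4 and removed in 3.9), and the standard-library replacement is `html.unescape`. A minimal sketch, assuming the text comes back from `x.text` still containing `&quot;` as in the question; `raw_notes` and `unescape_text` are hypothetical names used only for illustration:

```python
import html

# Shortened stand-in for the string printed by x.text in the question
# (entities still present); not the full Notes content.
raw_notes = "{&quot;Phase&quot;: 0, &quot;Flipper&quot;: 0, &quot;Guide&quot;: 0}"


def unescape_text(text):
    """Replace character references such as &quot; with the characters they name."""
    return html.unescape(text)


notes = unescape_text(raw_notes)
print(notes)  # {"Phase": 0, "Flipper": 0, "Guide": 0}
```

`html.unescape` also resolves the full set of named HTML entities, so it is broader than what XML itself defines.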

saxutils handles &lt;, &gt; and &amp;, but it does not handle &quot;:

In [9]: import xml.sax.saxutils as saxutils 

In [10]: saxutils.unescape('&quot;') 
Out[10]: '&quot;'  
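
If you would rather stay with saxutils, `unescape` accepts an optional second argument: a dictionary of extra replacements applied in addition to the three built-in ones. A small sketch of that approach; `EXTRA_ENTITIES` and `unescape_xml` are hypothetical names, not part of the library:

```python
from xml.sax import saxutils

# Extra replacements on top of the built-in &lt;, &gt; and &amp;.
EXTRA_ENTITIES = {"&quot;": '"', "&apos;": "'"}


def unescape_xml(text):
    """Unescape the five predefined XML entities using saxutils."""
    return saxutils.unescape(text, EXTRA_ENTITIES)


print(unescape_xml("&quot;Phase&quot;: 0 &lt; 1"))  # "Phase": 0 < 1
```

This keeps the replacement set limited to the five entities XML predefines, which may be preferable when the text should not be treated as HTML.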

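Exactly right; " doesn't *need* to be quoted in XML, so the saxutils module doesn't handle it (just like ElementTree).

Thanks. That did it. One of these days I'll have to talk with the server developers and find out why the server is escaping the quotes in the first place. – user640078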