2016-08-05 58 views

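I have been searching the Python documentation for a way to get the tag names from an XML file, but without success. Using the XML file below, I would like to get the country name tags and all of their associated child tags. Does anyone know how this is done? How can I get all the tags in an XML file using Python?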

<?xml version="1.0"?>
<data>
    <country name="Liechtenstein">
        <rank>1</rank>
        <year>2008</year>
        <gdppc>141100</gdppc>
        <neighbor name="Austria" direction="E"/>
        <neighbor name="Switzerland" direction="W"/>
    </country>
    <country name="Singapore">
        <rank>4</rank>
        <year>2011</year>
        <gdppc>59900</gdppc>
        <neighbor name="Malaysia" direction="N"/>
    </country>
    <country name="Panama">
        <rank>68</rank>
        <year>2011</year>
        <gdppc>13600</gdppc>
        <neighbor name="Costa Rica" direction="W"/>
        <neighbor name="Colombia" direction="E"/>
    </country>
</data>

Look into the BeautifulSoup4 library. – Keozon

Answers

1

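Consider using ElementTree's iterparse() and building nested lists of tag and text pairs. Conditional if logic groups the items of each country together and leaves out elements that have no text, and the values are then cleaned of the line breaks and extra whitespace that iterparse() picks up: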

import xml.etree.ElementTree as et

path = 'countries.xml'  # assumed file name; point this at your XML file

data = []
# iterparse() yields (event, element) pairs; by default only "end" events,
# so every element is complete by the time it is seen here
for ev, el in et.iterparse(path):
    if el.tag != 'country':
        continue

    inner = []
    # attributes of the <country> element itself
    for name, value in el.items():
        inner.append([el.tag + '-' + name, value.strip()])
    for child in el:
        # skip children without text (e.g. the self-closing <neighbor/>)
        if child.text is not None and child.text.strip():
            inner.append([child.tag, child.text.strip()])
        # attributes of the child element
        for name, value in child.items():
            inner.append([child.tag + '-' + name, value.strip()])
    data.append(inner)

print(data)
# [[['country-name', 'Liechtenstein'], ['rank', '1'], ['year', '2008'], ['gdppc', '141100'],
#   ['neighbor-name', 'Austria'], ['neighbor-direction', 'E'],
#   ['neighbor-name', 'Switzerland'], ['neighbor-direction', 'W']],
#  [['country-name', 'Singapore'], ['rank', '4'], ['year', '2011'], ['gdppc', '59900'],
#   ['neighbor-name', 'Malaysia'], ['neighbor-direction', 'N']],
#  [['country-name', 'Panama'], ['rank', '68'], ['year', '2011'], ['gdppc', '13600'],
#   ['neighbor-name', 'Costa Rica'], ['neighbor-direction', 'W'],
#   ['neighbor-name', 'Colombia'], ['neighbor-direction', 'E']]]
-1

Look at Python's built-in XML functionality: traverse the document recursively and collect all the tags in a set.
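
A minimal sketch of that idea, assuming the standard library's xml.etree.ElementTree is the "built-in XML functionality" meant here. The helper name collect_tags and the inlined, shortened copy of the question's XML are my own choices for illustration and not part of the original answer:

```python
import xml.etree.ElementTree as ET

# Shortened copy of the question's XML, inlined so the sketch is self-contained.
XML = """<?xml version="1.0"?>
<data>
    <country name="Liechtenstein">
        <rank>1</rank>
        <year>2008</year>
        <gdppc>141100</gdppc>
        <neighbor name="Austria" direction="E"/>
    </country>
    <country name="Panama">
        <rank>68</rank>
        <neighbor name="Costa Rica" direction="W"/>
    </country>
</data>
"""


def collect_tags(element, tags=None):
    """Recursively walk `element` and return the set of tag names seen.

    Hypothetical helper, written for this sketch only.
    """
    if tags is None:
        tags = set()
    tags.add(element.tag)
    # Iterating over an Element yields its direct children.
    for child in element:
        collect_tags(child, tags)
    return tags


root = ET.fromstring(XML)
all_tags = collect_tags(root)
print(sorted(all_tags))
# ['country', 'data', 'gdppc', 'neighbor', 'rank', 'year']

# Element.iter() performs the same depth-first walk without explicit recursion.
assert all_tags == {el.tag for el in root.iter()}
```

For a file on disk, ET.parse(path).getroot() should give the same kind of root element to pass to the helper.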
