2012-10-12 87 views
0

Parsing XML from a URL with Python using minidom

I have a problem parsing the XML below from a URL.

Sample XML at my URL path:

<?xml version="1.0" encoding="utf-8"?> 
<Documents> 
    <class> 
     <mid name="yyyyyyyyyyyyy"></mid> 
     <person name="yyyyyyyyyy"></person> 
     <url name="yyyyyyyyy"></url> 
    </class> 
    <class> 
     <mid name="xxxxx"></mid> 
     <person name="xxxxxxxxxx"></person> 
     <url name="xxxxxxxxxxx"></url> 
    </class> 
</Documents> 

Here is my Python code:

from urllib2 import urlopen
from xml.dom import minidom

def staff_list(request):
    url = 'http://path.to.url/'
    dom = minidom.parse(urlopen(url))
    person = dom.getElementsByTagName('person')
    for i in person:
        print i.attributes['name'].value

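In the for loop I want to print the person and url tag values that belong to the same parent class element in the XML.

I tried iterating as follows, but got a "too many values to unpack" error: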

def staff_list(request):
    url = 'http://path.to.url/'
    dom = minidom.parse(urlopen(url))
    person = dom.getElementsByTagName('person')
    mid = dom.getElementsByTagName('mid')
    url = dom.getElementsByTagName('url')
    # This line raises the unpacking error described above
    for i,j,k in person,mid,url:
        print i.attributes['name'].value,j.attributes['name'].value,k.attributes['name'].value

Any suggestions?

Answers

2

You want to use zip() to combine the elements, I think:

for i,j,k in zip(person, mid, url): 

Do yourself a big favour, though, and use the ElementTree API instead; that API is far more Pythonic and easier to use than the Python DOM API.
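
To make the zip() suggestion concrete, here is a minimal sketch, assuming Python 3 and an inline copy of the XML instead of a real URL. The names SAMPLE_XML, persons, mids, urls and rows, and the short attribute values, are placeholders chosen for illustration. The original loop iterates over a tuple of three node lists, so each step tries to unpack one whole list into i, j, k; zip() instead yields one node from each list per step.

```python
from xml.dom import minidom

# Inline stand-in for the document served at the URL (values are made up).
SAMPLE_XML = """<Documents>
    <class>
        <mid name="m1"></mid>
        <person name="p1"></person>
        <url name="u1"></url>
    </class>
    <class>
        <mid name="m2"></mid>
        <person name="p2"></person>
        <url name="u2"></url>
    </class>
</Documents>"""

dom = minidom.parseString(SAMPLE_XML)
persons = dom.getElementsByTagName("person")
mids = dom.getElementsByTagName("mid")
urls = dom.getElementsByTagName("url")

# zip() pairs the three node lists position by position.
rows = [
    (p.getAttribute("name"), m.getAttribute("name"), u.getAttribute("name"))
    for p, m, u in zip(persons, mids, urls)
]

for row in rows:
    print(*row)
```

Note that zip() matches nodes purely by position and stops at the shortest list, so if some class element lacked one of the three children, the rows could silently pair values from different parents.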

+0

Thanks. Works like a charm – tunaktunak

1

If you want to stick with minidom, you can change the loop to:

for cls in dom.getElementsByTagName('class'): 
    person = cls.getElementsByTagName('person')[0] 
    mid = cls.getElementsByTagName('mid')[0] 
    url = cls.getElementsByTagName('url')[0] 

    print person.attributes['name'].value 
    print mid.attributes['name'].value 
    print url.attributes['name'].value 
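
For reference, the same per-class loop could be written for Python 3 roughly as below. This is only a sketch: the helper name names_per_class, the SAMPLE_XML string and its values are hypothetical, and returning None for a missing child is an assumption about how gaps should be handled, not something the question specifies.

```python
from xml.dom import minidom

# Inline stand-in for the real document; the second class omits <url> on purpose.
SAMPLE_XML = """<Documents>
    <class>
        <mid name="m1"></mid>
        <person name="p1"></person>
        <url name="u1"></url>
    </class>
    <class>
        <mid name="m2"></mid>
        <person name="p2"></person>
    </class>
</Documents>"""


def names_per_class(xml_text, tags=("person", "mid", "url")):
    """Return one tuple of name attributes per <class>, in the order of `tags`."""
    dom = minidom.parseString(xml_text)
    result = []
    for cls in dom.getElementsByTagName("class"):
        row = []
        for tag in tags:
            nodes = cls.getElementsByTagName(tag)
            # Use None when this class has no such child instead of raising IndexError.
            row.append(nodes[0].getAttribute("name") if nodes else None)
        result.append(tuple(row))
    return result


for row in names_per_class(SAMPLE_XML):
    print(row)
```

Because each lookup is scoped to one class element, a missing child only affects its own row.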

As @Martijn Pieters says, take a look at ElementTree as an alternative API. For example:

import xml.etree.ElementTree as ET 
documents = ET.fromstring(xmlstr) 
for cls in documents.iter('class'): 
    person = cls.find('person') 
    mid = cls.find('mid') 
    url = cls.find('url') 

    print person.get('name'), mid.get('name'), url.get('name') 
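
A self-contained Python 3 variant of the ElementTree idea might look like the following sketch. Here xmlstr is replaced by a hypothetical SAMPLE_XML string, and class_rows is an illustrative helper name; collecting each class into a dict keyed by tag is a design choice made for this example.

```python
import xml.etree.ElementTree as ET

# Inline stand-in for the XML text fetched from the URL (values are made up).
SAMPLE_XML = """<Documents>
    <class>
        <mid name="m1"></mid>
        <person name="p1"></person>
        <url name="u1"></url>
    </class>
    <class>
        <mid name="m2"></mid>
        <person name="p2"></person>
        <url name="u2"></url>
    </class>
</Documents>"""


def class_rows(xml_text):
    """Return one dict per <class>, mapping each child tag to its name attribute."""
    root = ET.fromstring(xml_text)
    rows = []
    for cls in root.iter("class"):
        # Iterating an Element yields its direct children.
        rows.append({child.tag: child.get("name") for child in cls})
    return rows


for row in class_rows(SAMPLE_XML):
    print(row["person"], row["mid"], row["url"])
```

To read from a URL instead of a string, ET.parse accepts a file-like object, so passing it the response returned by urlopen and then calling getroot() on the result should work in the same way.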
0

I would use XPath with lxml.html. A minimal approach:

import lxml.html as lh
doc = lh.parse('test.xml')

In [70]: persons = doc.xpath('.//person/@name') 

In [71]: urls=doc.xpath('.//person[@name]/following-sibling::url/@name') 

In [72]: mids=doc.xpath('.//person[@name]/preceding-sibling::mid/@name') 

In [73]: [[p,m,u]for p,m,u in zip(persons, mids, urls)] 
Out[73]: 
[['yyyyyyyyyy', 'yyyyyyyyyyyyy', 'yyyyyyyyy'], 
['xxxxxxxxxx', 'xxxxx', 'xxxxxxxxxxx']]