2013-10-08 68 views

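I have this code and I want to group animals that share a tag under a common parent element, so that <dog/><dog/> becomes <dogs><dog/><dog/></dogs>, and so on. But with my code the output contains no animals at all, and I don't know why. This is Python 3, parsing the XML with lxml.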

OUTPUT:

<root> 
     <zoo> 
      <some_tag/><some_diff/> 
     </zoo> 
     <zoo> 
      <b/><o/> 
     </zoo> 
</root> 

CODE:

xml = '''<root>
        <zoo>
         <some_tag/><some_diff/>
         <dog/><dog/>
         <cat/><cat/><cat/>
        </zoo>
        <zoo>
         <b/><o/>
         <dog/><dog/>
         <cat/><cat/><cat/><cat/>
        </zoo>
      </root>'''

from lxml import etree as et 
root = et.fromstring(xml) 
node = root.findall('./zoo') 
j = False 
k = False 
for zoo in node:
    for animal in zoo:
        if 'dog' in animal.tag:
            if not j:
                dogs = et.SubElement(zoo, 'dogs')
            dogs.append(animal)
            j = True
        if 'cat' in animal.tag:
            if not k:
                cats = et.SubElement(zoo, 'cats')
            cats.append(animal)
            k = True

    k = False
    j = False

Please rephrase your question.. it isn't very clear – securecurve


I changed it, is it OK now? – dusan


Yeah.. much better :)) – securecurve

Answers

1

I made some changes to your script and it works for me.. check it out:

xml = '''<root> 
        <zoo> 
         <some_tag/> 
         <some_diff/> 
         <dog/> 
         <dog/> 
         <cat/> 
         <cat/> 
         <cat/> 
        </zoo> 

        <zoo> 
         <b/> 
         <o/> 
         <dog/> 
         <dog/> 
         <cat></cat> 
         <cat></cat> 
        </zoo> 
      </root>''' 

from lxml import etree as et 


root = et.fromstring(xml) 

# The lines below select the same <zoo> elements here, use whichever you like
# (getchildren() is deprecated in lxml, so list(root) is the safer spelling)
node = root.findall('./zoo')
node = list(root)


dogs_flag = False 
cats_flag = False 

for zoo in node:

    # Reset the flags for each zoo; otherwise all the cats and dogs
    # end up inside one zoo element ... try it yourself
    dogs_flag = False
    cats_flag = False

    for animal in zoo:

        # Compare with ==, not in: 'dog' in 'dogs' is also True
        if animal.tag == 'dog':
            if not dogs_flag:
                dogs = et.SubElement(zoo, 'dogs')
                dogs_flag = True  # I think this is a better place to set your flag

            dogs.append(animal)

        if animal.tag == 'cat':
            if not cats_flag:
                cats = et.SubElement(zoo, 'cats')
                cats_flag = True

            cats.append(animal)


# tostring() returns bytes in Python 3, so decode before printing
print(et.tostring(root, pretty_print=True).decode())

This gives you the following output:

 <root> 
       <zoo> 
        <some_tag/> 
        <some_diff/> 
        <dogs> 
         <dog/> 
         <dog/> 
        </dogs> 
        <cats> 
         <cat/> 
         <cat/> 
         <cat/> 
        </cats> 
       </zoo> 

       <zoo> 
        <b/> 
        <o/> 
        <dogs> 
         <dog/> 
         <dog/> 
        </dogs> 
        <cats> 
         <cat/> 
         <cat/> 
        </cats> 
       </zoo> 
     </root> 
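
If you would rather not track one flag per animal, here is a minimal sketch of the same grouping with a dict of wrapper elements. The names `group_animals` and `tags` are made up for this example, and the fallback to the standard library's `xml.etree.ElementTree` is my own assumption so the snippet also runs where lxml is not installed. It loops over a snapshot of the children and detaches each animal before appending it, so it does not depend on how either library iterates while the tree is being changed.

```python
try:
    from lxml import etree as et  # the library used in this thread
except ImportError:
    # Assumption: fall back to the stdlib API, which is close enough here
    import xml.etree.ElementTree as et

XML = """<root>
  <zoo><some_tag/><some_diff/><dog/><dog/><cat/><cat/><cat/></zoo>
  <zoo><b/><o/><dog/><dog/><cat/><cat/></zoo>
</root>"""


def group_animals(root, tags=("dog", "cat")):
    """Move each <dog>/<cat> child of every <zoo> into a <dogs>/<cats> wrapper."""
    for zoo in root.findall("./zoo"):
        wrappers = {}  # tag -> wrapper element, created lazily for this zoo
        for animal in list(zoo):  # snapshot, because zoo is modified below
            if animal.tag not in tags:  # exact tag match, 'dogs' never matches
                continue
            if animal.tag not in wrappers:
                wrappers[animal.tag] = et.SubElement(zoo, animal.tag + "s")
            zoo.remove(animal)  # detach first; the stdlib does not move on append
            wrappers[animal.tag].append(animal)
    return root


root = group_animals(et.fromstring(XML))
for zoo in root:
    print([child.tag for child in zoo])
```

Adding another animal is then just a matter of passing a longer `tags` tuple, as long as the plural is formed by adding an "s".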

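Yes, that's it. Your code looks much better than mine. But my problem was the comparison: my code has 'dog' in animal.tag and yours has 'dog' == animal.tag. Do you know why my comparison is wrong? – dusan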


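@dusan, 'in' is normally used to look for an object inside a string/list/tuple/set (everything in Python is an object), not to compare two objects for equality. Experimenting a bit with 'in' will help you see the difference... good luck. – securecurve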
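
To make the difference concrete, here is a small sketch with plain strings (the `tags` list is made up for illustration). On strings, 'in' is a substring test, so it also matches the 'dogs' wrapper tag that the loop itself creates, while '==' matches only the exact tag name.

```python
tags = ["some_tag", "dog", "dogs", "cat", "cats"]

# Substring test: matches every tag that merely contains "dog"
substring_matches = [tag for tag in tags if "dog" in tag]

# Equality test: matches only the exact tag name
exact_matches = [tag for tag in tags if tag == "dog"]

print(substring_matches)  # ['dog', 'dogs']
print(exact_matches)      # ['dog']
```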


If you like the answer, please vote for it :) – securecurve