2012-11-14
1

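I have been trying to parse data from an XML file for a few days now and cannot get it to work. From the example below I need, for each layer, the status and index, plus the type and filename from foreground/producer. The problem is that the structure varies with the content. Look at index 2, where the filename sits under foreground/producer/fill/producer (I do not need the file under foreground/producer/key/producer). I am looking for a simple solution (I have been trying etree.ElementTree, but parsing seems very difficult).

python elementtree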

<?xml version="1.0" encoding="utf-8"?> 
<channel> 
    <video-mode>1080i5000</video-mode> 
    <stage> 
     <layers> 
     <layer> 
      <status>stopped</status> 
      <auto_delta>-1</auto_delta> 
      <frame-number>1829997</frame-number> 
      <nb_frames>0</nb_frames> 
      <frames-left>-1829996</frames-left> 
      <foreground> 
       <producer> 
        <type>empty-producer</type> 
       </producer> 
      </foreground> 
      <background> 
       <producer> 
        <type>transition-producer</type> 
        <source> 
        <producer> 
         <type>empty-producer</type> 
        </producer> 
        </source> 
        <destination> 
        <producer> 
         <type>ffmpeg-producer</type> 
         <filename>media\\MULTI\testfile2.mpg</filename> 
         <width>1920</width> 
         <height>1080</height> 
         <progressive>true</progressive> 
         <fps>25</fps> 
         <loop>false</loop> 
         <frame-number>0</frame-number> 
         <nb-frames>4396</nb-frames> 
         <file-frame-number>0</file-frame-number> 
         <file-nb-frames>4396</file-nb-frames> 
        </producer> 
        </destination> 
       </producer> 
      </background> 
      <index>0</index> 
     </layer> 
     <layer> 
      <status>playing</status> 
      <auto_delta>-1</auto_delta> 
      <frame-number>1830920</frame-number> 
      <nb_frames>4294967295</nb_frames> 
      <frames-left>4293136376</frames-left> 
      <foreground> 
       <producer> 
        <type>ffmpeg-producer</type> 
        <filename>media\AMB.mp4</filename> 
        <width>720</width> 
        <height>576</height> 
        <progressive>true</progressive> 
        <fps>25</fps> 
        <loop>true</loop> 
        <frame-number>1830920</frame-number> 
        <nb-frames>4294967295</nb-frames> 
        <file-frame-number>520</file-frame-number> 
        <file-nb-frames>1600</file-nb-frames> 
       </producer> 
      </foreground> 
      <background> 
       <producer> 
        <type>empty-producer</type> 
       </producer> 
      </background> 
      <index>1</index> 
     </layer> 
     <layer> 
      <status>playing</status> 
      <auto_delta>-1</auto_delta> 
      <frame-number>1830758</frame-number> 
      <nb_frames>4294967295</nb_frames> 
      <frames-left>4293136538</frames-left> 
      <foreground> 
       <producer> 
        <type>separated-producer</type> 
        <fill> 
        <producer> 
         <type>ffmpeg-producer</type> 
         <filename>media\action.mpg</filename> 
         <width>1920</width> 
         <height>1080</height> 
         <progressive>false</progressive> 
         <fps>25</fps> 
         <loop>true</loop> 
         <frame-number>1830758</frame-number> 
         <nb-frames>4294967295</nb-frames> 
         <file-frame-number>22</file-frame-number> 
         <file-nb-frames>247</file-nb-frames> 
        </producer> 
        </fill> 
        <key> 
        <producer> 
         <type>ffmpeg-producer</type> 
         <filename>media\action_a.mpg</filename> 
         <width>1920</width> 
         <height>1080</height> 
         <progressive>false</progressive> 
         <fps>25</fps> 
         <loop>true</loop> 
         <frame-number>1830758</frame-number> 
         <nb-frames>4294967295</nb-frames> 
         <file-frame-number>22</file-frame-number> 
         <file-nb-frames>247</file-nb-frames> 
        </producer> 
        </key> 
       </producer> 
      </foreground> 
      <background> 
       <producer> 
        <type>empty-producer</type> 
       </producer> 
      </background> 
      <index>2</index> 
     </layer> 
     </layers> 
    </stage> 
    <mixer/> 
    <output> 
     <consumers> 
     <consumer> 
      <type>oal-consumer</type> 
      <index>500</index> 
     </consumer> 
     <consumer> 
      <type>ogl-consumer</type> 
      <key-only>false</key-only> 
      <windowed>true</windowed> 
      <auto-deinterlace>true</auto-deinterlace> 
      <index>600</index> 
     </consumer> 
     </consumers> 
    </output> 
    <index>0</index> 
</channel> 

Answers

0
import xml.etree.ElementTree as ET

tree = ET.parse('x.xml')
root = tree.getroot()

# Print each top-level tag and, below it, the tags of its direct children.
for child in root:
    print(child.tag)
    for child2 in child:
        print('>', child2.tag)

'''
====
output
====
video-mode
stage
> layers
mixer
output
> consumers
index

'''
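
The loop above only reaches two levels down. To see the whole tree, a recursive walk is one option. The following is a minimal sketch, not part of the original answer: `SAMPLE` is a trimmed-down stand-in for the question's file, and `dump_tags` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Trimmed-down stand-in for the question's XML (illustrative only).
SAMPLE = """<channel>
    <video-mode>1080i5000</video-mode>
    <stage>
        <layers>
            <layer>
                <status>playing</status>
                <index>1</index>
            </layer>
        </layers>
    </stage>
    <mixer/>
</channel>"""


def dump_tags(element, depth=0):
    """Return one indented line per element, walking the tree depth-first."""
    lines = ["  " * depth + element.tag]
    for child in element:
        lines.extend(dump_tags(child, depth + 1))
    return lines


root = ET.fromstring(SAMPLE)
outline = dump_tags(root)
print("\n".join(outline))
```

Printing the outline first makes it easier to see which paths (for example foreground/producer/fill/producer) actually occur before writing the extraction code.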

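As for the statement that "the structure varies with the content": the structure of an XML document is normally fixed by its DTD (or schema), so it cannot change arbitrarily inside a file; otherwise it would be ambiguous. If you mean that how you parse a part of the tree depends on the nodes above the leaves, you will have to write some if/else statements and functions, for example like this: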

import xml.etree.ElementTree as ET

tree = ET.parse('x.xml')
root = tree.getroot()

def parseStageTag(element):
    # Only the <layers> child of <stage> is handled here.
    print('parsing Stage')
    for child in element:
        if child.tag == 'layers':
            parseLayersTag(child)

def parseOutputTag(element):
    # Placeholder: <output> is not processed yet.
    pass

def parseLayersTag(element):
    print('parsing Layers')
    for child in element:
        print(child)


# Dispatch on the tag name of each top-level element.
for child in root:
    if child.tag == 'stage':
        parseStageTag(child)

    for child2 in child:
        print('>', child2.tag)
'''
output
parsing Stage
parsing Layers
<Element 'layer' at 0x1079e4250>
<Element 'layer' at 0x1079e4f10>
<Element 'layer' at 0x1079e6510>
> layers
> consumers
'''
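
The same if/else dispatch can be carried down to the fields the question asks for. This is only a sketch under assumptions: `SAMPLE` is a shortened copy of two layers from the question, `parse_producer` and `parse_layer` are hypothetical helper names, and a producer is assumed to carry its file either directly in `<filename>` or under `<fill>/<producer>`.

```python
import xml.etree.ElementTree as ET

# Shortened copy of two layers from the question (raw string keeps the backslashes).
SAMPLE = r"""<channel><stage><layers>
  <layer>
    <status>playing</status>
    <foreground><producer>
      <type>ffmpeg-producer</type>
      <filename>media\AMB.mp4</filename>
    </producer></foreground>
    <index>1</index>
  </layer>
  <layer>
    <status>playing</status>
    <foreground><producer>
      <type>separated-producer</type>
      <fill><producer>
        <type>ffmpeg-producer</type>
        <filename>media\action.mpg</filename>
      </producer></fill>
      <key><producer>
        <type>ffmpeg-producer</type>
        <filename>media\action_a.mpg</filename>
      </producer></key>
    </producer></foreground>
    <index>2</index>
  </layer>
</layers></stage></channel>"""


def parse_producer(producer):
    """Return (type, filename); descend into <fill> but never into <key>."""
    kind = producer.findtext("type")
    filename = None
    for child in producer:
        if child.tag == "filename":
            filename = child.text
        elif child.tag == "fill":
            inner = child.find("producer")
            if inner is not None:
                _, filename = parse_producer(inner)
        # <key> and all other children are deliberately ignored.
    return kind, filename


def parse_layer(layer):
    """Collect index, status and the foreground producer's type and filename."""
    info = {"index": layer.findtext("index"),
            "status": layer.findtext("status"),
            "type": None, "filename": None}
    for child in layer:
        if child.tag == "foreground":
            producer = child.find("producer")
            if producer is not None:
                info["type"], info["filename"] = parse_producer(producer)
    return info


root = ET.fromstring(SAMPLE)
layers = [parse_layer(layer) for layer in root.find("stage/layers")]
for info in layers:
    print(info)
```

Note that the reported type is that of the outer foreground producer (`separated-producer` for index 2); whether the inner `ffmpeg-producer` type is wanted instead is not stated in the question.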
+0

Thanks! I was so busy testing find and findall that I completely missed this approach... – user1823221

+0

OK. If this solves it, you can accept the answer. – RParadox

0

I ran into similar problems parsing XML files until I discovered ElementTree's support for XPath.

For example, the following code:

import os
import xml.etree.ElementTree

os.chdir('C:/temp/blah')

et = xml.etree.ElementTree.parse('file.xml')
layerTagList = et.findall("./stage/layers/layer")

for curLayerTag in layerTagList:
    indexTag = curLayerTag.find("./index")
    print("Layer[%s]" % indexTag.text)
    # "//" matches at any depth, so nested fill/key producers are included too.
    fgFiles = curLayerTag.findall(".//foreground//filename")
    for fileTag in fgFiles:
        print("    FG - %s" % fileTag.text)
    bgFiles = curLayerTag.findall(".//background//filename")
    for fileTag in bgFiles:
        print("    BG - %s" % fileTag.text)

gives the output:

Layer[0] 
    BG - media\\MULTI\testfile2.mpg 
Layer[1] 
    FG - media\AMB.mp4 
Layer[2] 
    FG - media\action.mpg 
    FG - media\action_a.mpg
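
The output above still lists the key file (`media\action_a.mpg`), which the question wanted to leave out. One possible refinement, sketched here under assumptions, is to query explicit paths instead of `//`: `SAMPLE` is a shortened stand-in for the question's file, and the file is assumed to sit either directly under foreground/producer or under foreground/producer/fill/producer.

```python
import xml.etree.ElementTree as ET

# Shortened stand-in for the question's XML (raw string keeps the backslashes).
SAMPLE = r"""<channel><stage><layers>
  <layer>
    <status>stopped</status>
    <foreground><producer><type>empty-producer</type></producer></foreground>
    <index>0</index>
  </layer>
  <layer>
    <status>playing</status>
    <foreground><producer>
      <type>separated-producer</type>
      <fill><producer>
        <type>ffmpeg-producer</type>
        <filename>media\action.mpg</filename>
      </producer></fill>
      <key><producer>
        <type>ffmpeg-producer</type>
        <filename>media\action_a.mpg</filename>
      </producer></key>
    </producer></foreground>
    <index>2</index>
  </layer>
</layers></stage></channel>"""

root = ET.fromstring(SAMPLE)
rows = []
for layer in root.findall("./stage/layers/layer"):
    producer = layer.find("./foreground/producer")
    if producer is None:
        continue  # layer without a foreground producer
    # Try the direct filename first, then the fill branch.
    # The key branch is never queried, so its file is skipped.
    filename = producer.findtext("filename")
    if filename is None:
        filename = producer.findtext("fill/producer/filename")
    rows.append((layer.findtext("index"), layer.findtext("status"),
                 producer.findtext("type"), filename))

for row in rows:
    print(row)
```

`findtext` returns `None` when the path does not match, which is why a layer with an `empty-producer` ends up with no filename rather than raising an error.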