2011-11-21 110 views

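Reading Excel XML into a dictionary

I want to read a simple Excel XML file into a dictionary. I tried xlrd 7.1, but it reported a format error. Now I am trying xml.etree.ElementTree, also without success. I cannot change the structure of the .xml file. Here is the file: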

<?xml version="1.0" encoding="UTF-8"?> 
-<Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet" xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:x="urn:schemas-microsoft-com:office:excel" xmlns:html="http://www.w3.org/TR/REC-html40"> 
    -<Styles> 
    -<Style ss:Name="Normal" ss:ID="Default"> 
     <Alignment ss:Vertical="Bottom"/> 
     <Borders/> 
     <Font ss:FontName="Verdana"/> 
     <Interior/> 
     <NumberFormat/> 
     <Protection/> 
    </Style> -<Style ss:ID="s22"> 
     <NumberFormat ss:Format="General Date"/> 
    </Style> 
    </Styles> -<Worksheet ss:Name="Linkfeed"> 
    -<Table> 
     -<Row> 
     -<Cell> 
      <Data ss:Type="String">ID</Data> 
     </Cell> -<Cell> 
      <Data ss:Type="String">URL</Data> 
     </Cell> 
     </Row> -<Row> 
     -<Cell> 
      <Data ss:Type="String">22222</Data> 
     </Cell> -<Cell> 
      <Data ss:Type="String">Hello there</Data> 
     </Cell> 
     </Row> 
    </Table> 
    </Worksheet> 
</Workbook> 

Code to read it:

import xml.etree.ElementTree as etree

def xml_to_list(fname):
    with open(fname) as xml_file:
        tree = etree.parse(xml_file)

        for items in tree.iter(tag="Table"):
            for item in items:  # Items is None!
                print(item.text)

Update: it works now, but how do I exclude the garbage?

def xml_to_list(fname):
    with open(fname) as xml_file:
        tree = etree.iterparse(xml_file)
        for item in tree:  # each item is an (event, element) pair
            print(item[1].text)

What "garbage" are you talking about? – Constantinius

The empty items in the tree. – User

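Sorry, I still cannot see what your problem is. Maybe you could clarify what goes wrong. I cannot find any syntax errors, and your use of 'etree' also looks correct. – Constantinius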

Answer

Exclude the "garbage" with an if statement:

def xml_to_list(fname):
    with open(fname) as xml_file:
        tree = etree.iterparse(xml_file)
        for item in tree:  # each item is an (event, element) pair
            if item[1].text.strip() != '-':
                print(item[1].text)
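
For the original goal (one dictionary per row), here is a minimal sketch rather than a definitive solution. It assumes the first attempt matched nothing because every element in the file belongs to the default namespace `urn:schemas-microsoft-com:office:spreadsheet`, so ElementTree names the tag `{urn:schemas-microsoft-com:office:spreadsheet}Table` rather than `Table`. It also assumes the first row holds the column names. `SS`, `SAMPLE` and `rows_to_dicts` are illustrative names, and the code targets Python 3.

```python
import io
import xml.etree.ElementTree as etree

# Default namespace declared on <Workbook> in the question's file.
SS = "{urn:schemas-microsoft-com:office:spreadsheet}"

# Cut-down stand-in for the question's file (illustrative data only).
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet"
          xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet">
 -<Worksheet ss:Name="Linkfeed">
  -<Table>
   -<Row>
    -<Cell><Data ss:Type="String">ID</Data></Cell>
    -<Cell><Data ss:Type="String">URL</Data></Cell>
   </Row>
   -<Row>
    -<Cell><Data ss:Type="String">22222</Data></Cell>
    -<Cell><Data ss:Type="String">Hello there</Data></Cell>
   </Row>
  </Table>
 </Worksheet>
</Workbook>"""


def rows_to_dicts(source):
    """Return one dict per data row, keyed by the header row's values.

    `source` is a filename or a binary file object.
    """
    tree = etree.parse(source)
    records = []
    for table in tree.iter(SS + "Table"):
        header = None
        for row in table.iter(SS + "Row"):
            # Only <Data> text is read, so stray text between tags is ignored.
            values = [(data.text or "").strip() for data in row.iter(SS + "Data")]
            if header is None:
                header = values  # assumption: the first row is the header
            else:
                records.append(dict(zip(header, values)))
    return records


print(rows_to_dicts(io.BytesIO(SAMPLE)))
# prints [{'ID': '22222', 'URL': 'Hello there'}]
```

Because only the `Data` elements are read, no filtering of empty items or `-` text is needed with this approach.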

Thanks, that did it. What if I cleaned the raw XML before parsing instead? – User

I would add an extra check: if item[1].text and item[1].text.strip() != '-':
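
The extra check matters because `.text` is `None` for elements that contain no text, such as `<Borders/>`, and calling `.strip()` on `None` raises `AttributeError`. The sketch below combines that check with the filter and shows one possible way to clean the raw XML before parsing. `cell_texts`, `strip_tree_markers` and `SAMPLE` are illustrative names, the code targets Python 3, and the regular expression assumes a stray `-` only ever appears after whitespace and directly before a `<`, as in the listing in the question.

```python
import io
import re
import xml.etree.ElementTree as etree


def cell_texts(source):
    """Collect non-empty element texts, skipping None, whitespace and '-'."""
    texts = []
    for _event, elem in etree.iterparse(source):  # default: "end" events only
        text = (elem.text or "").strip()  # .text is None for e.g. <Borders/>
        if text and text != "-":
            texts.append(text)
    return texts


def strip_tree_markers(raw):
    """Remove a '-' that follows whitespace (or starts the data) and precedes '<'.

    Assumption: no real cell value ends in ' -' right before a closing tag.
    """
    return re.sub(rb"(?<!\S)-(?=<)", b"", raw)


# Illustrative data only; the '-' before <Table> sits outside the root element.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
-<Table>
 -<Row>
  -<Cell><Data>ID</Data></Cell>
  <Empty/>
 </Row>
</Table>"""

cleaned = strip_tree_markers(SAMPLE)
print(cell_texts(io.BytesIO(cleaned)))
# prints ['ID']
```

Cleaning first has one practical advantage: text placed before the root element is not well-formed XML, so a `-` in that position has to be removed before any parser will accept the file.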