2012-06-11

Python XMLParser: when is the data() method called?

I'm learning Python and am having trouble understanding the behavior of the XML parser (ElementTree's XMLParser).

I modified the example from the documentation:

class MaxDepth:                       # The target object of the parser
    path = ""
    def start(self, tag, attrib):     # Called for each opening tag.
        self.path += "/" + tag
        print '>>> Entering - ' + self.path
    def end(self, tag):               # Called for each closing tag.
        print '<<< Leaving - ' + self.path
        if self.path.endswith('/' + tag):
            self.path = self.path[:-(len(tag) + 1)]
    def data(self, data):
        if data:
            print '... data called ...'
            print data, 'length -', len(data)
    def close(self):                  # Called when all data has been parsed.
        return self

It produces the following output:

>>> Entering - /a 
... data called ... 

length - 1 
... data called ... 
    length - 2 
>>> Entering - /a/b 
... data called ... 

length - 1 
... data called ... 
    length - 2 
<<< Leaving - /a/b 
... data called ... 

length - 1 
... data called ... 
    length - 2 
>>> Entering - /a/b 
... data called ... 

length - 1 
... data called ... 
    length - 4 
>>> Entering - /a/b/c 
... data called ... 

length - 1 
... data called ... 
     length - 6 
>>> Entering - /a/b/c/d 
... data called ... 

length - 1 
... data called ... 
     length - 6 
<<< Leaving - /a/b/c/d 
... data called ... 

length - 1 
... data called ... 
    length - 4 
<<< Leaving - /a/b/c 
... data called ... 

length - 1 
... data called ... 
    length - 2 
<<< Leaving - /a/b 
... data called ... 

length - 1 
<<< Leaving - /a 
<__main__.MaxDepth instance at 0x10e7dd5a8> 

My questions are:

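  1. When is the data() method called?
  2. Why is it called twice before each start tag?
  3. I can't find API documentation with more details about the data method. Where can I find a Javadoc-style API reference for classes like XMLParser?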

If your use case doesn't need event-based parsing, '.parse()' is easier: http://www.doughellmann.com/PyMOTW/xml/etree/ElementTree/parse.html. Otherwise, the event example there may help: http://www.doughellmann.com/PyMOTW/xml/etree/ElementTree/parse.html#watching-events-while-parsing – ninMonkey
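For comparison, here is a minimal Python 3 sketch of the two alternatives the comment points to: building the whole tree with ET.fromstring (the string counterpart of ET.parse), and streaming with ET.iterparse. In both cases ElementTree joins the character data for you, so each element's .text is a single string. The SAMPLE document is an assumption, shaped to match the paths in the output above, and texts_via_tree / texts_via_iterparse are hypothetical helper names. With iterparse, .text is read only at the "end" event, where it is complete.

```python
import io
import xml.etree.ElementTree as ET

# Assumed sample input, matching the paths in the question's output.
SAMPLE = """<a>
  <b>
  </b>
  <b>
    <c>
      <d>
      </d>
    </c>
  </b>
</a>"""


def texts_via_tree(xml_text):
    """Build the whole tree; each element's .text is already one string."""
    root = ET.fromstring(xml_text)
    return [(elem.tag, elem.text) for elem in root.iter()]


def texts_via_iterparse(xml_text):
    """Stream (event, element) pairs; read .text only at the "end" event."""
    source = io.BytesIO(xml_text.encode("utf-8"))
    result = []
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "end":  # the element's text is complete by now
            result.append((elem.tag, elem.text))
    return result


if __name__ == "__main__":
    for tag, text in texts_via_tree(SAMPLE):
        print(tag, repr(text))
```

Note that iterparse reports elements in end-tag order, so the innermost elements come first.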

Answer


If you modify the data method like this:

def data(self, data):
    if data:
        print '... data called ...'
        print repr(data), 'length -', len(data)

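you will see why the data method is called several times: it is called for each line of text data between the tags, so each newline and each run of indentation arrives as a separate call: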

>>> Entering - /a 
... data called ... 
'\n' length - 1 
... data called ... 
' ' length - 2 
>>> Entering - /a/b 
... data called ... 
'\n' length - 1 
... data called ... 
' ' length - 2 
<<< Leaving - /a/b 
... data called ... 
'\n' length - 1 
... data called ... 
' ' length - 2 
>>> Entering - /a/b 
... data called ... 
'\n' length - 1 
... data called ... 
' ' length - 4 
# ... etc ... 
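To reproduce this on Python 3, here is a minimal sketch. The question does not show how the parser was driven, so SAMPLE is an assumption (an indented document whose paths match the output), and RecordingTarget / record are hypothetical names. The target logs every callback as a tuple; printing with repr() makes the whitespace chunks visible. How the text is split into data() calls may differ between Python versions and implementations, so only the joined text should be relied on.

```python
from xml.etree.ElementTree import XMLParser

# Assumed sample input; the question does not show the document it parsed.
SAMPLE = """<a>
  <b>
  </b>
  <b>
    <c>
      <d>
      </d>
    </c>
  </b>
</a>"""


class RecordingTarget:
    """Hypothetical parser target that records every callback it receives."""

    def __init__(self):
        self.events = []

    def start(self, tag, attrib):  # called for each opening tag
        self.events.append(("start", tag))

    def end(self, tag):  # called for each closing tag
        self.events.append(("end", tag))

    def data(self, data):  # called for each chunk of text, however it is split
        self.events.append(("data", data))

    def close(self):  # its return value is what parser.close() returns
        return self.events


def record(xml_text):
    parser = XMLParser(target=RecordingTarget())
    parser.feed(xml_text)
    return parser.close()


if __name__ == "__main__":
    for kind, value in record(SAMPLE):
        print(kind, repr(value))
```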

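XMLParser is built on the Expat parser; the underlying callbacks, such as CharacterDataHandler, are documented with the xml.parsers.expat module.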
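The chunking can be observed with xml.parsers.expat directly. In this Python 3 sketch (parse_events is a hypothetical helper and SAMPLE an assumed input), pyexpat's buffer_text attribute is toggled: with it off, chunks arrive as Expat produces them; with it on, pyexpat merges adjacent character data into one callback where it can. Start and end handlers are installed as well, because pyexpat flushes its text buffer before calling them.

```python
import xml.parsers.expat

# Assumed sample input, matching the paths in the question's output.
SAMPLE = """<a>
  <b>
  </b>
  <b>
    <c>
      <d>
      </d>
    </c>
  </b>
</a>"""


def parse_events(xml_text, buffer_text):
    """Hypothetical helper: log pyexpat callbacks with buffering on or off."""
    events = []
    parser = xml.parsers.expat.ParserCreate()
    # False: chunks arrive exactly as Expat splits them.
    # True: pyexpat joins adjacent character data into one call where it can.
    parser.buffer_text = buffer_text
    # Start/end handlers also mark where pyexpat flushes its text buffer.
    parser.StartElementHandler = lambda name, attrs: events.append(("start", name))
    parser.EndElementHandler = lambda name: events.append(("end", name))
    parser.CharacterDataHandler = lambda text: events.append(("data", text))
    parser.Parse(xml_text, True)  # True: this is the final piece of input
    return events


if __name__ == "__main__":
    for flag in (False, True):
        chunks = [v for k, v in parse_events(SAMPLE, flag) if k == "data"]
        print("buffer_text =", flag, "->", chunks)
```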

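In my experience, any streaming XML parser treats text data as a series of chunks, so you have to concatenate all of the data events together until you reach the next start-tag or end-tag event. Parsers often split chunks at whitespace boundaries, but that is not guaranteed.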
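A minimal Python 3 sketch of that advice as an ElementTree target (TextCollector and collect_text are hypothetical names, SAMPLE is an assumed input): data() only appends to a buffer, and the buffer is joined and emitted when the next start or end tag arrives. Feeding the input in small pieces may change how the text is chunked, but not the joined result.

```python
from xml.etree.ElementTree import XMLParser

# Assumed sample input, matching the paths in the question's output.
SAMPLE = """<a>
  <b>
  </b>
  <b>
    <c>
      <d>
      </d>
    </c>
  </b>
</a>"""


class TextCollector:
    """Hypothetical target that joins data() chunks between tag events."""

    def __init__(self):
        self._chunks = []  # pieces of the current text run
        self.texts = []    # (next tag, joined text) pairs

    def _flush(self, next_tag):
        # Emit the buffered run as one string, then start a new run.
        if self._chunks:
            self.texts.append((next_tag, "".join(self._chunks)))
            self._chunks = []

    def start(self, tag, attrib):
        self._flush("<%s>" % tag)

    def end(self, tag):
        self._flush("</%s>" % tag)

    def data(self, data):
        self._chunks.append(data)  # possibly only part of the text

    def close(self):
        self._flush("EOF")  # normally nothing remains after the root closes
        return self.texts


def collect_text(xml_text, piece_size=None):
    """Parse xml_text, optionally feeding it piece_size characters at a time."""
    parser = XMLParser(target=TextCollector())
    step = piece_size or max(len(xml_text), 1)
    for i in range(0, len(xml_text), step):
        parser.feed(xml_text[i:i + step])
    return parser.close()


if __name__ == "__main__":
    for next_tag, text in collect_text(SAMPLE, piece_size=3):
        print(repr(text), "before", next_tag)
```

Because the target only ever acts on the joined string, it does not matter whether a given parser delivers "\n" and the indentation as one call or two.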
