HTMLDocument iterator skips tags

I am using an HTMLDocument iterator to walk through all the tags in an HTMLDocument. However, the iterator seems to skip tags that are nested inside a p tag. For example:

<html>
    <body>
        <a href = "somesite"> some site </a>
        <p>
            <a href = "someothersite"> some other site </a>
        </p>
    </body>
</html>

The iterator picks up the first a tag (somesite), but it never reaches the a tag inside the p tag (someothersite).

Here is the code:

private void getLinks() throws MalformedURLException {
    HTMLDocument.Iterator it = content.getIterator(HTML.Tag.A);
    it.next();
    while (it.isValid()) {
        // Do something
        it.next();
    }
}

Can anyone explain why?

2012-10-11 Kumalh

Ah, it turns out the culprit was the first it.next() call before entering the loop. – Kumalh

Maybe the isValid() check is breaking your loop. Try checking whether the iterator reaches the second anchor tag without that check.

2012-10-11 07:52:13

That is exactly the problem: it breaks out of the loop, but it shouldn't, because there are more tags left in the document. – Kumalh
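
For anyone landing here later, this is a minimal sketch of the loop without the extra it.next() before it, which is what the comment on the question points to. As far as I can tell, the iterator returned by getIterator is already positioned on the first matching tag, so an extra next() up front moves past it. The class name LinkLister, the method collectHrefs, and parsing from a String via HTMLEditorKit are my own assumptions for illustration; the original code got its document from a field called content.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.AttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLDocument;
import javax.swing.text.html.HTMLEditorKit;

// Hypothetical helper class, not part of the original question.
class LinkLister {

    // Parses the given HTML and returns the href of every a tag, in document order.
    static List<String> collectHrefs(String html) throws Exception {
        HTMLEditorKit kit = new HTMLEditorKit();
        HTMLDocument doc = (HTMLDocument) kit.createDefaultDocument();
        kit.read(new StringReader(html), doc, 0);

        List<String> hrefs = new ArrayList<>();
        // No next() call before the loop: the iterator starts on the first match.
        for (HTMLDocument.Iterator it = doc.getIterator(HTML.Tag.A); it.isValid(); it.next()) {
            AttributeSet attrs = it.getAttributes();
            Object href = attrs.getAttribute(HTML.Attribute.HREF);
            if (href != null) {
                hrefs.add(href.toString());
            }
        }
        return hrefs;
    }

    public static void main(String[] args) throws Exception {
        String html = "<html><body>"
                + "<a href=\"somesite\">some site</a>"
                + "<p><a href=\"someothersite\">some other site</a></p>"
                + "</body></html>";
        // Print each link found, including the one nested inside the p tag.
        for (String href : collectHrefs(html)) {
            System.out.println(href);
        }
    }
}
```

Writing the traversal as a for loop keeps the isValid() check and the next() call in one place, which makes a stray extra next() easier to spot.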
