
2016-09-17 34 views

JSoup does not extract an element correctly

I have a web page with the following element:

<div id="pnNij" class="post" data-tag1="" data-tag2=""> 
    <a class="image-list-link" href="http://imgur.com/gallery/pnNij" data-page="0"> 
     <img alt="" src="./Imgur_ The most awesome images on the Internet_files/H7fZCNgb.jpg"> 


      <div class="point-info gradient-transparent-black transition"> 
       <div class="relative"> 
        <div class="pa-bottom"> 
         <div class="arrows"> 
          <div title="like" class="pointer arrow-up icon-upvote-outline" data="pnNij" type="image" data-up="4212"></div> 
          <div title="dislike" class="pointer arrow-down icon-downvote-outline" data="pnNij" type="image" data-downs="502"></div> 
          <div class="clear"></div> 
         </div> 

         <div class="point-info-points" title="points"> 
          <span class="points-pnNij">3,710</span> 
          <span class="points-text-pnNij">points</span> 
         </div> 
        </div> 
       </div> 
      </div> 

    </a> 
    <div class="hover"> 
        <p>Seems like 2017 has it all...</p> 


     <div class="post-info"> 
      album · 69,542 views 
     </div> 
    </div> 

</div> 

Notice that the href equals http://imgur.com/gallery/pnNij

However, when I use JSoup to extract the element from the page like this:

docImgur = Jsoup.connect("http://imgur.com/").get(); 
Elements links = docImgur.getElementsByClass("post"); 

the element is extracted almost correctly, except that the href attribute equals /gallery/pnNij/

Why doesn't the href attribute contain the full URL?

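Comments

Elements links corresponds to the div with id="pnNij" in your code. You have left out how you reach the anchor and read its href attribute. Please add those code snippets.

Did my answer solve the problem? If so, please consider accepting it as the answer.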

Answer


When you inspect the page source, you will find:

<a class="image-list-link" href="/gallery/WRzti" data-page="0"> 
    ... 
</a> 

So the href attribute in the source is not absolute, which is why you get the relative value: /gallery/WRzti
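To illustrate how a relative href like this maps to the full URL a browser displays, here is a minimal sketch using only java.net.URI from the standard library. It does not use jsoup at all; the class name ResolveHref and the helper resolve are hypothetical names chosen for this example, and the base URL is assumed to be the address the page was fetched from.

```java
import java.net.URI;

public class ResolveHref {
    // Resolve an href against the URL the page was loaded from.
    // A relative href is combined with the base; an absolute href is returned unchanged.
    static String resolve(String baseUrl, String href) {
        return URI.create(baseUrl).resolve(href).toString();
    }

    public static void main(String[] args) {
        // Relative href, as it appears in the page source
        System.out.println(resolve("http://imgur.com/", "/gallery/pnNij"));
        // Already absolute href, as it appears in a browser-saved copy
        System.out.println(resolve("http://imgur.com/", "http://imgur.com/gallery/pnNij"));
    }
}
```

Both calls print http://imgur.com/gallery/pnNij. Resolving against the page's base URL in this way is presumably why a saved or browser-inspected copy of the page shows the full address while the raw source does not.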

Solution

Use the abs: attribute prefix, which makes jsoup resolve the attribute value against the document's base URI:

Document docImgur = Jsoup.connect("http://imgur.com/").get(); 

Elements links = docImgur.select("a[href].image-list-link"); 

for (Element element : links) { 
    System.out.println(element.attr("abs:href")); 
} 

Output

http://imgur.com/gallery/WRzti 
http://imgur.com/gallery/tCnDJ 
http://imgur.com/gallery/JIHYh 
...
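The same behaviour can be checked without a network connection. The sketch below assumes jsoup is on the classpath and parses a shortened copy of the anchor from the page source with an explicit base URI. The class name AbsUrlDemo and the helper hrefs are hypothetical names for this example. It also shows absUrl("href"), which is an equivalent way to ask jsoup for the resolved URL.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class AbsUrlDemo {
    // Returns the raw href, the abs:href value, and the absUrl("href") value
    // of the first a.image-list-link element in the given HTML.
    static String[] hrefs(String html, String baseUri) {
        // The base URI tells jsoup what relative links should be resolved against.
        Document doc = Jsoup.parse(html, baseUri);
        Element link = doc.select("a.image-list-link").first();
        return new String[] {
            link.attr("href"),      // value exactly as written in the source
            link.attr("abs:href"),  // resolved against the base URI
            link.absUrl("href")     // same resolution via a dedicated method
        };
    }

    public static void main(String[] args) {
        String html = "<a class=\"image-list-link\" href=\"/gallery/WRzti\" data-page=\"0\"></a>";
        for (String value : hrefs(html, "http://imgur.com/")) {
            System.out.println(value);
        }
    }
}
```

When the document comes from Jsoup.connect(...).get(), the base URI is set from the fetched URL, so no explicit base is needed there. It matters mainly when parsing HTML from a string or a local file.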