2015-08-20
0

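How do I iterate over the various elements in jsoup?

I have to parse a page with jsoup. The page has a div with a class that contains various element tags such as p, h1, h2, h3, and so on. I want to walk through them and process each one. The page looks like this: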

<div class="pf-content"> 
     <p>For centuries, Spain shone and progressed under Muslim rule. Unfortunately, the city of Seville fell prey to the barbaric onslaught of the Kingdom of Castile in the year 1248. Several innocent Spaniards were killed, many were forced to leave their homeland and seek refuge elsewhere, whereas many others were captured and taken as slaves. The rulers of Castile further destroyed remnants of Islamic life and culture, <a href="https://muslimmemo.com/masjids-spain/">including masjids</a>.</p> 
     <h3>Original Arabic Text</h3> 
     <h4>Original Arabic Text</h4> 
    </div> 

The order in which the p, h3, h4, etc. occur does matter, because I have to turn them into a sequence of Android TextViews.

What I can do so far is:

Document document = Jsoup.connect("page link here").get(); 

Elements pTag = document.select("div.pf-content"); 

But how should I proceed from here? Please help.

What I have in mind is:

Elements elements = document.select("div.pf-content");

for (Element element : elements) {
    Log.d("FullContent", "elements are: " + element);
    if (element.select("p").first() != null) {
        Log.d("FullContent", "a p tag");
        if (element.select("p").first().select("img").first() != null) {
            Log.d("FullContent", "the tag " + "has src");
        }
    } else if (element.select("h1").first() != null) {
        Log.d("FullContent", "a h1 tag");
    } else if (element.select("h2").first() != null) {
        Log.d("FullContent", "a h2 tag");
    } else if (element.select("h3").first() != null) {
        Log.d("FullContent", "a h3 tag");
    } else if (element.select("h4").first() != null) {
        Log.d("FullContent", "a h4 tag");
    } else {
        Log.d("FullContent", "other tag");
    }
}

Answers

1

Once you have the Elements returned by Elements pTag = document.select("div.pf-content"); you can do the following:

Elements elements = pTag.first().children();
for (Element e : elements) {
    // Do something with each element
}
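As a rough sketch of what the loop body might look like, the example below walks the direct children of the div in document order and branches on each child's tag name. It assumes the HTML is already available as a String and parses it with Jsoup.parse instead of fetching it with Jsoup.connect, so it runs offline. The class ChildWalker and the helper describeChildren are hypothetical names used only for illustration, and on Android the println calls would be replaced by Log.d or by creating TextViews.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.ArrayList;
import java.util.List;

class ChildWalker {
    // Hypothetical helper: describes each direct child of the first
    // div.pf-content as "tag: text", preserving document order.
    static List<String> describeChildren(String html) {
        Document document = Jsoup.parse(html);
        Element container = document.select("div.pf-content").first();
        List<String> result = new ArrayList<>();
        if (container == null) {
            return result; // no matching div on the page
        }
        for (Element child : container.children()) {
            String tag = child.tagName(); // the tag of this child itself
            switch (tag) {
                case "p":
                    // Look for an image nested inside this paragraph only.
                    boolean hasImage = child.select("img").first() != null;
                    result.add("p" + (hasImage ? " (with img)" : "") + ": " + child.text());
                    break;
                case "h1":
                case "h2":
                case "h3":
                case "h4":
                    result.add(tag + ": " + child.text());
                    break;
                default:
                    result.add("other(" + tag + "): " + child.text());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Shortened stand-in for the page shown in the question.
        String html = "<div class=\"pf-content\">"
                + "<p>Intro text, <a href=\"https://muslimmemo.com/masjids-spain/\">including masjids</a>.</p>"
                + "<h3>Original Arabic Text</h3>"
                + "<h4>Original Arabic Text</h4>"
                + "</div>";
        for (String line : describeChildren(html)) {
            System.out.println(line);
        }
    }
}
```

The difference from calling element.select("p") on the div is that tagName() inspects the child itself, whereas select searches everything nested under the element, so the first branch would match whenever any p exists inside the div.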

+0

I have edited the question, please take a look. The above is not working for me. Is there a tutorial on this? http://jsoup.org/apidocs/org/jsoup/select/Elements.html is not working. – learner

+0

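Try 'Elements elements = document.getElementsByClass("pf-content");' although this gives you a list of the elements with that class. You have to take one of those elements (e.g. the first) and call 'children()' on it to get the elements inside the div tag. – GSala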
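A minimal sketch of that suggestion, assuming the HTML is passed in as a String so the example runs without a network call. The class ByClassExample and the helper childTags are hypothetical names. The empty check is there because first() returns null when nothing matched the class.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

import java.util.ArrayList;
import java.util.List;

class ByClassExample {
    // Hypothetical helper: returns the tag names of the direct children of
    // the first element carrying the given class, in document order.
    static List<String> childTags(String html, String className) {
        Document document = Jsoup.parse(html);
        Elements matches = document.getElementsByClass(className);
        List<String> tags = new ArrayList<>();
        if (matches.isEmpty()) {
            return tags; // nothing has this class, so there is no first element
        }
        Element first = matches.first();
        for (Element child : first.children()) {
            tags.add(child.tagName());
        }
        return tags;
    }

    public static void main(String[] args) {
        String html = "<div class=\"pf-content\"><p>text</p><h3>a</h3><h4>b</h4></div>";
        System.out.println(childTags(html, "pf-content"));
    }
}
```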

+0

Thanks for your help. Can you give me a link to a jsoup tutorial? – learner