Getting text from a website using JSoup

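I'm using JSoup to parse HTML websites. I want to get articles from, for example, Wikipedia. Specifically, I'd like to get the text of the "From today's featured article" box on the main page (http://en.wikipedia.org/wiki/Main_Page).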

Using the following code:

Document doc = Jsoup.connect("http://en.wikipedia.org/wiki/Main_Page"); 
Elements el = doc.select("div.mp-tfa"); 
System.out.println(el);

The problem is that it doesn't work: it just prints an empty line. The "From today's featured article" box is placed inside div class="mp-tfa".

How can I get this text in my Java program?

Thanks in advance.


2014-02-09 Ganjira

Change:

doc.select("div.mp-tfa");

To:

doc.select("div#mp-tfa");
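The reason the change matters: in a CSS selector, "." matches a class and "#" matches an id. The sketch below is offline; it parses a tiny, hypothetical stand-in for the Main Page markup (assuming the featured-article box carries id="mp-tfa", as the fix implies) and counts what each selector matches. The class name SelectorDemo and the sample HTML are illustrative only.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;

public class SelectorDemo {
    // Hypothetical, heavily simplified stand-in for the Main Page markup:
    // the featured-article box has id="mp-tfa", not class="mp-tfa".
    static final String HTML =
        "<div id=\"mp-tfa\"><p>The Boulonnais is a heavy draft horse breed.</p></div>";

    // Returns how many elements the given CSS selector matches in the sample markup.
    static int countMatches(String cssQuery) {
        Document doc = Jsoup.parse(HTML);
        Elements matches = doc.select(cssQuery);
        return matches.size();
    }

    public static void main(String[] args) {
        // "." selects by class: nothing matches, so printing the Elements gives an empty line.
        System.out.println("div.mp-tfa -> " + countMatches("div.mp-tfa"));
        // "#" selects by id: the box is found.
        System.out.println("div#mp-tfa -> " + countMatches("div#mp-tfa"));
    }
}
```

Running main prints 0 matches for the class selector and 1 for the id selector, which is consistent with the empty line described in the question.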

A better approach is to iterate over the Elements returned for whatever tag, class, or id you selected, and print the text of each Element. Simply:

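import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

Document doc = Jsoup.connect("http://en.wikipedia.org/wiki/Main_Page").get(); // get() can throw IOException
Elements el = doc.select("div#mp-tfa"); // "#" matches the id, "." would match a class
for (Element e : el) { 
    System.out.println(e.text()); // visible text only, without HTML tags
}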

This gives:

The Boulonnais is a heavy draft horse breed from Fr....
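A possible complete program built around the snippet above, as a sketch rather than the answerer's code: it handles the IOException that get() can throw and moves the extraction into a helper so it can be tried on any Document. The class name FeaturedArticle, the helper featuredText, and the 10-second timeout are illustrative assumptions; it also assumes the live page still uses id="mp-tfa".

```java
import java.io.IOException;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class FeaturedArticle {
    // Hypothetical helper: joins the text of every element matching the id selector,
    // one line per matched element (normally there is exactly one).
    static String featuredText(Document doc) {
        StringBuilder sb = new StringBuilder();
        Elements boxes = doc.select("div#mp-tfa");
        for (Element e : boxes) {
            if (sb.length() > 0) {
                sb.append('\n');
            }
            sb.append(e.text());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        try {
            // Assumption: a 10 s timeout (in milliseconds) is long enough; adjust as needed.
            Document doc = Jsoup.connect("http://en.wikipedia.org/wiki/Main_Page")
                    .timeout(10_000)
                    .get();
            System.out.println(featuredText(doc));
        } catch (IOException ex) {
            System.err.println("Could not fetch page: " + ex.getMessage());
        }
    }
}
```

Keeping the selector logic in a separate method means it can be checked against a locally parsed string (Jsoup.parse) without hitting the network.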


2014-02-09 07:59:20 PopoFibo

Thanks a lot! It helped! ;) – Ganjira

@Toledo Glad to help :) – PopoFibo

I think it should be:

Document doc = Jsoup.connect("http://en.wikipedia.org/wiki/Main_Page").get(); 
Elements el = doc.select("div#mp-tfa"); 
System.out.println(el);
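Note that this version prints the Elements object itself, whose toString() is the outer HTML of the matches, whereas the loop in the other answer prints only the text. The offline sketch below contrasts the two on a hypothetical snippet of markup; the class name HtmlVsText and the sample HTML are illustrative assumptions.

```java
import org.jsoup.Jsoup;
import org.jsoup.select.Elements;

public class HtmlVsText {
    // Hypothetical sample markup with the same id as the featured-article box.
    static final String HTML = "<div id=\"mp-tfa\"><p>Some <b>bold</b> text</p></div>";

    // What System.out.println(el) shows: Elements.toString() returns the outer HTML.
    static String asHtml() {
        Elements el = Jsoup.parse(HTML).select("div#mp-tfa");
        return el.toString();
    }

    // Elements.text() returns the combined, whitespace-normalised text of all matches.
    static String asText() {
        Elements el = Jsoup.parse(HTML).select("div#mp-tfa");
        return el.text();
    }

    public static void main(String[] args) {
        System.out.println(asHtml()); // tags included; exact indentation depends on the jsoup version
        System.out.println(asText()); // plain text only
    }
}
```

Use the HTML form when you need the markup (links, formatting), and text() when you only want the readable article text.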


2014-02-09 07:59:54 theconsultingthief

Thanks a lot! It helped! ;) – Ganjira

Glad to help, although PopoFibo's answer is more thorough. :) – theconsultingthief
