Getting web page elements with Jsoup

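I am trying to use Jsoup to get stock data from a website called morningstar. I have looked at other forums and have not been able to find what is wrong.

I am trying to do more advanced data scraping, but I cannot seem to get the price. I either get null back or nothing at all.

I know there are other languages and APIs, but I would like to use Jsoup because it seems very capable.

Here is what I have so far: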

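import java.io.IOException;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class Scrape {
    public static void main(String[] args) {
        String URL = "http://www.morningstar.com/stocks/xnas/aapl/quote.html";
        Document d = new Document(URL);
        try {
            // Fetch and parse the page
            d = Jsoup.connect(URL).get();
        } catch (IOException e) {
            e.printStackTrace();
        }
        // Look up the price element; this is what comes back as null
        Element stuff = d.select("#idPrice gr_text_bigprice").first();
        System.out.println("Price of AAPL: " + stuff);
    }
}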

Any help would be greatly appreciated.

Source

2016-06-07 BillytheKid

Are you sure the data is not generated dynamically by JavaScript?
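One rough way to answer that question is to look at the HTML the server sends before any JavaScript runs. The sketch below is only an illustration: `StaticHtmlCheck` and `hasId` are made-up names, the check is a plain text match rather than a real HTML parse, and it assumes Java 11 or later for `java.net.http`. If it reports false for `idPrice`, the element is most likely added by a script, and Jsoup on its own will not see it.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Jsoup: checks whether the raw HTML
// (as delivered by the server, before scripts run) mentions a given id.
final class StaticHtmlCheck {

    // True if the markup contains id="<id>" or id='<id>' as an attribute.
    // The lookbehind keeps attributes such as data-id from matching.
    static boolean hasId(String html, String id) {
        Pattern p = Pattern.compile(
                "(?<![\\w-])id\\s*=\\s*[\"']" + Pattern.quote(id) + "[\"']");
        return p.matcher(html).find();
    }

    public static void main(String[] args) throws Exception {
        String url = "http://www.morningstar.com/stocks/xnas/aapl/quote.html";
        HttpClient client = HttpClient.newBuilder()
                .followRedirects(HttpClient.Redirect.NORMAL)
                .build();
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
        // Download the page body without executing any JavaScript
        String html = client.send(request, HttpResponse.BodyHandlers.ofString()).body();
        System.out.println("idPrice in static HTML: " + hasId(html, "idPrice"));
    }
}
```

Viewing the page source in a browser (not the element inspector, which shows the DOM after scripts have run) gives the same information without any code.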

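Since the content is created dynamically with JavaScript, you can use a browser emulator such as HtmlUnit: https://sourceforge.net/projects/htmlunit/

The price and related information are embedded in an iframe, so we first load the page (which is itself built dynamically), then take the iframe link from it and parse the iframe.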

import com.gargoylesoftware.htmlunit.BrowserVersion;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlPage;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

// Silence HtmlUnit's logging
java.util.logging.Logger.getLogger("com.gargoylesoftware").setLevel(java.util.logging.Level.OFF);

final WebClient webClient = new WebClient(BrowserVersion.CHROME);
webClient.getOptions().setCssEnabled(false);
webClient.getOptions().setJavaScriptEnabled(true);
webClient.getOptions().setThrowExceptionOnScriptError(false);
webClient.getOptions().setThrowExceptionOnFailingStatusCode(false);
webClient.getOptions().setTimeout(1000);

// Load the page and let HtmlUnit run its JavaScript
HtmlPage page = webClient.getPage("http://www.morningstar.com/stocks/xnas/aapl/quote.html");

// Hand the rendered DOM to Jsoup
Document doc = Jsoup.parse(page.asXml());

String title = doc.select(".r_title").select("h1").text();

// The iframe src has no scheme of its own, so prepend one
String iFramePath = "http:" + doc.select("#quote_quicktake").select("iframe").attr("src");

// Load and parse the iframe, which holds the price
page = webClient.getPage(iFramePath);
doc = Jsoup.parse(page.asXml());

System.out.println(title + " | Last Price [$]: " + doc.select("#last-price-value").text());

Prints:

Apple Inc | Last Price [$]: 98.63
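A side note on the `"http:" + ...attr("src")` step: it only works while the iframe's `src` is protocol-relative (starts with `//`). A slightly more defensive sketch is to resolve the attribute against the page URL with `java.net.URI`, which also copes with relative and absolute values. The class name `IframeUrl` and the `quotes.example.com` address below are placeholders for illustration, not the real iframe link.

```java
import java.net.URI;

// Hypothetical helper: turns an iframe src attribute into an absolute URL.
final class IframeUrl {

    // Resolves src against the URL of the page that contains the iframe.
    // Handles protocol-relative ("//host/path"), relative and absolute values.
    static String resolve(String pageUrl, String src) {
        return URI.create(pageUrl).resolve(src.trim()).toString();
    }

    public static void main(String[] args) {
        String page = "http://www.morningstar.com/stocks/xnas/aapl/quote.html";
        // Placeholder src value; the real one comes from the page itself
        System.out.println(resolve(page, "//quotes.example.com/frame?t=AAPL"));
    }
}
```

Jsoup can do something similar with `absUrl("src")`, provided the document was parsed with a base URI, for example `Jsoup.parse(page.asXml(), pageUrl)`.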

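The JavaScript engine in HtmlUnit is rather slow (the code above takes about 18 seconds on my machine), so it may be worth looking into other JavaScript engines or headless browsers to improve performance (PhantomJS, etc.; see this list of options: https://github.com/dhamaniasad/HeadlessBrowsers), but HtmlUnit gets the job done. You can also try filtering out irrelevant scripts, images, etc. with a custom WebConnectionWrapper: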

http://htmlunit.10904.n7.nabble.com/load-parse-speedup-tp22735p22738.html
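As a rough idea of what such a filter could decide on, here is a sketch of just the "is this request worth making" part, written against the plain JDK. The class name, the extension list, and the host list are all assumptions for illustration; which resources are safe to skip for this page would have to be found by trial. In HtmlUnit the check would be called from an overridden `getResponse(WebRequest)` in a `WebConnectionWrapper` subclass, returning an empty response for skipped URLs and delegating to `super.getResponse(request)` otherwise.

```java
import java.net.URI;
import java.util.List;
import java.util.Locale;

// Hypothetical filter logic; not part of HtmlUnit.
final class ResourceFilter {

    // Assumed list of file types that do not affect the scraped data
    private static final List<String> SKIPPED_EXTENSIONS =
            List.of(".png", ".jpg", ".jpeg", ".gif", ".ico", ".css", ".woff");

    // Assumed list of third-party hosts serving unrelated scripts
    private static final List<String> SKIPPED_HOSTS =
            List.of("google-analytics.com", "doubleclick.net");

    // True if a request to this URL can probably be skipped.
    static boolean shouldSkip(String url) {
        URI uri;
        try {
            uri = URI.create(url);
        } catch (IllegalArgumentException e) {
            return false; // unparsable URL: let the request through
        }
        String host = uri.getHost() == null ? "" : uri.getHost().toLowerCase(Locale.ROOT);
        String path = uri.getPath() == null ? "" : uri.getPath().toLowerCase(Locale.ROOT);
        for (String h : SKIPPED_HOSTS) {
            if (host.equals(h) || host.endsWith("." + h)) {
                return true;
            }
        }
        // Only the path is checked, so query strings do not cause false hits
        for (String ext : SKIPPED_EXTENSIONS) {
            if (path.endsWith(ext)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(shouldSkip("http://www.morningstar.com/img/logo.png"));
        System.out.println(shouldSkip("http://www.morningstar.com/stocks/xnas/aapl/quote.html"));
    }
}
```

Skipping too much can break the scripts that build the iframe, so it is safer to start with images and stylesheets only and widen the lists step by step.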

Source

2016-06-07 10:17:41
