2017-08-16 57 views
-1

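I want to parse a Steam Market page using "page.asText()", but it does not work. This is probably because the items have not finished loading one second after the HTML arrives. WebClient (HtmlUnit) does not see some of the elements.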

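import com.gargoylesoftware.htmlunit.BrowserVersion;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlPage;

public class Main {
    public static void main(String[] args) throws Exception {
        // Silence HtmlUnit and Apache HttpClient logging
        java.util.logging.Logger.getLogger("com.gargoylesoftware.htmlunit").setLevel(java.util.logging.Level.OFF);
        java.util.logging.Logger.getLogger("org.apache.http").setLevel(java.util.logging.Level.OFF);
        String link = "http://steamcommunity.com/market/search?appid=730#p6_price_asc";
        WebClient webClient = new WebClient(BrowserVersion.CHROME);
        HtmlPage page = webClient.getPage(link);
        // Prints the text right after the initial load; the listing rows are missing
        System.out.println(page.asText());
    }
}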

In the console I see:

Show advanced options... 






< 1 2 3 4 5 6 ... 939 > 
Showing 1-10 of 9389 results 

What I need:

All
Show advanced options... 
PRICE 
QUANTITY 
NAME 
31,218 
Starting at: 
$0.35 USD 
Operation Hydra Case 
Counter-Strike: Global Offensive 
276,582 
Starting at: 
$0.23 USD 
. 
. 
. 

M4A1-S | Decimator (Field-Tested) 
Counter-Strike: Global Offensive 


232 
Starting at: 
$27.06 USD 

AWP | Asiimov (Battle-Scarred) 
Counter-Strike: Global Offensive 


28,068 
Starting at: 
$0.75 USD 

Krakow 2017 Legends Autograph Capsule 
Counter-Strike: Global Offensive 


< 1 2 3 4 5 6 ... 940 > 
Showing 1-10 of 9392 results 

Answers

0

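First, make sure JavaScript is enabled.

webClient.getOptions().setJavaScriptEnabled(true);

What I usually do to wait for more elements to load is:

Thread.sleep(3000);

This gives the page 3 seconds to load all of the additional content.

You can also try any of the other methods listed by other users here: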

HTMLUnit doesn't wait for Javascript
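
A fixed sleep can wait longer than necessary, or not long enough on a slow connection. As a minimal sketch of one alternative, the helper below polls a condition until it holds or a timeout expires. The names PollingWait and waitUntil are hypothetical and not part of HtmlUnit. With HtmlUnit the condition would presumably be a check such as whether page.asText() already contains the expected item text; that is an assumption, so a timer stands in for it here and the example runs with the standard library only.

```java
import java.util.function.BooleanSupplier;

// Hypothetical helper, not part of HtmlUnit: poll a condition instead of sleeping a fixed time.
final class PollingWait {
    private PollingWait() {}

    /**
     * Checks the condition every pollMillis until it is true or timeoutMillis has elapsed.
     * Returns true if the condition became true in time, false on timeout or interruption.
     */
    static boolean waitUntil(BooleanSupplier condition, long timeoutMillis, long pollMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                // Restore the interrupt flag and give up waiting.
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }

    public static void main(String[] args) {
        long start = System.currentTimeMillis();
        // Stand-in for "the listing has rendered": becomes true after about 300 ms.
        boolean loaded = waitUntil(() -> System.currentTimeMillis() - start >= 300, 3000, 50);
        System.out.println("content loaded: " + loaded);
    }
}
```

The call returns as soon as the condition holds, so the 3000 ms value acts as an upper bound rather than a fixed delay.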

+1

Where do I need to put "Thread.sleep(3000);"? WebClient webClient = new WebClient(BrowserVersion.CHROME); webClient.getOptions().setJavaScriptEnabled(true); page = (HtmlPage) webClient.getPage(link); System.out.println(page.asText());

+0

You need to call Thread.sleep() after webClient.getPage(link).

+0

WOW. So the page keeps loading after "getPage(link)" returns? I thought getPage loaded everything at once. Thank you so much :)