2014-04-02 59 views

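How do I use HtmlUnitDriver for web scraping?

Hello, I am scraping a web page with Selenium WebDriver and I am able to get my data. The problem is that this interacts directly with a real browser. I don't want to open a web browser; I just want to scrape all the data as it is.

This is what I am getting: (image not available)

How can I achieve my goal?

Here is my code. Thanks in advance.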

    import org.openqa.selenium.By;
    import org.openqa.selenium.WebDriver;
    import org.openqa.selenium.WebElement;
    import org.openqa.selenium.firefox.FirefoxDriver;
    import org.openqa.selenium.support.ui.Select;

    public class GetData {

        public static void main(String[] args) throws InterruptedException {
            String sDate = "27/03/2014";
            WebDriver driver = new FirefoxDriver();
            String url = "http://www.upmandiparishad.in/commodityWiseAll.aspx";
            driver.get(url);
            Thread.sleep(5000);

            // select the commodity in the drop-down
            new Select(driver.findElement(By.id("ctl00_ContentPlaceHolder1_ddl_commodity"))).selectByVisibleText("Jo");
            // type the date into the date field
            driver.findElement(By.id("ctl00_ContentPlaceHolder1_txt_rate")).sendKeys(sDate);
            Thread.sleep(3000);
            // click the Show button
            driver.findElement(By.id("ctl00_ContentPlaceHolder1_btn_show")).click();
            Thread.sleep(5000);

            // get only the table text
            WebElement findElement = driver.findElement(By.id("ctl00_ContentPlaceHolder1_GridView1"));
            String htmlTableText = findElement.getText();
            // do whatever you want now; these are the raw table values
            System.out.println(htmlTableText);

            // quit() closes every window and ends the session, so a separate close() is not needed
            driver.quit();
        }
    }


My updated code:



    import com.gargoylesoftware.htmlunit.BrowserVersion;
    import org.openqa.selenium.By;
    import org.openqa.selenium.WebDriver;
    import org.openqa.selenium.WebElement;
    import org.openqa.selenium.htmlunit.HtmlUnitDriver;
    import org.openqa.selenium.support.ui.Select;

    public class Getdata1 {

        public static void main(String[] args) throws InterruptedException {
            WebDriver driver = new HtmlUnitDriver(BrowserVersion.FIREFOX_3_6);
            driver.get("http://www.upmandiparishad.in/commodityWiseAll.aspx");
            System.out.println(driver.getPageSource());
            Thread.sleep(5000);

            // select the commodity in the drop-down
            new Select(driver.findElement(By.id("ctl00_ContentPlaceHolder1_ddl_commodity"))).selectByVisibleText("Jo");

            String sDate = "12/04/2014"; // the date you want
            driver.findElement(By.id("ctl00_ContentPlaceHolder1_txt_rate")).sendKeys(sDate);

            // click the Show button
            driver.findElement(By.id("ctl00_ContentPlaceHolder1_btn_show")).click();
            Thread.sleep(3000);

            // get only the table text
            WebElement findElement = driver.findElement(By.id("ctl00_ContentPlaceHolder1_GridView1"));
            String htmlTableText = findElement.getText();
            // do whatever you want now; these are the raw table values
            System.out.println(htmlTableText);

            // quit() closes every window and ends the session, so a separate close() is not needed
            driver.quit();
        }
    }

Thanks


Change the Firefox version to the latest one, or change it to Chrome. I think that should be the problem. – Nadun

Answer


You can use HtmlUnit, or HtmlUnitDriver with Selenium:

    WebDriver driver = new HtmlUnitDriver(BrowserVersion.FIREFOX_17);
    driver.get("http://www.upmandiparishad.in/commodityWiseAll.aspx");
    System.out.println(driver.getPageSource());
    Thread.sleep(5000);

    // select the commodity in the drop-down
    new Select(driver.findElement(By.id("ctl00_ContentPlaceHolder1_ddl_commodity"))).selectByVisibleText("Jo");

    String sDate = "12/04/2014"; // the date you want
    driver.findElement(By.id("ctl00_ContentPlaceHolder1_txt_rate")).sendKeys(sDate);

    // click the Show button
    driver.findElement(By.id("ctl00_ContentPlaceHolder1_btn_show")).click();
    Thread.sleep(3000);

    // get only the table text
    WebElement findElement = driver.findElement(By.id("ctl00_ContentPlaceHolder1_GridView1"));
    String htmlTableText = findElement.getText();
    // do whatever you want now; these are the raw table values
    System.out.println(htmlTableText);

    // quit() closes every window and ends the session, so a separate close() is not needed
    driver.quit();
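The fixed `Thread.sleep` calls above wait the full time even when the page is ready sooner, and may be too short when it is slow. A common alternative is to poll for a condition with a timeout. Selenium provides `WebDriverWait` for this purpose; the sketch below only illustrates the same idea in plain Java so that it runs without any dependency. `PollingWait` and `waitUntil` are hypothetical names, not part of Selenium or HtmlUnit, and the condition in `main` is a stand-in for a real check such as "the result table is present".

```java
import java.util.function.BooleanSupplier;

// Hypothetical helper that illustrates condition polling; this is not Selenium's API.
final class PollingWait {

    // Checks the condition every pollMillis until it holds or timeoutMillis has elapsed.
    // Returns true if the condition held in time, false on timeout or interruption.
    static boolean waitUntil(BooleanSupplier condition, long timeoutMillis, long pollMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for the caller
                return false;
            }
        }
    }

    public static void main(String[] args) {
        long start = System.currentTimeMillis();
        // Stand-in condition: becomes true roughly 200 ms after start.
        boolean ready = waitUntil(() -> System.currentTimeMillis() - start >= 200, 2000, 50);
        System.out.println("ready = " + ready);
    }
}
```

In the scraper, the condition would typically check whether `driver.findElements(By.id("ctl00_ContentPlaceHolder1_GridView1"))` returns a non-empty list, so the code continues as soon as the table appears.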

To get tabular output, you can try something like this:

    String[] arrCells = htmlTableText.split(" ");
    for (int i = 0; i < arrCells.length; i++) {
        boolean bIsANumber;
        try {
            // only the parse result matters: does this cell hold an integer?
            Integer.parseInt(arrCells[i]);
            bIsANumber = true;
        } catch (NumberFormatException ex) {
            bIsANumber = false;
        }

        if (bIsANumber) {
            // an integer cell starts a new output line
            System.out.print("\n" + arrCells[i] + "\t");
        } else {
            System.out.print(arrCells[i] + "\t");
        }
    }
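The loop above starts a new line at every integer cell, so numeric values inside a row also break the line. As an alternative sketch, the raw text can be split into rows at line breaks first and into cells second. This assumes that `getText()` returns one table row per line and that no cell contains whitespace (a multi-word cell would be split into several). `TableTextParser` and `parseRows` are hypothetical names, and the sample text in `main` is made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Selenium or HtmlUnit.
final class TableTextParser {

    // Splits raw table text into rows (one per line) and cells (split on whitespace).
    // Assumption: no cell contains whitespace.
    static List<List<String>> parseRows(String tableText) {
        List<List<String>> rows = new ArrayList<>();
        for (String line : tableText.split("\\R")) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // skip blank lines
            }
            rows.add(Arrays.asList(trimmed.split("\\s+")));
        }
        return rows;
    }

    public static void main(String[] args) {
        // Made-up sample standing in for findElement.getText().
        String sample = "1 Agra 1300 1350\n2 Aligarh 1280 1320\n";
        for (List<String> row : parseRows(sample)) {
            System.out.println(String.join("\t", row));
        }
    }
}
```

If cells can contain spaces, reading the `tr` and `td` elements through `findElements` instead of parsing the text is likely the more robust route.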

How can I do this in Chrome? – user3456343


Change it to 'WebDriver driver = new HtmlUnitDriver(BrowserVersion.CHROME);'. Documentation: http://code.google.com/p/selenium/wiki/HtmlUnitDriver – Nadun


Will I need any extra jar files for Chrome? – user3456343