Selenium WebDriver - Help with PDF to text conversion


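The following code downloads the PDF perfectly. Now I want to convert the content of that PDF into a text file. Please help. I tried a lot of code that I found through Google, but none of it worked.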

import org.openqa.selenium.By; 
import org.openqa.selenium.WebDriver; 
import org.openqa.selenium.firefox.FirefoxDriver; 
import org.openqa.selenium.firefox.FirefoxProfile; 
import org.testng.annotations.AfterTest; 
import org.testng.annotations.BeforeTest; 
import org.testng.annotations.Test; 

@Test
public class PDF_Download_without_popup {
    WebDriver driver;

    @BeforeTest
    public void StartBrowser() {
        // Create a FirefoxProfile so that download preferences can be set.
        FirefoxProfile fprofile = new FirefoxProfile();

        // Set the folder where downloaded files are stored
        // (folderList = 2 tells Firefox to use browser.download.dir).
        fprofile.setPreference("browser.download.dir", "c:\\WebDriverdownloads");
        fprofile.setPreference("browser.download.folderList", 2);

        // MIME types that are saved to disk without a download confirmation dialog.
        fprofile.setPreference("browser.helperApps.neverAsk.saveToDisk",
            "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet;" // MS Excel file
            + "application/pdf;" // PDF file
            + "application/vnd.openxmlformats-officedocument.wordprocessingml.document;" // MS Word file
            + "text/plain;" // text file
            + "text/csv"); // CSV file
        fprofile.setPreference("browser.download.manager.showWhenStarting", false);

        // Disable the built-in PDF viewer so that PDFs are downloaded instead of displayed.
        fprofile.setPreference("pdfjs.disabled", true);

        // Start Firefox with the profile so that the download preferences take effect.
        driver = new FirefoxDriver(fprofile);
    }

    public void OpenURL() throws InterruptedException {
        driver.get("http://www.bell.ca/");
        driver.manage().window().maximize();
        Thread.sleep(30000);
        driver.findElement(By.xpath(".//*[@id='demoLoginLinkJs']/span[1]")).click();
        driver.findElement(By.xpath(".//*[@id='USER']")).sendKeys("bell_56789");
        driver.findElement(By.xpath(".//*[@id='PASSWORD']")).sendKeys("sunday21");
        driver.findElement(By.xpath(".//*[@id='demoLoginJs']")).click();
        driver.findElement(By.xpath("//span[contains(text(),'View current bill')]")).click();

        Thread.sleep(5000);

        // Click the download button, then wait for the download to finish.
        driver.findElement(By.xpath(".//*[@id='btnDownloadBill']")).click();
        String tmp = driver.getCurrentUrl();
        System.out.println(tmp);
        Thread.sleep(50000);
    }

    @AfterTest
    public void CloseBrowser() {
        driver.quit();
    }
}


2015-05-28 Geetanjali C

What error are you getting?

Please share code, or a link, for converting a PDF downloaded by Selenium WebDriver to text. I do not need code that turns a PDF into another PDF. Please help.

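Try the Apache PDFBox API.

Then add the library to your project.

In your case you are downloading the PDF. Instead of downloading it, open the PDF in the browser by passing its URL to navigate().to(), for example: http://www.bell.ca/xyz.pdf. Your code would then look like this: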

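// Written against PDFBox 1.8.x. In PDFBox 2.x use PDDocument.load(...) and
// org.apache.pdfbox.text.PDFTextStripper instead.
import java.io.BufferedInputStream;
import java.net.URL;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.util.PDFTextStripper;

URL xyzUrl = new URL("http://www.bell.ca/xyz.pdf");
BufferedInputStream testFile = new BufferedInputStream(xyzUrl.openStream());
PDDocument xyzPDF = PDDocument.loadNonSeq(testFile, null); // null = no scratch file
String testText = new PDFTextStripper().getText(xyzPDF);   // text of all pages
xyzPDF.close();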

You now have all the text from the PDF file. You can write it to an XLS file with a third-party API such as Apache POI, or to any other file type with whatever API suits it.
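Since the question asks for a plain text file rather than a spreadsheet, the extracted string can also be written out with the Java standard library alone. This is a minimal sketch, assuming the text has already been extracted into a string; the class name `PdfTextWriter`, the method `saveText`, the sample string, and the output name `bill.txt` are made up for illustration and are not part of the original answer.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class PdfTextWriter {

    // Hypothetical helper: writes the extracted text to the target file as UTF-8,
    // creating missing parent directories and overwriting an existing file.
    static Path saveText(String text, Path target) throws IOException {
        Path parent = target.toAbsolutePath().getParent();
        if (parent != null) {
            Files.createDirectories(parent);
        }
        return Files.write(target, text.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the string returned by PDFTextStripper.getText(...).
        String testText = "Sample bill text";
        Path out = saveText(testText, Paths.get("bill.txt"));
        System.out.println("Wrote " + out.toAbsolutePath());
    }
}
```

In the real test, the string passed to `saveText` would be the result of the PDFBox call rather than the placeholder used in `main`.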


2015-05-28 12:09:47 Raavan

Hi Pritam, I tried the code you shared. The problem is that in my application the URL is that of the page the PDF is downloaded from: https://mybell.bell.ca/Mobility/Billing/CurrentMobilityBill?AcctNo=506540566207DA7D21817764E2831250CF419BC141B457EAE761A07E862087A1B6D26EF365972B6F

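The PDF opens as a native Windows application or through a browser plugin, and Selenium cannot handle either. If that is the case, the PDF is no longer under the browser's control once it has been downloaded, so Selenium cannot automate it :( This may help: http://stackoverflow.com/questions/6668141/interacting-with-a-pdf-popup-in-selenium – Raavan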

I edited your answer (I hope that is OK) so that it uses the current PDFBox API.
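If the PDF ends up on disk, as in the question's setup where Firefox saves to c:\WebDriverdownloads, one option is to wait for the file to appear and then read it from there instead of through the browser. The following is a rough sketch that polls the folder rather than sleeping for a fixed 50 seconds. It assumes Firefox keeps a temporary `*.part` file next to the target while the download is still running; the class name `DownloadWaiter` and its methods are hypothetical.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

public class DownloadWaiter {

    // Polls dir until it contains a *.pdf file and no *.part file (assumed to mark
    // an unfinished Firefox download). Returns the PDF path, or null on timeout.
    static Path waitForPdf(Path dir, long timeoutMillis)
            throws IOException, InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (true) {
            Path pdf = firstMatch(dir, "*.pdf");
            if (pdf != null && firstMatch(dir, "*.part") == null) {
                return pdf;
            }
            if (System.currentTimeMillis() >= deadline) {
                return null;
            }
            Thread.sleep(200); // short pause between polls
        }
    }

    // Returns the first entry in dir matching the glob, or null if there is none
    // or the directory does not exist yet.
    static Path firstMatch(Path dir, String glob) throws IOException {
        if (!Files.isDirectory(dir)) {
            return null;
        }
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, glob)) {
            for (Path entry : stream) {
                return entry;
            }
        }
        return null;
    }
}
```

A call such as `DownloadWaiter.waitForPdf(Paths.get("c:\\WebDriverdownloads"), 60000)` could then replace the final `Thread.sleep(50000)` in the question, and the returned path could be opened with PDFBox. If the folder already holds older PDFs, the sketch would return one of those, so the folder should be emptied before each run.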

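@Geetanjali, I can suggest another way. Several websites offer PDF-to-text conversion. You only need to upload your file and click "Convert", and the PDF is converted to text.

So my point is that you can automate this as well, every time you download a PDF. After downloading the PDF, open one of these sites. Use a third-party tool such as the AutoIT API (added to your build path) to upload your file. After the conversion you can download the text file.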


2015-05-28 13:00:31 Raavan
