Extracting text from a PDF file

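I am trying to extract the text between "[" and "]" in a PDF file, but I cannot do so because the file appears to be encrypted. I get some symbols that are not in a readable format.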

package pdfexc;

import java.io.IOException;
import com.itextpdf.text.pdf.PdfReader;
import com.itextpdf.text.pdf.parser.PdfTextExtractor;

public class ITextReadDemo {

     public static void main(String[] args) {
      try {
       // Opening the file is where the exception below is thrown
       PdfReader reader = new PdfReader("D:\\temp\\1.pdf");
       System.out.println("This PDF has "+reader.getNumberOfPages()+" pages.");
       // Extract the text of page 2 only
       String page = PdfTextExtractor.getTextFromPage(reader, 2);
       System.out.println("Page Content:\n\n"+page+"\n\n");
       System.out.println("Is this document tampered : "+reader.isTampered());
       System.out.println("Is this document encrypted : "+reader.isEncrypted());
       reader.close();

      } catch (IOException e) {
       e.printStackTrace();
      }
     }
}

But I get this exception:

Exception in thread "main" java.lang.NoClassDefFoundError: org/bouncycastle/asn1/ASN1OctetString 
    at com.itextpdf.text.pdf.PdfEncryption.<init>(PdfEncryption.java:147) 
    at com.itextpdf.text.pdf.PdfReader.readDecryptedDocObj(PdfReader.java:775) 
    at com.itextpdf.text.pdf.PdfReader.readDocObj(PdfReader.java:1152) 
    at com.itextpdf.text.pdf.PdfReader.readPdf(PdfReader.java:512) 
    at com.itextpdf.text.pdf.PdfReader.<init>(PdfReader.java:172) 
    at com.itextpdf.text.pdf.PdfReader.<init>(PdfReader.java:161) 
    at pdfexc.ITextReadDemo.main(ITextReadDemo.java:19) 
Caused by: java.lang.ClassNotFoundException: org.bouncycastle.asn1.ASN1OctetString 
    at java.net.URLClassLoader.findClass(Unknown Source) 
    at java.lang.ClassLoader.loadClass(Unknown Source) 
    at sun.misc.Launcher$AppClassLoader.loadClass(Unknown Source) 
    at java.lang.ClassLoader.loadClass(Unknown Source) 
    ... 7 more

I also tried the following approach. It does read the content of the PDF file, but when I print it, it is not in a readable format.

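void readfile() throws IOException {
     Path path = Paths.get("D:\\temp\\1.pdf");
     // Scanner(Path) can throw IOException; try-with-resources closes it afterwards
     try (Scanner scanner = new Scanner(path)) {
      while(scanner.hasNextLine()){
       String line = scanner.nextLine();
       System.out.println(line);
      }
     }
}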

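All I need is the content of the PDF file (not a text file) in a readable format, so that I can extract the text between [ and ] using a regular expression. Please help me if you know a solution.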
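
For reference, the regular-expression step mentioned above could look roughly like this once the page text is available as a plain String. This is only a minimal sketch: the class name `BracketExtractor`, the method `extract`, and the sample string are hypothetical and are not part of iText.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class for illustration; not part of iText.
class BracketExtractor {

    // Matches "[", captures any run of non-bracket characters, then "]".
    private static final Pattern BRACKETED = Pattern.compile("\\[([^\\[\\]]*)\\]");

    // Returns the text found between each "[" and "]" pair, in order of appearance.
    static List<String> extract(String text) {
        List<String> result = new ArrayList<>();
        Matcher m = BRACKETED.matcher(text);
        while (m.find()) {
            result.add(m.group(1));
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up sample text standing in for the extracted page content.
        String page = "Invoice [INV-001] issued to [ACME Corp]";
        System.out.println(extract(page));
    }
}
```

Excluding brackets inside the capture group keeps each match limited to the innermost pair, so two bracketed items on one line are returned separately rather than as one long match.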

2016-02-23 Raj

Have you checked the read/write permissions of the file you are trying to read?

The cause of your problem is already described by the exception:

Exception in thread "main" java.lang.NoClassDefFoundError: org/bouncycastle/asn1/ASN1OctetString

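iText uses the BouncyCastle library for security-related tasks such as encryption and signing, and you do not seem to have that library on the classpath, or at least not in the version iText requires.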

Unfortunately, you do not say which iText version you use, so I cannot tell which BouncyCastle version is the required one.
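
One way to confirm this diagnosis is to check whether the class named in the exception can be loaded at all. The following is a minimal sketch, not an iText or BouncyCastle facility: the class `ClasspathCheck` and the method `isOnClasspath` are hypothetical names, and only the standard `Class.forName` call is used.

```java
// Hypothetical diagnostic class for illustration; not part of iText or BouncyCastle.
class ClasspathCheck {

    // Returns true if the named class can be located by this class's class loader.
    static boolean isOnClasspath(String className) {
        try {
            // initialize=false: only locate the class, do not run its static initializers.
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Class name taken from the stack trace in the question.
        String needed = "org.bouncycastle.asn1.ASN1OctetString";
        System.out.println(needed + " available: " + isOnClasspath(needed));
    }
}
```

If this reports that the class is not available when run with the same classpath as the failing program, the BouncyCastle jar is missing at runtime. If it is available and the error persists, a version mismatch between iText and BouncyCastle is the more likely explanation.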

2016-02-23 06:05:27 mkl

Thanks for your suggestion... I will try it. – Raj
