2013-09-27

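Error when trying to run a PDFBox program

I am trying to run the PDFBox example from this page: http://www.printmyfolders.com/Home/PDFBox-Tutorial to extract text from a PDF file. When I run it, I get this error: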

org.apache.pdfbox.exceptions.WrappedIOException 
    at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:245) 
    at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1192) 
    at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1159) 
    at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1130) 
    at GetPos.main(GetPos.java:14) 
Caused by: java.lang.ArrayIndexOutOfBoundsException 
    at java.lang.System.arraycopy(libgcj.so.10) 
    at java.io.ByteArrayOutputStream.write(libgcj.so.10) 
    at org.apache.pdfbox.filter.FlateFilter.decompress(FlateFilter.java:172) 
    at org.apache.pdfbox.filter.FlateFilter.decode(FlateFilter.java:98) 
    at org.apache.pdfbox.cos.COSStream.doDecode(COSStream.java:295) 
    at org.apache.pdfbox.cos.COSStream.doDecode(COSStream.java:237) 
    at org.apache.pdfbox.cos.COSStream.getUnfilteredStream(COSStream.java:172) 
    at org.apache.pdfbox.pdfparser.PDFXrefStreamParser.<init>(PDFXrefStreamParser.java:61) 
    at org.apache.pdfbox.pdfparser.PDFParser.parseXrefStream(PDFParser.java:848) 
    at org.apache.pdfbox.pdfparser.PDFParser.parseObject(PDFParser.java:576) 
    at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:188) 
    ...4 more 

What does this mean? The first example, which creates a blank PDF, works fine.

Answers

I used the examples from the tutorial in question to generate a PDF containing text and then read that text back:

package com.mycompany.mavenproject; 

import java.io.File; 
import junit.framework.Test; 
import junit.framework.TestCase; 
import junit.framework.TestSuite; 
import org.apache.pdfbox.pdmodel.PDDocument; 
import org.apache.pdfbox.pdmodel.PDPage; 
import org.apache.pdfbox.pdmodel.edit.PDPageContentStream; 
import org.apache.pdfbox.pdmodel.font.PDFont; 
import org.apache.pdfbox.pdmodel.font.PDType1Font; 
import org.apache.pdfbox.util.PDFTextStripper; 

/** 
* Unit test for simple App. 
*/ 
public class AppTest 
    extends TestCase { 

public static Test suite() { 
    return new TestSuite(AppTest.class); 
} 

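public void test() throws Exception { 
    final String fileName = "PDFWithText.pdf"; 
    // Write a one-page PDF, then load it back and extract its text. 
    writeDocument(fileName); 
    PDDocument pd = PDDocument.load(new File(fileName)); 
    try { 
        PDFTextStripper stripper = new PDFTextStripper(); 
        String text = stripper.getText(pd); 
        assertEquals("Hello from www.printmyfolders.com", text.trim()); 
    } finally { 
        pd.close(); // release the loaded document even if the assertion fails 
    } 
} 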

private void writeDocument(String fileName) throws Exception { 
    PDDocument doc = new PDDocument(); 
    PDPage page = new PDPage(); 

    doc.addPage(page); 
    PDFont font = PDType1Font.HELVETICA_BOLD; 

    PDPageContentStream content = new PDPageContentStream(doc, page); 
    content.beginText(); 
    content.setFont(font, 12); 
    content.moveTextPositionByAmount(100, 700); 
    content.drawString("Hello from www.printmyfolders.com"); 

    content.endText(); 
    content.close(); 
    doc.save(fileName); 
    doc.close(); 
} 
} 

This runs without an exception. Since your exception bubbles up from the load method, make sure your PDF is not malformed.
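One rough way to act on that advice is to check the file's outer markers before handing it to PDFBox. The sketch below uses only the JDK and assumes that a well-formed PDF has a `%PDF-` header near the start and an `%%EOF` marker near the end. The class name `PdfSanityCheck`, the method `looksLikePdf`, and the 1024-byte window are illustrative choices, not part of PDFBox or the tutorial.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;

public class PdfSanityCheck {

    /** Reads {@code length} bytes starting at {@code offset} and decodes them as ISO-8859-1. */
    private static String readChunk(RandomAccessFile file, long offset, int length) throws IOException {
        byte[] buffer = new byte[length];
        file.seek(offset);
        file.readFully(buffer);
        return new String(buffer, StandardCharsets.ISO_8859_1);
    }

    /**
     * Returns true if the file has a "%PDF-" header within its first bytes
     * and an "%%EOF" marker within its last bytes (window size is an assumption).
     */
    public static boolean looksLikePdf(String path) throws IOException {
        try (RandomAccessFile file = new RandomAccessFile(path, "r")) {
            long size = file.length();
            int window = (int) Math.min(size, 1024);
            String head = readChunk(file, 0, window);
            String tail = readChunk(file, size - window, window);
            return head.contains("%PDF-") && tail.contains("%%EOF");
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.out.println("usage: java PdfSanityCheck <file.pdf>");
            return;
        }
        System.out.println(args[0] + " looks like a PDF: " + looksLikePdf(args[0]));
    }
}
```

A file that fails this check is probably truncated or not a PDF at all. A file that passes can still be damaged internally, for example in a compressed stream such as the one `FlateFilter` fails on in the trace above, so treat this as a first filter only.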

Sorry, but it doesn't work. I'm not a Java developer, so maybe I'm missing something? Could you give me the complete code of your *.java file? – Footniko

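Hmm.. I ran it as a unit test in an empty Maven module (NetBeans). The only code missing was the class definition and constructor. I have edited the original post to include the complete .java file. – Origineil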

Use a temp directory:

parser.setTempDirectory(new File(directoryPath)); 

For example:

File in = new File("somefile.pdf"); 
InputStream fin = new FileInputStream(in); 
PDFParser parser = new PDFParser(fin); 
parser.setTempDirectory(new File(tempDirectoryPath)); 
parser.parse(); 
PDDocument document = parser.getPDDocument();
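The snippet above leaves `tempDirectoryPath` undefined. One possible way to obtain a writable directory for it, sketched with the JDK only (Java 7 or later), is shown below. The class name `TempDirExample`, the method `createScratchDirectory`, and the `pdfbox-scratch` prefix are illustrative assumptions, not PDFBox API.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;

public class TempDirExample {

    /** Creates a fresh scratch directory under the system temp location. */
    public static File createScratchDirectory() throws IOException {
        File dir = Files.createTempDirectory("pdfbox-scratch").toFile();
        // Best-effort cleanup: the JVM only deletes the directory on exit if it is empty.
        dir.deleteOnExit();
        return dir;
    }

    public static void main(String[] args) throws IOException {
        File scratch = createScratchDirectory();
        System.out.println("Scratch directory: " + scratch.getAbsolutePath());
        // Hypothetical hand-off to the parser from the answer:
        // parser.setTempDirectory(scratch);
    }
}
```

The returned `File` can be passed straight to `parser.setTempDirectory(...)` in place of `new File(tempDirectoryPath)`.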