Can PDFBox duplicate a PDF as compactly as iText?

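I am reading a PDF and writing an output PDF that contains multiple copies of the original PDF. I ran the same test with both PDFBox and iText. If I copy each page individually, iText produces a smaller output. Can PDFBox duplicate a PDF as compactly as iText?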

Question: Is there another way to do this in PDFBox that results in a smaller output PDF file?

For one sample input file, generating two copies with each tool gave these output sizes:

Original PDF: 30K
PDFBox (v1.7.1) output: 84K
iText (v5.3.4) output: 35K

Java code for PDFBox (sorry about the error handling). Note how it reads the input over and over and duplicates it as a whole:

PDFMergerUtility merger = new PDFMergerUtility(); 
PDDocument workplace = null; 
try { 
    for (int cnt = 0; cnt < COPIES; ++cnt) { 
     PDDocument document = null; 
     InputStream stream = null; 
     try { 
      stream = new FileInputStream(new File(sourceFileName)); 
      document = PDDocument.load(stream); 
      if (workplace == null) { 
       workplace = document; 
      } else { 
       merger.appendDocument(workplace, document); 
      } 
     } finally { 
      if (document != null && document != workplace) { 
       document.close(); 
      } 
      if (stream != null) { 
       stream.close(); 
      } 
     } 
    } 

    OutputStream out = null; 
    try { 
     out = new FileOutputStream(new File(destinationFileName)); 
     workplace.save(out); 
    } finally { 
     if (out != null) { 
      out.close(); 
     } 
    } 
} catch (COSVisitorException e1) { 
    e1.printStackTrace(); 
} catch (IOException e) { 
    e.printStackTrace(); 
} finally { 
    if (workplace != null) { 
     try { 
      workplace.close(); 
     } catch (IOException e) { 
      e.printStackTrace(); 
     } 
    } 
}

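Code to do the same with iText. Note how it loads the input file once, walks through it page by page, and adds each page to the output COPIES times: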

Document document = null; 
PdfReader reader = null; 
InputStream inputStream = null; 
FileOutputStream outputStream = null; 
try { 
    inputStream = new FileInputStream(new File(sourceFileName)); 
    outputStream = new FileOutputStream(new File(destinationFileName)); 
    document = new Document(); 
    PdfCopy copy = new PdfSmartCopy(document, outputStream); 
    document.open(); 
    reader = new PdfReader(inputStream); 
    // loop over the pages in that document 
    int pdfPageNo = reader.getNumberOfPages(); 
    for (int page = 0; page < pdfPageNo;) { 
     PdfImportedPage onePage = copy.getImportedPage(reader, ++page); 
     // duplicate each page N times 
     for (int i = 0; i < COPIES; ++i) { 
      copy.addPage(onePage); 
     } 
    } 
    copy.freeReader(reader); 
} catch (DocumentException e) { 
    e.printStackTrace(); 
} catch (IOException e) { 
    e.printStackTrace(); 
} finally { 
    if (reader != null) { 
     reader.close(); 
    } 
    if (document != null) { 
     document.close(); 
    } 
    try { 
     if (inputStream != null) { 
      inputStream.close(); 
     } 
     if (outputStream != null) { 
      outputStream.close(); 
     } 
    } catch (IOException e) { 
     // do nothing 
    } 
}

Both snippets are wrapped in this class:

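import java.io.*;
import org.apache.pdfbox.exceptions.COSVisitorException; // PDFBox 1.x
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.util.PDFMergerUtility;
import com.itextpdf.text.Document; // iText 5.x
import com.itextpdf.text.DocumentException;
import com.itextpdf.text.pdf.PdfCopy;
import com.itextpdf.text.pdf.PdfImportedPage;
import com.itextpdf.text.pdf.PdfReader;
import com.itextpdf.text.pdf.PdfSmartCopy;

public class Duplicate { 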

    /** The original PDF file */ 
    private static final String sourceFileName = "PDF_CI_US2CA.pdf"; 

    /** The resulting PDF file. */ 
    private static final String destinationFileName = "itext_output.pdf"; 
    private static final int COPIES = 2; 

    public static void main(String[] args) { 
      ... 
     } 
}


2013-01-18 Lee Meador

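IMHO, this is more a question about economics than a purely technical one. You have a working solution with iText, but you want to use PdfBox. That choice comes at a price. I assume you prefer PdfBox because of the ASL. However, since nobody pays for PdfBox, you shouldn't expect that library to be faster, richer in features, more complete... In 2009 I changed the iText license from MPL/LGPL to AGPL because I needed to start generating revenue to ensure the further development of the library. Without that revenue, iText would die a slow death. –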

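@BrunoLowagie I understand what you're saying, but since I'm not an expert in either library, I went with the working solution I found. Maybe there is another PDFBox solution that creates smaller PDF files. Maybe not. For my needs, iText may well be the better choice here. I just wanted some help from experts on both tools. Which raises a question, since you are the iText expert: do I have the best iText solution for creating duplicate pages? –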

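PdfSmartCopy keeps hashes of certain objects in memory, for instance streams (in earlier versions of iText) and font dictionaries (in the most recent versions). Whenever an object is reused, we add a reference to the original object instead of duplicating it (duplication is what happens when you use PdfCopy instead of PdfSmartCopy). Acrobat can do even better: it can merge different subsets of the same font into one (larger) subset. We don't support that (yet), because it involves rewriting all the content streams (not trivial, and it requires more CPU/memory). –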
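The hash-based reuse described in this comment can be illustrated with a toy model. The class below is hypothetical and is not iText's implementation: it hashes each object's bytes with SHA-256 and, when the same bytes show up again, returns the object number of the copy already written instead of storing a second one. It assumes equal hashes mean equal content, which is a safe bet for SHA-256 but not a formal guarantee.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Base64;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy model of hash-based object reuse (not iText code).
public class DedupWriter {
    private final Map<String, Integer> objectByHash = new HashMap<>();
    private final List<byte[]> writtenObjects = new ArrayList<>();

    // Returns the object number to reference for this content.
    public int add(byte[] content) {
        String key = sha256(content);
        Integer existing = objectByHash.get(key);
        if (existing != null) {
            return existing; // seen before: reference the existing object, write nothing new
        }
        writtenObjects.add(content.clone());
        int objectNumber = writtenObjects.size(); // 1-based, like PDF object numbers
        objectByHash.put(key, objectNumber);
        return objectNumber;
    }

    public int objectCount() {
        return writtenObjects.size();
    }

    private static String sha256(byte[] data) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            return Base64.getEncoder().encodeToString(md.digest(data));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 not available", e);
        }
    }

    public static void main(String[] args) {
        DedupWriter writer = new DedupWriter();
        byte[] font = "font program bytes".getBytes(StandardCharsets.UTF_8);
        byte[] content = "BT /F1 12 Tf (Hello) Tj ET".getBytes(StandardCharsets.UTF_8);
        // Two copies of the same page: identical font and identical content stream.
        int f1 = writer.add(font);
        int c1 = writer.add(content);
        int f2 = writer.add(font);
        int c2 = writer.add(content);
        System.out.println(f1 == f2 && c1 == c2); // true
        System.out.println(writer.objectCount()); // 2
    }
}
```

Without the lookup (the PdfCopy case in the comment), the second page would add two more objects, for four in total.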

Using the solution below, I was able to create a PDF file with many duplicate pages and minimal impact on file size.

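PDDocument samplePdf = null; 
try { 
    samplePdf = PDDocument.load(PDF_PATH); 
    // getAllPages() is not typed as List<PDPage> in PDFBox 1.x, hence the cast
    PDPage page = (PDPage) samplePdf.getDocumentCatalog().getAllPages().get(0); 

    // each importPage call appends a copy of the page to the end of the document
    for (int i = 0; i < COPIES; i++) { 
        samplePdf.importPage(page); 
    } 

    samplePdf.save(SAVE_PATH); 
} catch (IOException e) { 
    e.printStackTrace(); 
} catch (COSVisitorException e) { 
    e.printStackTrace(); 
} finally { 
    // release the document even if loading or saving failed
    if (samplePdf != null) { 
        try { 
            samplePdf.close(); 
        } catch (IOException e) { 
            e.printStackTrace(); 
        } 
    } 
}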

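In my first attempt I used samplePdf.addPage(page), but it didn't work as expected. So there is clearly a difference between the add and import functions; I'll have to look at the source code or the documentation to see why. Anyway, this should help you design a PDFBox solution for your needs.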
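One plausible reason for the difference, assuming PDFBox 1.x behaves as its sources suggest: addPage appends the very same page object to the page tree, so the tree would list one object twice, while importPage builds a new page dictionary from the existing page's entries (a shallow copy, plus, as far as I can tell, a copy of the content stream) and appends that. A shallow copy stays small because entries such as Resources still point at the fonts and images that already exist. The toy model below uses plain Java maps; the class and method names are hypothetical and are not PDFBox classes.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only (hypothetical): PDF dictionaries modelled as plain Java maps.
public class ShallowPageCopy {

    // Shallow copy: a new page dictionary whose entries still point at the same objects.
    static Map<String, Object> shallowCopy(Map<String, Object> page) {
        return new HashMap<>(page);
    }

    public static void main(String[] args) {
        Map<String, Object> resources = new HashMap<>();
        resources.put("Font", "large embedded font program");

        Map<String, Object> page = new HashMap<>();
        page.put("Resources", resources);
        page.put("Contents", "BT (Hello) Tj ET");

        // Appending the same object again: the page list holds one object twice.
        List<Map<String, Object>> addedTwice = new ArrayList<>();
        addedTwice.add(page);
        addedTwice.add(page);
        System.out.println(addedTwice.get(0) == addedTwice.get(1)); // true

        // Appending a shallow copy: two distinct pages sharing one Resources object.
        List<Map<String, Object>> imported = new ArrayList<>();
        imported.add(page);
        imported.add(shallowCopy(page));
        System.out.println(imported.get(0) == imported.get(1)); // false
        System.out.println(imported.get(0).get("Resources") == imported.get(1).get("Resources")); // true
    }
}
```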


2013-09-27 20:48:00

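Your approach produces much smaller files. For example, with a 67K single-page original, I ran your code with COPIES set to 1 (the original plus one copy) and got a 78K file. With my method, the 2-page file was 133K. –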
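The sizes quoted in this comment translate directly into what each duplicate page adds to the file. The sketch below is just that subtraction (sizes in KB); overheadKb is a hypothetical helper name.

```java
// Extra size added by one duplicate page, using the sizes quoted in the comment (KB).
public class CopyOverhead {
    static int overheadKb(int originalKb, int outputKb) {
        return outputKb - originalKb;
    }

    public static void main(String[] args) {
        int original = 67;
        System.out.println("importPage approach: +" + overheadKb(original, 78) + "K"); // +11K
        System.out.println("merge approach: +" + overheadKb(original, 133) + "K"); // +66K
    }
}
```

So the merge approach pays for almost a full second copy of the 67K original, while importPage adds roughly a sixth of that.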

Note that to duplicate a document with multiple pages, it is easy to change the '.get(0)' into a loop over all pages in that 'List'. –
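A minimal sketch of that loop, assuming the PDFBox 1.x API used in the answer (PDDocument.load(String), getAllPages(), importPage, and save throwing COSVisitorException). The class name and file names are hypothetical. The original pages are first collected into a separate list so the loop cannot pick up the pages it appends.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.pdfbox.exceptions.COSVisitorException;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;

public class DuplicateAllPages {
    static final String PDF_PATH = "input.pdf";   // hypothetical input file
    static final String SAVE_PATH = "output.pdf"; // hypothetical output file
    static final int COPIES = 2;

    public static void main(String[] args) throws IOException, COSVisitorException {
        PDDocument doc = PDDocument.load(PDF_PATH);
        try {
            // Snapshot the original pages; getAllPages() is not typed as List<PDPage> in 1.x.
            List<?> all = doc.getDocumentCatalog().getAllPages();
            List<PDPage> originals = new ArrayList<PDPage>();
            for (Object o : all) {
                originals.add((PDPage) o);
            }
            // Append COPIES full copies of the document, page by page.
            for (int i = 0; i < COPIES; i++) {
                for (PDPage page : originals) {
                    doc.importPage(page);
                }
            }
            doc.save(SAVE_PATH);
        } finally {
            doc.close();
        }
    }
}
```

Because importPage appends at the end of the document, the output is the whole original followed by COPIES full copies, rather than each page repeated consecutively as in the iText version.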

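I should also note that it is safest to clone the page before importing it. I ran into [serious problems, detailed here](https://issues.apache.org/jira/browse/PDFBOX-1586). I took the cloning code from PDFMergerUtility.java: 'PDPage newPage = new PDPage((COSDictionary) cloner.cloneForNewDocument(page.getCOSDictionary()));' This still saves disk space, though it may use more memory, and it avoids the complications described at the link above. –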
