Rendering part of a PDF's content as an image


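Is there any tool that renders a PDF document to an image with only part of its content? For example, only the text but no images or vector graphics, or only the images and vector graphics but no text.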

2014-10-06 Yu Liang

Does it have to be Ghostscript, or are you also prepared to do a bit of Java programming? – mkl 2014-10-06 07:02:19

Any suggestions are welcome. – 2014-10-06 07:08:46

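The Apache Java library PDFBox contains code for rendering PDF pages (it has been much improved in the current 2.0.0 development snapshots compared with the current 1.8.x releases). That code essentially goes through the 'PageDrawer' class. You can fairly easily adapt that class to draw only the things you choose. – mkl 2014-10-06 07:25:57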

The "classic" way to do this is to preprocess the PDF file so that only the desired elements remain, and then rasterize what is left.

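For example, I have implemented PDF-to-iPad workflows in which callas pdfToolbox (note: I am affiliated with this company) was used to split a PDF file into a "text only" file and an "everything but text" file. The "everything but text" file was then rasterized, and the two files were recombined.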

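So whatever tool you want to use, I would look at how that tool can preprocess the file to remove the unwanted elements, or how it can split out the ones you want. Then use the tool's normal rasterization function.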
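
To make the preprocessing idea more concrete, here is a rough sketch of what "keep only the desired elements" can look like at the level of a page content stream. It is not how pdfToolbox or any other product works internally; it only assumes the standard PDF operators (Tj, TJ, ' and " show text; S, f, B and friends paint paths; Do paints an XObject; sh paints a shading). The names tokenize, filter_stream and SAMPLE are made up for this illustration. It also assumes the stream is already decompressed and contains no inline images (BI ... EI) or comments. Reading the stream out of the PDF, writing it back and rasterizing the result are left to whatever PDF library you use.

```python
import re

# Standard PDF content stream operators, grouped by what they paint.
TEXT_SHOW_OPS = {"Tj", "TJ", "'", '"'}
PATH_PAINT_OPS = {"S", "s", "f", "F", "f*", "B", "B*", "b", "b*"}
OTHER_PAINT_OPS = {"Do", "sh"}  # XObjects (images, forms) and shadings

OPERATOR = re.compile(r"[A-Za-z'\"][A-Za-z0-9*]*")


def tokenize(stream):
    """Yield tokens; literal strings (...) and hex strings <...> stay whole."""
    i, n = 0, len(stream)
    while i < n:
        c = stream[i]
        if c.isspace():
            i += 1
            continue
        start = i
        if c == "(":
            depth = 0
            while i < n:
                if stream[i] == "\\":
                    i += 1  # skip the escaped character
                elif stream[i] == "(":
                    depth += 1
                elif stream[i] == ")":
                    depth -= 1
                    if depth == 0:
                        i += 1
                        break
                i += 1
        elif stream.startswith(("<<", ">>"), i):
            i += 2
        elif c == "<":
            end = stream.find(">", i)
            i = n if end < 0 else end + 1
        elif c in "[]":
            i += 1
        else:
            i += 1
            while i < n and not stream[i].isspace() and stream[i] not in "()<>[]/":
                i += 1
        yield stream[start:i]


def filter_stream(stream, keep):
    """Return a copy of the stream that paints only 'text' or only 'graphics'."""
    out, operands = [], []
    for tok in tokenize(stream):
        if tok in ("true", "false", "null") or not OPERATOR.fullmatch(tok):
            operands.append(tok)  # operand of the next operator
            continue
        if keep == "text" and tok in PATH_PAINT_OPS:
            out.append("n")  # end the path without painting it
        elif keep == "text" and tok in OTHER_PAINT_OPS:
            pass  # drop the operator together with its operands
        elif keep == "graphics" and tok in TEXT_SHOW_OPS:
            pass  # drop the text-showing operator, keep text state
        else:
            out.append(" ".join(operands + [tok]))
        operands = []
    return "\n".join(out)


# Made-up sample content stream: a filled rectangle, an image, two text runs.
SAMPLE = r"""q
1 0 0 1 50 700 cm
0.5 g
10 10 100 50 re f
/Im1 Do
BT /F1 12 Tf 0 0 Td (Hello \(PDF\)) Tj [(A) -120 (B)] TJ ET
Q"""

print(filter_stream(SAMPLE, "text"))
print("----")
print(filter_stream(SAMPLE, "graphics"))
```

Graphics state operators (q, Q, cm, colors, fonts) are kept in both variants so that whatever remains is still positioned and colored as before. Known gaps: a form XObject painted with Do may itself contain text, and text used as a clipping path is not handled, so treat this as a starting point only.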


2014-10-06 07:23:52

With Debenu Quick PDF Library you can do this in two ways:

1. PDF2Image with just text, no images

DPL.LoadFromFile("my_file.pdf", ""); 
int image_count = DPL.FindImages(); // number of embedded images 
for (int i = 1; i <= image_count; i++) // the image list is indexed from 1 
{ 
    DPL.ClearImage(DPL.GetImageID(i)); // look up the image ID, then blank out that image 
} 
DPL.RenderPageToFile(72, 1, 0, "just_text.bmp"); // render page 1 at 72 DPI to an image file, now without the images

Here is the list of functions: http://www.debenu.com/docs/pdf_library_reference/ImageHandling.php

2. PDF2Image with just text, no images (extract the text and redraw it)

DPL.LoadFromFile("my_file.pdf", ""); 
string text_csv = DPL.GetPageText(3); // returns a CSV string with the coordinates of the text 

//create new blank file 
//XPos is the horizontal position of the text - get it from the CSV string 
//YPos is the vertical position of the text - get it from the CSV string 
//your_text is the text to draw - get it from the CSV string 
DPL.DrawText(XPos, YPos, your_text); 
DPL.RenderPageToFile(72, 1, 0, "just_text.bmp"); // render page 1 at 72 DPI to an image file, without the images
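
The "get it from the CSV string" steps above are left open, so here is a small sketch of that parsing step, written in Python only to keep it short. The column layout used here (font, size, x, y, text) and the sample rows are invented for the illustration; check the library reference for the columns that GetPageText(3) actually returns and pass the matching indexes. parse_text_rows and SAMPLE_CSV are hypothetical names.

```python
import csv
import io


def parse_text_rows(csv_text, x_col, y_col, text_col):
    """Return (x, y, text) tuples from CSV text; column indexes are supplied by the caller."""
    rows = []
    for record in csv.reader(io.StringIO(csv_text)):
        if not record:
            continue  # skip blank lines
        rows.append((float(record[x_col]), float(record[y_col]), record[text_col]))
    return rows


# Invented sample rows with an assumed layout: font, size, x, y, text.
SAMPLE_CSV = (
    '"Helvetica",12,72.0,700.5,"Hello, world"\n'
    '"Helvetica",12,72.0,686.0,"Second line"\n'
)

placements = parse_text_rows(SAMPLE_CSV, x_col=2, y_col=3, text_col=4)
for x, y, text in placements:
    # Each tuple supplies the XPos, YPos and your_text values for one DrawText call.
    print(x, y, repr(text))
```

Using a real CSV parser rather than splitting on commas matters here, because the text itself may contain commas inside quoted fields.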


2014-10-27 15:40:23 zacharpali
