How to edit a PDF file, replacing its data?

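I want to rotate the pages of a PDF file and then replace the old pages with the rotated ones in that same PDF file. How can I edit a PDF file in place, replacing its data?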

I wrote the following code:

#!/usr/bin/python 

import os 
from pyPdf import PdfFileReader, PdfFileWriter 

my_path = "/home/USER/Desktop/files/" 

input_file_name = os.path.join(my_path, "myfile.pdf") 
input_file = PdfFileReader(file(input_file_name, "rb")) 
input_file.decrypt("MyPassword") 
output_PDF = PdfFileWriter() 

for num_page in range(0, input_file.getNumPages()): 
    page = input_file.getPage(num_page) 
    page.rotateClockwise(270) 
    output_PDF.addPage(page) 

#Trying to replace old data with new data in the original file, not 
#create a new file and add the new data! 
output_file_name = os.path.join(my_path, "myfile.pdf") 
output_file = file(output_file_name, "wb") 
output_PDF.write(output_file) 
output_file.close()

The code above gives me an error! I even tried using:

input_file = PdfFileReader(file(input_file_name, "r+b"))

but that didn't work either...

Changing the line:

output_file_name = os.path.join(my_path, "myfile.pdf")

to:

output_file_name = os.path.join(my_path, "myfile2.pdf")

fixes everything, but it's not what I want...

Any help?

Error message:

Traceback (most recent call last):
  File "12-5.py", line 22, in <module>
    output_PDF.write(output_file)
  File "/usr/lib/pymodules/python2.7/pyPdf/pdf.py", line 264, in write
    self._sweepIndirectReferences(externalReferenceMap, self._root)
  File "/usr/lib/pymodules/python2.7/pyPdf/pdf.py", line 339, in _sweepIndirectReferences
    self._sweepIndirectReferences(externMap, realdata)
  File "/usr/lib/pymodules/python2.7/pyPdf/pdf.py", line 315, in _sweepIndirectReferences
    value = self._sweepIndirectReferences(externMap, value)
  File "/usr/lib/pymodules/python2.7/pyPdf/pdf.py", line 339, in _sweepIndirectReferences
    self._sweepIndirectReferences(externMap, realdata)
  File "/usr/lib/pymodules/python2.7/pyPdf/pdf.py", line 315, in _sweepIndirectReferences
    value = self._sweepIndirectReferences(externMap, value)
  File "/usr/lib/pymodules/python2.7/pyPdf/pdf.py", line 324, in _sweepIndirectReferences
    value = self._sweepIndirectReferences(externMap, data[i])
  File "/usr/lib/pymodules/python2.7/pyPdf/pdf.py", line 339, in _sweepIndirectReferences
    self._sweepIndirectReferences(externMap, realdata)
  File "/usr/lib/pymodules/python2.7/pyPdf/pdf.py", line 315, in _sweepIndirectReferences
    value = self._sweepIndirectReferences(externMap, value)
  File "/usr/lib/pymodules/python2.7/pyPdf/pdf.py", line 324, in _sweepIndirectReferences
    value = self._sweepIndirectReferences(externMap, data[i])
  File "/usr/lib/pymodules/python2.7/pyPdf/pdf.py", line 345, in _sweepIndirectReferences
    newobj = data.pdf.getObject(data)
  File "/usr/lib/pymodules/python2.7/pyPdf/pdf.py", line 649, in getObject
    retval = readObject(self.stream, self)
  File "/usr/lib/pymodules/python2.7/pyPdf/generic.py", line 67, in readObject
    return DictionaryObject.readFromStream(stream, pdf)
  File "/usr/lib/pymodules/python2.7/pyPdf/generic.py", line 564, in readFromStream
    raise utils.PdfReadError, "Unable to find 'endstream' marker after stream."
pyPdf.utils.PdfReadError: Unable to find 'endstream' marker after stream.

2015-02-23 midkin

What do you mean by "it didn't work" and "gives an error"? – 2015-02-23 17:33:30

Edited in the error message! – midkin 2015-02-23 17:36:43

The problem, I suspect, is that PyPDF is reading from the file while it is being written.
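To illustrate the suspected mechanism, here is a minimal Python 3 sketch using only the standard library; the function name and file contents are made up for the demo. The traceback shows pyPdf still reading objects from the input stream inside write() (getObject calls readObject(self.stream, self)), so the input is read lazily, while file(output_file_name, "wb") has already truncated that same file to zero bytes. The sketch reproduces that ordering with plain files:

```python
import os
import tempfile

def demo_truncate_under_reader():
    # Create a small stand-in "PDF" file (placeholder bytes, not a real PDF).
    fd, path = tempfile.mkstemp(suffix=".pdf")
    os.close(fd)
    with open(path, "wb") as f:
        f.write(b"%PDF-1.4 stream ... endstream")

    # Unbuffered, so every read() really goes to the file (keeps the demo deterministic).
    reader = open(path, "rb", buffering=0)
    header = reader.read(8)      # the reader has only consumed the header so far

    writer = open(path, "wb")    # opening with "wb" truncates the same file to 0 bytes
    rest = reader.read()         # later lazy reads find nothing left to read

    writer.close()
    reader.close()
    os.remove(path)
    return header, rest
```

On Linux, demo_truncate_under_reader() returns (b"%PDF-1.4", b""): whatever was read before the truncation survives, everything after it is gone.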

As you've already noticed, the right fix is to write to a separate file and then replace the original with the new one. Something like this:

output_file_name = os.path.join(my_path, "myfile-temporary.pdf") 
output_file = file(output_file_name, "wb") 
output_PDF.write(output_file) 
output_file.close() 
os.rename(output_file_name, input_file_name)
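Why does writing to a separate file and then renaming avoid the problem? On Linux and other POSIX systems, renaming a new file over the old name does not touch the data that an already-open handle refers to, so the reader can keep reading the original while the new file takes its name. A minimal Python 3 sketch with placeholder contents (os.replace is used instead of os.rename because it also overwrites an existing target on Windows; on POSIX both behave the same here, and the open-handle behavior shown is POSIX-specific):

```python
import os
import tempfile

def demo_rename_keeps_old_data():
    d = tempfile.mkdtemp()
    path = os.path.join(d, "myfile.pdf")
    tmp = os.path.join(d, "myfile-temporary.pdf")
    with open(path, "wb") as f:
        f.write(b"original")

    # The reader stays open, like the file object handed to PdfFileReader.
    reader = open(path, "rb", buffering=0)

    # New content goes to a separate file, so the original is never truncated.
    with open(tmp, "wb") as f:
        f.write(b"rotated")

    # Swap the new file in under the old name (POSIX: the reader keeps the old data).
    os.replace(tmp, path)
    seen_by_reader = reader.read()
    reader.close()

    with open(path, "rb") as f:
        now_on_disk = f.read()
    os.remove(path)
    os.rmdir(d)
    return seen_by_reader, now_on_disk
```

On Linux this returns (b"original", b"rotated"): the open reader still sees the old bytes, while the path now holds the new ones.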

I've written some code that makes this simpler: https://github.com/shazow/unstdlib.py/blob/master/unstdlib/standard/contextlib_.py#L14

from unstdlib.standard.contextlib_ import open_atomic 

with open_atomic(input_file_name, "wb") as output_file: 
    output_PDF.write(output_file)

This automatically creates a temporary file, writes to it, and then replaces the original file.
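If you'd rather not add a dependency, here is a minimal sketch of the same idea using only contextlib, tempfile and os. It is not the unstdlib implementation; the name open_atomic_sketch is hypothetical. The temporary file is created in the target's own directory so the final os.replace stays on one filesystem:

```python
import contextlib
import os
import tempfile

@contextlib.contextmanager
def open_atomic_sketch(path, mode="wb"):
    """Hypothetical stand-in for unstdlib's open_atomic (not its actual code)."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, mode) as tmp:
            yield tmp
        # Only reached if the with-body finished without raising.
        os.replace(tmp_path, path)
    except BaseException:
        # Leave the original untouched and clean up the partial temp file.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

It is used like the snippet above: with open_atomic_sketch(input_file_name) as output_file: output_PDF.write(output_file). If the body raises, the original file is left as it was and the temporary file is deleted. One caveat: tempfile.mkstemp creates the file readable and writable only by its owner, so the replaced file may end up with different permissions than the original.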

Edit: I originally misread the question. What follows is my incorrect answer, but it may still help others.

Your code is fine and should work without problems on "most" PDFs.

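The problem you're seeing is an incompatibility between PyPDF and the specific PDF you're trying to use. It could be a bug in PyPDF, or the PDF may not be entirely valid.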

There are two things you can try:

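1. See whether PyPDF2 can read the file. Install PyPDF2 with pip install PyPDF2, replace import pyPdf … with import PyPDF2 …, and rerun the script.
2. Re-encode your PDF with another program and see whether that helps. For example, something like convert bad.pdf bad.ps; convert bad.ps maybe-good.pdf might fix things.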
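Since the error is "Unable to find 'endstream' marker after stream", a rough byte-level look at the file can help tell a damaged or truncated PDF from a library bug before trying either option. This is a crude heuristic sketch, not a PDF parser: the function name and the three checks are assumptions, not part of pyPdf or PyPDF2, and compressed stream data can contain these byte sequences by chance.

```python
import re

def rough_pdf_check(data):
    """Crude heuristic check of raw PDF bytes; NOT a real PDF parser."""
    # Count 'stream' keywords that are not the tail of 'endstream'.
    stream_starts = len(re.findall(rb"(?<!end)stream\b", data))
    stream_ends = data.count(b"endstream")
    return {
        "has_header": data.startswith(b"%PDF-"),
        # A complete file normally ends with %%EOF near the very end.
        "has_eof_marker": b"%%EOF" in data[-1024:],
        "streams_balanced": stream_starts == stream_ends,
    }
```

For example, with open(input_file_name, "rb") as f: print(rough_pdf_check(f.read())). A missing %%EOF or unbalanced streams suggests the file itself is incomplete rather than PyPDF misreading it.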

2015-02-23 18:26:16

1. Tried it! Lots of error lines, starting with: Traceback (most recent call last): File "12-5.py", line 22, in <module> output_PDF.write(output_file) 2. I don't know how to do that! – midkin 2015-02-23 18:32:48

My apologies, I misunderstood the question. See my updated answer. – 2015-02-23 18:33:26

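OK, os.rename works! But I think the correct answer is that what I wanted can't be done this way, because PyPDF is reading the file while it is being written! :) Still, if someone needs to do this instead of creating and keeping a new PDF file on their hard drive, os.rename is the way to do it! Since this gets me what I need, even if not the way I had in mind, I'll accept this as the correct answer! :) – midkin 2015-02-23 18:43:51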
