2015-02-23 18 views

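I want to rotate the pages in a PDF file and then replace the old pages with the rotated ones in the same PDF file. How can I edit a PDF file in place, replacing its data?

I wrote the following code: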

#!/usr/bin/python 

import os 
from pyPdf import PdfFileReader, PdfFileWriter 

my_path = "/home/USER/Desktop/files/" 

input_file_name = os.path.join(my_path, "myfile.pdf") 
input_file = PdfFileReader(file(input_file_name, "rb")) 
input_file.decrypt("MyPassword") 
output_PDF = PdfFileWriter() 

for num_page in range(0, input_file.getNumPages()): 
    page = input_file.getPage(num_page) 
    page.rotateClockwise(270) 
    output_PDF.addPage(page) 

#Trying to replace old data with new data in the original file, not 
#create a new file and add the new data! 
output_file_name = os.path.join(my_path, "myfile.pdf") 
output_file = file(output_file_name, "wb") 
output_PDF.write(output_file) 
output_file.close() 

The code above gives me an error! I have even tried using:

input_file = PdfFileReader(file(input_file_name, "r+b")) 

but that did not work either...

Changing the line:

output_file_name = os.path.join(my_path, "myfile.pdf") 

to:

output_file_name = os.path.join(my_path, "myfile2.pdf") 

fixes everything, but that is not what I want...

Any help?

Error output:

Traceback (most recent call last):
  File "12-5.py", line 22, in <module>
    output_PDF.write(output_file)
  File "/usr/lib/pymodules/python2.7/pyPdf/pdf.py", line 264, in write
    self._sweepIndirectReferences(externalReferenceMap, self._root)
  File "/usr/lib/pymodules/python2.7/pyPdf/pdf.py", line 339, in _sweepIndirectReferences
    self._sweepIndirectReferences(externMap, realdata)
  File "/usr/lib/pymodules/python2.7/pyPdf/pdf.py", line 315, in _sweepIndirectReferences
    value = self._sweepIndirectReferences(externMap, value)
  File "/usr/lib/pymodules/python2.7/pyPdf/pdf.py", line 339, in _sweepIndirectReferences
    self._sweepIndirectReferences(externMap, realdata)
  File "/usr/lib/pymodules/python2.7/pyPdf/pdf.py", line 315, in _sweepIndirectReferences
    value = self._sweepIndirectReferences(externMap, value)
  File "/usr/lib/pymodules/python2.7/pyPdf/pdf.py", line 324, in _sweepIndirectReferences
    value = self._sweepIndirectReferences(externMap, data[i])
  File "/usr/lib/pymodules/python2.7/pyPdf/pdf.py", line 339, in _sweepIndirectReferences
    self._sweepIndirectReferences(externMap, realdata)
  File "/usr/lib/pymodules/python2.7/pyPdf/pdf.py", line 315, in _sweepIndirectReferences
    value = self._sweepIndirectReferences(externMap, value)
  File "/usr/lib/pymodules/python2.7/pyPdf/pdf.py", line 324, in _sweepIndirectReferences
    value = self._sweepIndirectReferences(externMap, data[i])
  File "/usr/lib/pymodules/python2.7/pyPdf/pdf.py", line 345, in _sweepIndirectReferences
    newobj = data.pdf.getObject(data)
  File "/usr/lib/pymodules/python2.7/pyPdf/pdf.py", line 649, in getObject
    retval = readObject(self.stream, self)
  File "/usr/lib/pymodules/python2.7/pyPdf/generic.py", line 67, in readObject
    return DictionaryObject.readFromStream(stream, pdf)
  File "/usr/lib/pymodules/python2.7/pyPdf/generic.py", line 564, in readFromStream
    raise utils.PdfReadError, "Unable to find 'endstream' marker after stream."
pyPdf.utils.PdfReadError: Unable to find 'endstream' marker after stream.

What do you mean by "it did not work" and "gives an error"? – 2015-02-23 17:33:30

I have edited the error output into the question! – midkin 2015-02-23 17:36:43

Answer

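The problem, I suspect, is that PyPDF is still reading from the file while it is being written to.

As you have already noticed, the correct fix is to write to a separate file and then replace the original file with the new one. Something like this: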
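To illustrate the suspected cause, here is a minimal standard-library sketch (Python 3, no PyPDF involved). It assumes the failure comes from the "wb" open truncating the file before the reader has fetched everything it needs. The function name `read_tail_after_truncate` and the sample bytes are made up for the demo, and `buffering=0` is used only so that Python does not prefetch the whole file into memory.

```python
import os
import tempfile


def read_tail_after_truncate(path):
    """Read a few bytes, reopen the same path with "wb", then read the rest."""
    # buffering=0 is a demo assumption: it stops Python from prefetching
    # the file, so every read really goes to the file on disk.
    with open(path, "rb", buffering=0) as reader:
        head = reader.read(5)
        # Opening with "wb" truncates the file to zero bytes immediately,
        # even though nothing has been written yet.
        with open(path, "wb"):
            pass
        # The reader is now positioned past the end of an empty file.
        tail = reader.read()
    return head, tail


if __name__ == "__main__":
    fd, demo_path = tempfile.mkstemp(suffix=".pdf")
    with os.fdopen(fd, "wb") as f:
        f.write(b"%PDF-1.4 stream ... endstream")
    head, tail = read_tail_after_truncate(demo_path)
    print(head, tail)
    os.remove(demo_path)
```

The first read succeeds, but the second one comes back empty because the data is already gone. That would be consistent with a reader that loads objects lazily and then cannot find the 'endstream' marker it expects.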

output_file_name = os.path.join(my_path, "myfile-temporary.pdf") 
output_file = file(output_file_name, "wb") 
output_PDF.write(output_file) 
output_file.close() 
os.rename(output_file_name, input_file_name) 

I have written some code that simplifies this: https://github.com/shazow/unstdlib.py/blob/master/unstdlib/standard/contextlib_.py#L14

from unstdlib.standard.contextlib_ import open_atomic 

with open_atomic(input_file_name, "wb") as output_file: 
    output_PDF.write(output_file) 

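This automatically creates a temporary file, writes to it, and then replaces the original file.

Edit: I originally misread the question. What follows is my incorrect answer, though it may still be helpful to others.

Your code is fine and should work without problems on "most" PDFs.

The problem you are seeing is an incompatibility between PyPDF and the specific PDF you are trying to use. It may be a bug in PyPDF, or the PDF may not be fully valid.

There are two things you can try:

  1. See whether PyPDF2 can read the file. Install PyPDF2 with pip install PyPDF2, replace import pyPdf … with import PyPDF2 …, and re-run the script.

  2. Re-encode your PDF with another program and see whether that helps. For example, something like convert bad.pdf bad.ps; convert bad.ps maybe-good.pdf might fix things.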
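For readers who would rather not add a dependency, the same idea can be sketched with the standard library alone. This is not the unstdlib implementation, only a rough Python 3 equivalent under my own assumptions: the name `atomic_write` is hypothetical, the temporary file is created in the target's directory, and `os.replace` (Python 3.3+) does the final swap.

```python
import contextlib
import os
import tempfile


@contextlib.contextmanager
def atomic_write(path, mode="wb"):
    """Yield a temporary file; move it over `path` only if the block succeeds."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temporary file next to the target so the final move
    # stays on the same filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, mode) as tmp_file:
            yield tmp_file
        # The temporary file is closed here; os.replace overwrites an
        # existing destination.
        os.replace(tmp_path, path)
    except BaseException:
        # Leave the original untouched and clean up the partial file.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

With the script from the question, this would be used as `with atomic_write(input_file_name) as output_file: output_PDF.write(output_file)`. The original file stays intact while PyPDF reads from it and is swapped out only after the write has finished.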

1. Tried it! Many lines of error output, starting with: Traceback (most recent call last): File "12-5.py", line 22, in <module> output_PDF.write(output_file) 2. I don't know how to do that! – midkin 2015-02-23 18:32:48

My apologies, I misread the question. See my updated answer. – 2015-02-23 18:33:26

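OK, os.rename works! But I think the real answer is that what I was trying to do cannot be done that way, because PyPDF is reading the file while it is being written! :) Still, if someone needs to do this without keeping a new PDF file on their hard drive, os.rename is the way to do it! Since it does what I need, even if not in the way I had in mind, I will accept this as the correct answer! :) – midkin 2015-02-23 18:43:51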