2016-07-13 31 views

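I am trying to re-save a PDF with Ghostscript (to fix errors that PyPDF2 cannot handle). I call Ghostscript through subprocess.check_output, and I want to pass the original PDF in on STDIN and get the new PDF back on STDOUT. How do I export binary data through STDOUT from a Python subprocess command?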

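When I save the PDF to a file and read it back in, it works fine. When I try to take the file from STDOUT instead, it doesn't work. I thought this might be an encoding problem, but I don't want to encode anything as text; I just want the binary data. Maybe there is something about encoding that I don't understand.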

How can I make the STDOUT data behave like the file data?

import subprocess 
from PyPDF2 import PdfFileReader 
from io import BytesIO 
import traceback 

input_file_name = "SKMBT_42116071215160 (1).pdf" 
output_file_name = 'saved2.pdf' 
# input_file = open(input_file_name, "rb") # Moved below. 

# Write to a file, then read the file back in. This works. 
try: 
    ps1 = subprocess.check_output(
     ('gs', '-o', output_file_name, '-sDEVICE=pdfwrite', '-dPDFSETTINGS=/prepress', input_file_name), 
     # stdin=input_file # [edit] We pass in the file name, so this only confuses things. 
    ) 
    # I use BytesIO() in this example only to make the examples parallel. 
    # In the other example, I use BytesIO() because I can't pass a string to PdfFileReader(). 
    fakeFile1 = BytesIO() 
    fakeFile1.write(open(output_file_name, "rb").read()) 
    inputpdf = PdfFileReader(fakeFile1) 
    print inputpdf 
except: 
    traceback.print_exc() 

print "---------" 
# input_file.seek(0) # Added to address one comment. Removed while addressing another. 
input_file = open(input_file_name, "rb") 

# Export to STDOUT. This doesn't work. 
try: 
    ps2 = subprocess.check_output(
     ('gs', '-o', '-', '-sDEVICE=pdfwrite', '-dPDFSETTINGS=/prepress', '-'), 
     stdin=input_file, 
     # shell=True # Using shell produces the same error. 
    ) 
    fakeFile2 = BytesIO() 
    fakeFile2.write(ps2) 
    inputpdf = PdfFileReader(fakeFile2) 
    print inputpdf 
except: 
    traceback.print_exc() 

Output:

**** The file was produced by: 
    **** >>>> KONICA MINOLTA bizhub 421 <<<< 
<PyPDF2.pdf.PdfFileReader object at 0x101d1d550> 
--------- 
    **** The file was produced by: 
    **** >>>> KONICA MINOLTA bizhub 421 <<<< 
Traceback (most recent call last): 
    File "pdf_file_reader_test2.py", line 34, in <module> 
    inputpdf = PdfFileReader(fakeFile2) 
    File "/Library/Python/2.7/site-packages/PyPDF2/pdf.py", line 1065, in __init__ 
    self.read(stream) 
    File "/Library/Python/2.7/site-packages/PyPDF2/pdf.py", line 1774, in read 
    idnum, generation = self.readObjectHeader(stream) 
    File "/Library/Python/2.7/site-packages/PyPDF2/pdf.py", line 1638, in readObjectHeader 
    return int(idnum), int(generation) 
ValueError: invalid literal for int() with base 10: "7-8138-11f1-0000-59be60c931e0'" 
On Windows, stdout needs to be configured as binary, like this: http://stackoverflow.com/questions/2374427/python-2-x-write-binary-output-to-stdout. Not sure it helps, but it's worth a try. –

Worth mentioning, but I don't think that is the solution in this case. I'm on OS X, and I don't know of a comparable setting I could change. – Travis

Not sure, but is it intentional that you don't rewind `input_file` between the two calls (the one that works and the one that doesn't)? –
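To illustrate the rewinding concern raised in the comment above, here is a minimal sketch in Python 3 (the question's code is Python 2). It does not use Ghostscript: the child process is a hypothetical stand-in that simply copies stdin to stdout, and the payload is made-up bytes. It shows that a file object handed to a subprocess as stdin is consumed by the first call, so a second call sees no data unless the file is rewound. It also shows that check_output returns the raw bytes unchanged, with no text encoding involved.

```python
import subprocess
import sys
import tempfile

# Hypothetical stand-in for Ghostscript: a child that copies stdin to stdout as bytes.
ECHO_CHILD = [
    sys.executable,
    "-c",
    "import sys; sys.stdout.buffer.write(sys.stdin.buffer.read())",
]

# Made-up binary payload, including bytes that are not valid text.
payload = b"%PDF-1.4 fake \x00\xff\n%%EOF\n"

with tempfile.TemporaryFile() as f:
    f.write(payload)
    f.seek(0)

    # First call: the child reads the whole file through the shared descriptor.
    first = subprocess.check_output(ECHO_CHILD, stdin=f)

    # Second call without rewinding: the shared file offset is already at EOF.
    second = subprocess.check_output(ECHO_CHILD, stdin=f)

    # Rewind, then the child sees the data again.
    f.seek(0)
    third = subprocess.check_output(ECHO_CHILD, stdin=f)

print(first == payload, second == b"", third == payload)
```

In the question's script the first Ghostscript call takes the file name rather than stdin, so this may not be the cause there, but it is cheap to rule out.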

Answer

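It turns out this has nothing to do with Python. It is a Ghostscript problem. As pointed out in this post: Prevent Ghostscript from writing errors to standard output, Ghostscript writes its error messages to standard output, which corrupts the file being piped out.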

Thanks to @Jean-François Fabre, who suggested that I look inside the binary file.
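A rough way to look inside the captured bytes, as a sketch in Python 3: the hypothetical helper `inspect_pdf_bytes` reports any text that appears before the `%PDF-` header or after the last `%%EOF` marker. The sample data below is made up for illustration. This check only covers the two ends of the stream; a message injected in the middle of the PDF would need a manual look at the bytes.

```python
def inspect_pdf_bytes(data):
    """Report stray bytes before the PDF header or after the final EOF marker."""
    header_at = data.find(b"%PDF-")
    eof_at = data.rfind(b"%%EOF")
    return {
        "header_offset": header_at,
        "leading_junk": data[:header_at] if header_at > 0 else b"",
        "has_eof_marker": eof_at != -1,
        # Skip the 5-byte marker itself; ignore a plain trailing newline.
        "trailing_junk": data[eof_at + 5:].strip() if eof_at != -1 else b"",
    }


# Made-up samples: a clean stream and one polluted with message text.
clean = b"%PDF-1.4\n1 0 obj\n<<>>\nendobj\n%%EOF\n"
polluted = b"   **** warning text\n" + clean + b"some error message\n"

clean_report = inspect_pdf_bytes(clean)
polluted_report = inspect_pdf_bytes(polluted)
print(clean_report)
print(polluted_report)
```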

Please mark this answer as accepted so that the question no longer shows up as unresolved. Perhaps also revisit the question? Thanks. – tripleee

When I try to do that, it says: "You can accept your own answer tomorrow". – Travis
