2016-06-14

Unicode error in pyPdf

I am trying to download several PDFs with the requests library and merge them with pyPdf. In general this works fine, but for some PDFs I just get an error.

MWE.py

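import requests
from pyPdf import PdfFileWriter, PdfFileReader
from StringIO import StringIO

# The download step is not shown in the post; PDF_URL is a placeholder.
PDF_URL = "https://example.com/document.pdf"
session = requests.Session()
response = session.get(PDF_URL)

# Parse the downloaded bytes from an in-memory buffer.
input = PdfFileReader(StringIO(response.content))
input.decrypt("")  # workaround: empty password
output = PdfFileWriter()
output.addPage(input.getPage(0))

outputStream = file("document-output.pdf", "wb")
output.write(outputStream)
outputStream.close()

session.close()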

Error

Traceback (most recent call last):
  File "mwe.py", line 21, in <module>
    input.decrypt("")
  File "/usr/local/lib/python2.7/dist-packages/pyPdf/pdf.py", line 894, in decrypt
    return self._decrypt(password)
  File "/usr/local/lib/python2.7/dist-packages/pyPdf/pdf.py", line 904, in _decrypt
    user_password, key = self._authenticateUserPassword(password)
  File "/usr/local/lib/python2.7/dist-packages/pyPdf/pdf.py", line 945, in _authenticateUserPassword
    encrypt.get("/EncryptMetadata", BooleanObject(False)).getObject())
  File "/usr/local/lib/python2.7/dist-packages/pyPdf/pdf.py", line 1818, in _alg35
    key = _alg32(password, rev, keylen, owner_entry, p_entry, id1_entry)
  File "/usr/local/lib/python2.7/dist-packages/pyPdf/pdf.py", line 1729, in _alg32
    m.update(id1_entry)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-1: ordinal not in range(128)

For this traceback I read the input from a file instead of downloading it, but I don't think that matters here.

I found some related questions about this error, but none of them helped me solve my specific problem.

Are you going to share the rest of the traceback?

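The error occurs in the decrypt method, doesn't it? The PDF is actually not encrypted, but I found this workaround with an empty password. Without it, I get an 'Exception: file has not been decrypted' error inside the addPage method.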

Why are you using 'file'? You really should use 'open'.
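
For context on this comment: `file` is a built-in that exists only in Python 2, while `open` works in both Python 2 and Python 3. A minimal sketch of the suggestion, using a `with` block so the file is closed even if writing fails. The helper `write_bytes` and the temporary path are hypothetical names used only for illustration.

```python
import os
import tempfile


def write_bytes(path, data):
    """Write data to path in binary mode; the file is closed on exit."""
    with open(path, "wb") as stream:
        stream.write(data)


# Hypothetical usage with a throwaway directory instead of the working dir.
target = os.path.join(tempfile.mkdtemp(), "document-output.pdf")
write_bytes(target, b"%PDF-1.4\n")
```

In the question's code the same pattern would presumably wrap the `output.write(outputStream)` call, with `outputStream` bound by `with open("document-output.pdf", "wb") as outputStream:`.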

Answers
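
The traceback ends in `m.update(id1_entry)`, which suggests that `id1_entry` is a unicode string containing non-ASCII characters. In Python 2, passing such a string to a hash object triggers an implicit ASCII encode, and that is what fails. The sketch below shows the mechanism with `hashlib` alone; it is not a patch for pyPdf. The names `md5_hex` and `doc_id` and the `latin-1` default are assumptions made for illustration, and the right encoding depends on how the text was decoded in the first place.

```python
import hashlib


def md5_hex(value, encoding="latin-1"):
    """Return the MD5 hex digest of value, encoding text to bytes first.

    The default encoding is an assumption for illustration only.
    """
    if not isinstance(value, bytes):
        # Text must be converted to bytes explicitly before hashing.
        value = value.encode(encoding)
    return hashlib.md5(value).hexdigest()


# Hypothetical stand-in for a document ID that was decoded into text.
doc_id = u"\xe9\xfcabc"

# Hashing the encoded text matches hashing the equivalent raw bytes.
assert md5_hex(doc_id) == md5_hex(b"\xe9\xfcabc")
```

Encoding explicitly avoids the implicit ASCII step, so the non-ASCII characters no longer raise `UnicodeEncodeError`.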