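PyPDF2 - can't get past a large corrupt file

I am checking a file system for corrupt PDFs. The test I am running covers nearly 200k PDFs. Smaller corrupt files seem to be flagged correctly, but I ran into a large 15 MB corrupt file and the code just hangs indefinitely. I have tried setting strict to False with no luck. It seems to be the initial opening of the file that is the problem. Rather than using a thread and setting a timeout (which I have tried in the past with little success), I am hoping there is an alternative.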
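import os
from time import gmtime, strftime

import PyPDF2

path = raw_input("Enter folder path of PDF files:")
t = open(r'c:\pdf_check\log.txt', 'w')
# Raw string: in a plain string the \t in this path would be read as a tab.
# warndest is written to like a file, so open it rather than pass its name.
warnings_log = open(r'c:\test\warning.txt', 'w')
count = 1
for dirpath, dnames, fnames in os.walk(path):
    for file in fnames:
        print count
        count = count + 1
        if file.endswith(".pdf"):
            file = os.path.join(dirpath, file)
            try:
                # The second positional argument of PdfFileReader is strict,
                # not a file mode, so open the file in binary mode first.
                with open(file, 'rb') as fh:
                    PyPDF2.PdfFileReader(fh, warndest=warnings_log)
            except PyPDF2.utils.PdfReadError:
                curdate = strftime("%Y-%m-%d %H:%M:%S", gmtime())
                t.write(str(curdate) + " " + "-" + " " + file + " " + "-" + " " + "fail" + "\n")
            else:
                pass
                #curdate = strftime("%Y-%m-%d %H:%M:%S", gmtime())
                #t.write(str(curdate) + " " + "-" + " " + file + " " + "-" + " " + "pass" + "\n")
t.close()
warnings_log.close()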
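
One possible alternative to a thread-based timeout, sketched here as an assumption rather than a known fix for this particular file: run each check in a separate Python process, since a child process can be killed when it hangs whereas a thread cannot. The sketch below is written for Python 3 and uses only the standard `subprocess` module; the names `CHECK_SCRIPT` and `check_pdf`, the `script` parameter, and the 30-second default are all hypothetical, and the child script assumes the PyPDF2 1.x `PdfFileReader` API used in the code above.

```python
import subprocess
import sys

# Child script (hypothetical): parse the PDF named in argv[1].
# Assumes PyPDF2 1.x. Any uncaught exception, including PdfReadError,
# makes the child exit with a non-zero status.
CHECK_SCRIPT = """
import sys
import PyPDF2
with open(sys.argv[1], 'rb') as fh:
    PyPDF2.PdfFileReader(fh, strict=False)
"""


def check_pdf(path, timeout=30, script=CHECK_SCRIPT):
    """Return 'pass', 'fail', or 'timeout' for a single file."""
    try:
        result = subprocess.run(
            [sys.executable, "-c", script, path],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,  # the child is killed if it runs longer than this
        )
    except subprocess.TimeoutExpired:
        return "timeout"
    return "pass" if result.returncode == 0 else "fail"
```

Inside the `os.walk` loop, the result of `check_pdf(file)` could be written to the log in place of the direct `PdfFileReader` call, with "timeout" recorded for files like the 15 MB one. Starting a new interpreter per file is slow, so across roughly 200k PDFs it may be worth reserving this for large files or for files that a first pass could not finish. Note that a missing PyPDF2 install in the child would also show up as "fail".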