2016-12-28

TypeError: a bytes-like object is required, not 'str' in Python 3.5.2 and pytesseract

I am using Python 3.5.2 and pytesseract. When I run my code I get the error TypeError: a bytes-like object is required, not 'str' (details below).

Code (file "D:/test.py"):

# -*- coding: utf-8 -*- 

try: 
    import Image 
except ImportError: 
    from PIL import Image 

import pytesseract 


print(pytesseract.image_to_string(Image.open('d:/testimages/name.gif'), lang='chi_sim')) 
print(pytesseract.image_to_string(Image.open('d:/testimages/mobile.gif'))) 

Error:

Traceback (most recent call last): 
    File "D:/test.py", line 11, in <module> 
    print(pytesseract.image_to_string(Image.open('d:/testimages/name.gif'), lang='chi_sim')) 
    File "C:\Users\dell\AppData\Local\Programs\Python\Python35\lib\site-packages\pytesseract\pytesseract.py", line 164, in image_to_string 
    errors = get_errors(error_string) 
    File "C:\Users\dell\AppData\Local\Programs\Python\Python35\lib\site-packages\pytesseract\pytesseract.py", line 112, in get_errors 
    error_lines = tuple(line for line in lines if line.find('Error') >= 0) 
    File "C:\Users\dell\AppData\Local\Programs\Python\Python35\lib\site-packages\pytesseract\pytesseract.py", line 112, in <genexpr> 
    error_lines = tuple(line for line in lines if line.find('Error') >= 0) 
TypeError: a bytes-like object is required, not 'str' 

What should I do?

Edit:

I downloaded the training data to C:\Program Files (x86)\Tesseract-OCR\tessdata, and I inserted the line error_string = error_string.decode("utf-8") into get_errors(). Now the error looks like this:

Traceback (most recent call last): 
    File "D:/test.py", line 11, in <module> 
    print(pytesseract.image_to_string(Image.open('d:/testimages/name.gif'), lang='chi_sim')) 
    File "C:\Users\dell\AppData\Local\Programs\Python\Python35\lib\site-packages\pytesseract\pytesseract.py", line 165, in image_to_string 
    raise TesseractError(status, errors) 
pytesseract.pytesseract.TesseractError: (1, 'Error opening data file \\Program Files (x86)\\Tesseract-OCR\\tessdata/chi_sim.traineddata') 

Answer

This is a known bug in pytesseract, see issue #32:

Error parsing of tesseract output is brittle: a bytes-like object is required, not 'str'

There actually is an error in tesseract. But on the Python end the error occurs because error_string is returning a byte-literal, and the geterrors call appears to have trouble with it
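
The Python-side failure can be reproduced without Tesseract at all. This is a minimal sketch, assuming (as the traceback suggests) that the stderr output reaches get_errors() as bytes; the sample message is made up for illustration.

```python
# Hypothetical stderr content; in pytesseract it would come from the
# tesseract process rather than from a literal.
error_string = b"Error opening data file chi_sim.traineddata\n"

lines = error_string.splitlines()  # a list of bytes objects

try:
    # Searching a bytes object for a str pattern is not allowed in Python 3.
    [line for line in lines if line.find('Error') >= 0]
    raised = False
except TypeError:
    raised = True

# After decoding, the same search runs on str objects and succeeds.
decoded_lines = error_string.decode("utf-8").splitlines()
matches = [line for line in decoded_lines if line.find('Error') >= 0]

print(raised, matches)
```

The exact wording of the TypeError message differs between Python versions, so the sketch only checks that the exception is raised.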

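The solution is either to install the training data for the given language (see "Tesseract running error"), or to edit site-packages\pytesseract\pytesseract.py and insert an extra line at the top of the get_errors() function (at line 109):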

error_string = error_string.decode("utf-8") 

The function then reads:

def get_errors(error_string): 
    ''' 
    returns all lines in the error_string that start with the string "error" 
    ''' 

    error_string = error_string.decode("utf-8") 
    lines = error_string.splitlines() 
    error_lines = tuple(line for line in lines if line.find('Error') >= 0) 
    if len(error_lines) > 0:
        return '\n'.join(error_lines)
    else:
        return error_string.strip()
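
If you would rather not assume the input is always bytes, a slightly more defensive variant is possible. This is only a sketch, not the library's code: the isinstance check and errors="replace" are my own assumptions, added so the function accepts both bytes and str and cannot fail on undecodable bytes.

```python
def get_errors(error_string):
    """Return the lines of error_string that contain 'Error'.

    Sketch only: accepts bytes or str, unlike the patched original,
    which assumes bytes.
    """
    if isinstance(error_string, bytes):
        # errors="replace" is an assumption: it keeps decoding from
        # raising if tesseract emits bytes that are not valid UTF-8.
        error_string = error_string.decode("utf-8", errors="replace")

    lines = error_string.splitlines()
    error_lines = tuple(line for line in lines if 'Error' in line)
    if error_lines:
        return '\n'.join(error_lines)
    return error_string.strip()
```

With this version, calling get_errors() on an already decoded string is harmless, so the patch would not break if the caller starts passing str.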

There is still another problem, please see my edit. – zwl1619

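@zwl1619: I don't know how pytesseract works internally. Fixing the encoding error reveals that the training data is not installed the way Tesseract expects. That error was being raised before as well, but because of the encoding problem you never got to see it. Maybe it is some kind of permissions problem?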
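
One way to narrow this down is to check from Python whether the file is where you think it is and whether it is readable. This is a rough sketch: traineddata_status is a hypothetical helper, not part of pytesseract, and the directory is the one from the question. TESSDATA_PREFIX is the environment variable Tesseract can use to locate its data, so it may be worth printing too.

```python
import os


def traineddata_status(tessdata_dir, lang):
    """Report whether <lang>.traineddata exists and is readable.

    Hypothetical helper for diagnosis only; not part of pytesseract.
    """
    path = os.path.join(tessdata_dir, lang + ".traineddata")
    return {
        "path": path,
        "exists": os.path.isfile(path),
        "readable": os.access(path, os.R_OK),
    }


if __name__ == "__main__":
    # Directory taken from the question; adjust it for your installation.
    tessdata_dir = r"C:\Program Files (x86)\Tesseract-OCR\tessdata"
    print("TESSDATA_PREFIX =", os.environ.get("TESSDATA_PREFIX"))
    print(traineddata_status(tessdata_dir, "chi_sim"))
```

If "exists" is False, the file is missing or misnamed. If it exists but Tesseract still cannot open it, the path Tesseract builds (note that the one in the error message has no drive letter) is the next thing to look at.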
