TypeError: a bytes-like object is required, not 'str' in Python 3.5.2 and pytesseract

I am using Python 3.5.2 with pytesseract, and I get the error TypeError: a bytes-like object is required, not 'str' when I run my code (details below).

Code (File "D:/test.py"):

# -*- coding: utf-8 -*- 

try: 
    import Image 
except ImportError: 
    from PIL import Image 

import pytesseract 


print(pytesseract.image_to_string(Image.open('d:/testimages/name.gif'), lang='chi_sim')) 
print(pytesseract.image_to_string(Image.open('d:/testimages/mobile.gif')))

Error:

Traceback (most recent call last): 
    File "D:/test.py", line 11, in <module> 
    print(pytesseract.image_to_string(Image.open('d:/testimages/name.gif'), lang='chi_sim')) 
    File "C:\Users\dell\AppData\Local\Programs\Python\Python35\lib\site-packages\pytesseract\pytesseract.py", line 164, in image_to_string 
    errors = get_errors(error_string) 
    File "C:\Users\dell\AppData\Local\Programs\Python\Python35\lib\site-packages\pytesseract\pytesseract.py", line 112, in get_errors 
    error_lines = tuple(line for line in lines if line.find('Error') >= 0) 
    File "C:\Users\dell\AppData\Local\Programs\Python\Python35\lib\site-packages\pytesseract\pytesseract.py", line 112, in <genexpr> 
    error_lines = tuple(line for line in lines if line.find('Error') >= 0) 
TypeError: a bytes-like object is required, not 'str'

What should I do?

Edit:

I downloaded the training data to C:\Program Files (x86)\Tesseract-OCR\tessdata and inserted the line error_string = error_string.decode("utf-8") into get_errors(). The error now looks like this:

Traceback (most recent call last): 
    File "D:/test.py", line 11, in <module> 
    print(pytesseract.image_to_string(Image.open('d:/testimages/name.gif'), lang='chi_sim')) 
    File "C:\Users\dell\AppData\Local\Programs\Python\Python35\lib\site-packages\pytesseract\pytesseract.py", line 165, in image_to_string 
    raise TesseractError(status, errors) 
pytesseract.pytesseract.TesseractError: (1, 'Error opening data file \\Program Files (x86)\\Tesseract-OCR\\tessdata/chi_sim.traineddata')


2016-12-28 zwl1619

This is a known bug in pytesseract; see issue #32:

Error parsing of tesseract output is brittle: a bytes-like object is required, not 'str'

and

There actually is an error in tesseract. But on the Python end the error occurs because error_string is returning a byte-literal, and the geterrors call appears to have trouble with it
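
The failure described in the quote can be reproduced without tesseract. The following is a minimal sketch, assuming only that the captured error output arrives as bytes on Python 3; the sample message is made up for illustration.

```python
# Made-up sample standing in for tesseract's stderr, which arrives as bytes.
error_string = b"Error opening data file chi_sim.traineddata\n"

lines = error_string.splitlines()  # a list of bytes objects

raised = False
try:
    # Searching a bytes object for a str pattern raises TypeError on Python 3.
    [line for line in lines if line.find('Error') >= 0]
except TypeError as exc:
    raised = True
    print("TypeError:", exc)

# Decoding first yields str, so the same search succeeds.
decoded_lines = error_string.decode("utf-8").splitlines()
matches = [line for line in decoded_lines if line.find('Error') >= 0]
print(matches)
```

The exact wording of the TypeError message differs between Python versions, but the cause is the same: a str pattern is being searched for inside bytes.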

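The workaround is either to install the training data for the given language (see "Tesseract running error"), or to edit site-packages\pytesseract\pytesseract.py and insert one extra line at the top of the get_errors() function (at line 109):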

error_string = error_string.decode("utf-8")

The function then reads:

def get_errors(error_string):
    '''
    returns all lines in the error_string that start with the string "error"
    '''

    error_string = error_string.decode("utf-8")
    lines = error_string.splitlines()
    error_lines = tuple(line for line in lines if line.find('Error') >= 0)
    if len(error_lines) > 0:
        return '\n'.join(error_lines)
    else:
        return error_string.strip()
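
The patched function above fails if it is ever handed a str, because str has no decode() method on Python 3. A slightly more defensive variant could look like the sketch below. The name get_errors_safe is hypothetical and not part of pytesseract, and UTF-8 is an assumption about the encoding of tesseract's output.

```python
def get_errors_safe(error_string):
    """Return the lines containing 'Error', or the stripped text if there are none.

    Hypothetical variant of get_errors(); accepts either bytes or str.
    """
    # Assumption: the output is UTF-8. Undecodable bytes are replaced
    # instead of raising UnicodeDecodeError.
    if isinstance(error_string, bytes):
        error_string = error_string.decode("utf-8", errors="replace")
    lines = error_string.splitlines()
    error_lines = tuple(line for line in lines if 'Error' in line)
    if error_lines:
        return '\n'.join(error_lines)
    return error_string.strip()


print(get_errors_safe(b"Error opening data file x\nsome other line"))
print(get_errors_safe("  no problems reported  "))
```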


2016-12-28 19:51:22

There is still another problem; please see my edit. – zwl1619

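@zwl1619: I don't know how pytesseract works internally. With the encoding error fixed, the real error shows that the training data is not installed the way tesseract expects. That error was being raised before as well, but because of the encoding problem you never got to see it. Maybe this is some kind of permissions problem? –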
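
One way to narrow this down is to check from Python whether the file named in the error exists and is readable. The sketch below uses only the standard library. traineddata_status is a hypothetical helper, and the directory is the one given in the question. Note that the path in the error message has no drive letter, so it may be resolved against the current drive; that is a guess, not something confirmed in this thread.

```python
import os


def traineddata_status(tessdata_dir, lang):
    """Hypothetical helper: report whether <lang>.traineddata exists and is readable."""
    path = os.path.join(tessdata_dir, lang + ".traineddata")
    if not os.path.isfile(path):
        return path, "missing"
    if not os.access(path, os.R_OK):
        return path, "not readable"
    return path, "ok"


if __name__ == "__main__":
    # Directory taken from the question; adjust it to the actual install location.
    tessdata_dir = r"C:\Program Files (x86)\Tesseract-OCR\tessdata"
    print(traineddata_status(tessdata_dir, "chi_sim"))
```

A result of "missing" points at the install location or the path tesseract is using, while "not readable" would support the permissions theory.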
