Pytesseract: Error opening data file \Program Files (x86)\Tesseract-OCR\en.traineddata
I am trying to use pytesseract in a Jupyter notebook.
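- Windows 10 x64
- Running Jupyter Notebook (Anaconda3, Python 3.6.1) with administrative privileges
- The working directory containing the TIFF file is on a different drive (Z:)
When I run the following code: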
try:
    import Image
except ImportError:
    from PIL import Image
import pytesseract
pytesseract.pytesseract.tesseract_cmd = 'C:\\Program Files (x86)\\Tesseract-OCR\\tesseract.exe'
tessdata_dir_config = '--tessdata-dir "C:\\Program Files (x86)\\Tesseract-OCR\\tessdata"'
print(pytesseract.image_to_string(Image.open('Multi_page24bpp.tif'), lang='en', config=tessdata_dir_config))
I get the following error:
TesseractError Traceback (most recent call last)
<ipython-input-37-c1dcbc33cde4> in <module>()
11 # tessdata_dir_config = '--tessdata-dir "C:\\Program Files (x86)\\Tesseract-OCR\\tessdata"'
12
---> 13 print(pytesseract.image_to_string(Image.open('Multi_page24bpp.tif'), lang='en'))
14 # print(pytesseract.image_to_string(Image.open('test-european.jpg'), lang='fra'))
C:\Users\cpcho\AppData\Local\Continuum\Anaconda3\lib\site-packages\pytesseract\pytesseract.py in image_to_string(image, lang, boxes, config)
123 if status:
124 errors = get_errors(error_string)
--> 125 raise TesseractError(status, errors)
126 f = open(output_file_name, 'rb')
127 try:
TesseractError: (1, 'Error opening data file \\Program Files (x86)\\Tesseract-OCR\\en.traineddata')
I found these two references helpful, but I am still missing something: https://github.com/madmaze/pytesseract/issues/50 https://github.com/madmaze/pytesseract/issues/64
Thank you for your time on this!
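One thing that may be worth checking: the error names `en.traineddata`, but Tesseract's language data files use three-letter codes, so the English file is normally `eng.traineddata`. The sketch below is only a diagnostic, not a confirmed fix. It lists which `.traineddata` files are present in a tessdata folder. The helper names `available_languages` and `check_language` are made up for illustration, and the path in the usage section is copied from the code above on the assumption that Tesseract is installed there.

```python
from pathlib import Path


def available_languages(tessdata_dir):
    """Return the sorted language codes that have a .traineddata file in tessdata_dir."""
    # "eng.traineddata" -> "eng"; a missing directory simply yields an empty list.
    return sorted(p.stem for p in Path(tessdata_dir).glob("*.traineddata"))


def check_language(tessdata_dir, lang):
    """Return True if <lang>.traineddata exists in tessdata_dir."""
    return (Path(tessdata_dir) / f"{lang}.traineddata").is_file()


if __name__ == "__main__":
    # Assumed install location, taken from the question; adjust if yours differs.
    tessdata = r"C:\Program Files (x86)\Tesseract-OCR\tessdata"
    print("Languages found:", available_languages(tessdata))
    print("'en' present: ", check_language(tessdata, "en"))
    print("'eng' present:", check_language(tessdata, "eng"))
```

If `eng` is listed and `en` is not, passing `lang='eng'` to `pytesseract.image_to_string` is the likely fix. The path in the error also has no drive letter and no `tessdata` folder, which suggests Tesseract may be resolving its data directory from the `TESSDATA_PREFIX` environment variable, so that setting may be worth inspecting as well.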