pytesseract temporary output file "No such file or directory" error

I'm using pytesseract with the line:

text = image_to_string(temp_test_file,
                       lang='eng',
                       boxes=False,
                       config='-c preserve_interword_spaces=1 hocr')

and I get the error:

pytesseract.py 
135| f = open(output_file_name, 'rb') 

No such file or directory: 
/var/folders/j3/dn60cg6d42bc2jwng_qzzyym0000gp/T/tess_EDOHFP.txt

Looking at the pytesseract source here, it seems that it cannot find the temporary output file it uses to store the output of the tesseract command.
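For context, pytesseract builds a `tesseract` command line, passes it an output *base name* without an extension, and afterwards opens a file it assumes tesseract created. The sketch below approximates how the 2017-era `run_tesseract` assembled that command; `build_command` is a hypothetical helper, and the argument order is an approximation rather than a copy of pytesseract's source. Note that the trailing `hocr` word in the config string is passed straight through to tesseract.

```python
import shlex


def build_command(input_file, output_base, lang=None, config=None,
                  tesseract_cmd='tesseract'):
    """Approximate the argv pytesseract passed to tesseract (hypothetical helper)."""
    # tesseract takes an output *base* name and appends the extension itself
    command = [tesseract_cmd, input_file, output_base]
    if lang is not None:
        command += ['-l', lang]
    if config:
        # config strings such as "-c key=value hocr" are split shell-style
        command += shlex.split(config)
    return command


argv = build_command('/tmp/tess_in.bmp', '/tmp/tess_out', lang='eng',
                     config='-c preserve_interword_spaces=1 hocr')
print(argv)
# ['tesseract', '/tmp/tess_in.bmp', '/tmp/tess_out', '-l', 'eng',
#  '-c', 'preserve_interword_spaces=1', 'hocr']
```

With that trailing `hocr` config, tesseract decides the output extension on its own, which is where the trouble in this question starts.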

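I've seen other answers here where the fix was to check that tesseract is installed and can be called from the terminal. For me it is, so that's not the problem. Any ideas what this could be and how to fix it?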
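The installation check mentioned above can also be done from Python. This minimal sketch uses the standard library's `shutil.which` (Python 3.3+) to see whether a `tesseract` executable is on `PATH`; `tesseract_on_path` is a hypothetical helper name, and it only confirms the binary can be found, not that it runs correctly.

```python
import shutil


def tesseract_on_path(cmd='tesseract'):
    """Return the full path of the tesseract executable, or None if it is not on PATH."""
    return shutil.which(cmd)


location = tesseract_on_path()
if location is None:
    print('tesseract not found on PATH')
else:
    print('tesseract found at %s' % location)
```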

Thanks :)


2017-08-07 lampShadesDrifter

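It turns out that pytesseract could not find the temporary output files because they were being stored with an extension other than .txt or .box (they were .hocr files). Judging from the source code, these are the only types of tesseract output file that pytesseract supports (or, more precisely, the only ones it "looks for"). The relevant snippets from the source are below: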

input_file_name = '%s.bmp' % tempnam()
output_file_name_base = tempnam()
if not boxes:
    output_file_name = '%s.txt' % output_file_name_base
else:
    output_file_name = '%s.box' % output_file_name_base

if status:
    errors = get_errors(error_string)
    raise TesseractError(status, errors)
f = open(output_file_name, 'rb')  # line 135, the one in the traceback above
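To make the mismatch concrete, here is a small self-contained simulation that does not need tesseract at all: it creates a file the way tesseract does with the `hocr` config (base name plus `.hocr`), then tries to open the `.txt` name that the code above computes when `boxes=False`. The directory and the `tess_EXAMPLE` name are made up for illustration.

```python
import os
import shutil
import tempfile

# Scratch directory standing in for the system temp dir.
tmpdir = tempfile.mkdtemp()
output_file_name_base = os.path.join(tmpdir, 'tess_EXAMPLE')  # hypothetical name

# What tesseract does when the config ends in "hocr": it appends ".hocr" itself.
with open(output_file_name_base + '.hocr', 'w') as out:
    out.write('<html></html>')

# What the 2017 pytesseract does when boxes=False: it expects "<base>.txt".
output_file_name = '%s.txt' % output_file_name_base
try:
    with open(output_file_name, 'rb'):
        found = True
except IOError as e:  # FileNotFoundError on Python 3
    found = False
    print(e)  # [Errno 2] No such file or directory: '.../tess_EXAMPLE.txt'

listing = sorted(os.listdir(tmpdir))
print(listing)  # only the .hocr file exists
shutil.rmtree(tmpdir)
```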

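Looking at the pull requests on pytesseract's GitHub, support for other output types seems to be planned but not yet implemented (the source I quoted to show why the .hocr file was not being found was copied and pasted from the pytesseract master branch).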

In the meantime, I hacked the pytesseract script a bit so that it supports multiple output file types.

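This version does not give the output file an extension (tesseract adds one automatically). Instead, it looks in the directory where pytesseract stores its temporary output files for a file whose name starts with the output file name assigned by pytesseract, up to and including the first '.', whatever the extension is: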

import os
import tempfile

from PIL import Image

# run_tesseract, get_errors, TesseractError and cleanup are the helpers
# already defined in (the 2017 master branch of) pytesseract.py.


def tempnam():
    ''' returns a temporary file name and the directory it lives in '''
    tmpfile = tempfile.NamedTemporaryFile(prefix="tess_")
    # In CPython the empty file is deleted as soon as tmpfile is garbage-collected
    # (i.e. when this function returns); only the unique name is reused.
    return tmpfile.name, os.path.dirname(tmpfile.name)


def image_to_string(image, lang=None, boxes=False, config=None, nice=0):
    if len(image.split()) == 4:
        # In case we have 4 channels, let's discard the alpha.
        # Kind of a hack, should fix in the future some time.
        r, g, b, a = image.split()
        image = Image.merge("RGB", (r, g, b))
    input_file_name, _ = tempnam()
    input_file_name += '.bmp'
    # Don't put an extension on the output file name: Tesseract adds one itself
    # (.txt, .box, .hocr, ...) depending on boxes/config.
    output_file_name_base, output_file_name_base_dir = tempnam()
    output_file_name_abs = None

    try:
        image.save(input_file_name)
        status, error_string = run_tesseract(input_file_name,
                                             output_file_name_base,
                                             lang=lang,
                                             boxes=boxes,
                                             config=config,
                                             nice=nice)
        if status:
            errors = get_errors(error_string)
            raise TesseractError(status, errors)

        # Find the temp output file in the temp dir under whatever extension
        # tesseract has assigned, i.e. any file named "<base>.<something>".
        prefix = os.path.basename(output_file_name_base) + '.'
        for name in os.listdir(output_file_name_base_dir):
            if name.startswith(prefix):
                output_file_name_abs = os.path.join(output_file_name_base_dir, name)
                break
        if output_file_name_abs is None:
            raise IOError('No tesseract output file starting with %s in %s'
                          % (prefix, output_file_name_base_dir))

        with open(output_file_name_abs, 'rb') as f:
            return f.read().decode('utf-8').strip()

    finally:
        cleanup(input_file_name)
        # only delete the output file if we actually found one
        if output_file_name_abs is not None:
            cleanup(output_file_name_abs)
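The heart of the patch, locating tesseract's output by its base name whatever extension was added, can be pulled out into a standalone function and tested without tesseract. `find_output_file` is a hypothetical helper name, and it uses `glob` instead of `os.listdir` plus `startswith`; the prefix match is the same. `glob.escape` requires Python 3.4+.

```python
import glob
import os
import tempfile


def find_output_file(output_file_name_base):
    """Return the path tesseract wrote for this base name (any extension), or None."""
    # glob.escape stops characters like '[' or '*' in the temp path acting as wildcards
    pattern = glob.escape(output_file_name_base) + '.*'
    matches = sorted(glob.glob(pattern))
    return matches[0] if matches else None


# Demo: fake a tesseract hOCR output file in a scratch directory.
tmpdir = tempfile.mkdtemp()
base = os.path.join(tmpdir, 'tess_EXAMPLE')  # hypothetical base name
open(base + '.hocr', 'w').close()
print(find_output_file(base))  # path ending in tess_EXAMPLE.hocr
```

Because only the base name is matched, the same lookup works for `.txt`, `.box` and `.hocr` output without pytesseract having to know in advance which one tesseract will produce.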

Hope this helps others.


2017-08-07 02:58:58 lampShadesDrifter
