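Python - A faster way to load multiple images?

2016-08-09 24 views
2

One of the tasks in my project is to load a dataset (chars74k) and set a label for each image. In this implementation I already have a matrix containing other images and a list with their respective labels. To do the task, I wrote the code below: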

import glob
import os

import numpy
from PIL import Image

# images: (input/output) matrix of images, one flattened image per row
# labels: (input/output) list of labels
# path: (input) path to my root folder of images. It is like this:
# path
# |-folder1
# |-folder2
# |-folder3
# |-...
# |-lastFolder

def loadChars74k(images, labels, path):
    # list of directories
    dirlist = [item for item in os.listdir(path) if os.path.isdir(os.path.join(path, item))]

    # for each subfolder, open all files, append each one to the matrix of images
    # and use the folder name as its label
    for subfolder in dirlist:
        imagePath = glob.glob(path + '/' + subfolder + '/*.Bmp')
        print("folder %s has %d images and matrix of images is: %s labels are: %d"
              % (subfolder, len(imagePath), images.shape, len(labels)))
        for i in range(len(imagePath)):
            # grayscale ('L'), float32 ('f'), flattened to one row
            anImage = numpy.array(Image.open(imagePath[i]).convert('L'), 'f').ravel()
            images = numpy.vstack((images, anImage))
            labels.append(subfolder)

    # vstack returns a new array, so the result has to be handed back to the caller
    return images, labels

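It works fine, but it takes too long (about 20 minutes). I would like to know whether there is a faster way to load the images and set the labels.

Regards.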

+1

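My guess is that most of the processing time is spent in `Image.open(...)`. Do you really need to keep everything in memory? Maybe you could keep only references to the image paths and read the files when necessary? –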
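The suggestion in this comment could look roughly like the sketch below. It is only an illustration under assumptions: `index_images` and `load_image` are hypothetical helper names that do not appear in the thread, and the `*.Bmp` pattern and folder layout are taken from the question. The index is built without opening any file, and an image is decoded only when it is actually requested.

```python
import glob
import os


def index_images(path, pattern='*.Bmp'):
    """Return (file_path, label) pairs without opening any image.

    Hypothetical helper: the label is the name of the subfolder, as in the question.
    """
    pairs = []
    for subfolder in sorted(os.listdir(path)):
        folder = os.path.join(path, subfolder)
        if not os.path.isdir(folder):
            continue  # skip stray files in the root folder
        for file_path in sorted(glob.glob(os.path.join(folder, pattern))):
            pairs.append((file_path, subfolder))
    return pairs


def load_image(file_path):
    """Read one image on demand as a flat float32 grayscale vector."""
    # Imported here so that building the index does not require these packages.
    import numpy
    from PIL import Image
    return numpy.array(Image.open(file_path).convert('L'), 'f').ravel()
```

Whether this helps depends on how the images are used afterwards: if every image is needed at once anyway, the reading cost is only postponed, not avoided.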

+0

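Also, why are you writing `for i in range(len(imagePath)):` and then `imagePath[i]`? Don't iterate over the indices of a list and then index that list: just loop over the list itself. –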
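As a small illustration of this comment, the two loop styles can be compared side by side. The file names below are made up for the example. Note that this is a readability point; it is not expected to change the loading time noticeably.

```python
image_paths = ['folder1/img001.Bmp', 'folder1/img002.Bmp']  # hypothetical file names

# Index-based loop, as in the question.
by_index = []
for i in range(len(image_paths)):
    by_index.append(image_paths[i])

# Direct loop over the list itself.
direct = []
for image_path in image_paths:
    direct.append(image_path)

# enumerate() covers the case where the position is needed as well.
numbered = [(i, image_path) for i, image_path in enumerate(image_paths)]
```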

+0

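Well, I thought I was looping over all the images, opening each one and loading it into a numpy.array, and then appending the image to `images`. What is wrong with that? – Claudio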

Answer

-1

After some research, I was able to improve the code like this:

# Needs the same modules as the code in the question: os, glob, numpy and Image (from PIL).
def loadChars74k(images, labels, path):
    # list of directories
    dirlist = [item for item in os.listdir(path) if os.path.isdir(os.path.join(path, item))]

    # for each subfolder, open all files, stack them onto the matrix of images
    # and use the folder name as their label
    for subfolder in dirlist:
        imagePath = glob.glob(path + '/' + subfolder + '/*.Bmp')
        if not imagePath:
            # nothing to stack for a folder without matching images
            continue
        # read every image of this folder first, then stack them in a single call
        im_array = numpy.array([numpy.array(Image.open(imagePath[i]).convert('L'), 'f').ravel() for i in range(len(imagePath))])
        images = numpy.vstack((images, im_array))
        for i in range(len(imagePath)):
            labels.append(subfolder)

    return images, labels

I am sure it can be improved even further, but it is good enough for now! It now takes 33 seconds!
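A likely reason for the speed-up is that `numpy.vstack` always builds a new array and copies all existing rows into it. Calling it once per image therefore copies the growing matrix again for every file, while calling it once per folder copies it far less often. The same idea can be taken one step further by collecting the rows in a plain list and stacking only once at the end. The sketch below assumes that all images have the same size and that Pillow is installed; `read_gray` and `load_folder_tree` are hypothetical names that are not part of the original answer.

```python
import glob
import os

import numpy


def read_gray(file_path):
    """Read one image as a flat float32 grayscale vector (assumes Pillow is installed)."""
    from PIL import Image
    return numpy.array(Image.open(file_path).convert('L'), 'f').ravel()


def load_folder_tree(path, pattern='*.Bmp', read=read_gray):
    """Load every matching image below path; the subfolder name is used as label."""
    rows = []
    labels = []
    for subfolder in sorted(os.listdir(path)):
        folder = os.path.join(path, subfolder)
        if not os.path.isdir(folder):
            continue
        for file_path in glob.glob(os.path.join(folder, pattern)):
            rows.append(read(file_path))  # appending to a list does not copy earlier rows
            labels.append(subfolder)
    # A single vstack call: one allocation and one copy for the whole dataset.
    images = numpy.vstack(rows) if rows else numpy.empty((0, 0), 'f')
    return images, labels
```

If a matrix of earlier images already exists, as in the question, it can be combined with the result through one more `numpy.vstack((old_images, images))` call, with `labels` extended in the same way.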

+1

How does this improve the speed? – curio1729

+0

Ha ha! This code is absolutely wrong! The image-reading part is outside the loop. – Mehran