TensorFlow TFRecord with many images crashes during reading

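I'm having trouble reading TFRecord files that contain "many" (more than 500) events. If I create a file with 500 events, everything is fine, but with more than 500 I get an error when I try to read and parse the file: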

W tensorflow/core/framework/op_kernel.cc:993] Invalid argument: Could not parse example input, value: 
... 
UnicodeDecodeError: 'utf8' codec can't decode byte 0xb5 in position 40: invalid start byte

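The images are float arrays of shape (N, 2, 127, 50), reshaped to (N, 127, 50, 2) during reading. I tried writing them in two different ways, as a bytes list and as a float list, and both fail in the same way.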
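For scale, here is a rough size calculation using the shapes stated above. One event is 2 × 127 × 50 = 12,700 float32 values, or 50,800 bytes, so if all 500 events go into one record (as in the writers shown in the question), that record already carries about 25 MB for this one tensor. The helper names are hypothetical; `events_in_buffer` mirrors the divisibility that `decode_raw` followed by `reshape(..., [-1, 2, 127, 50])` needs in order to infer N.

```python
# Shape of one event as described in the question: 2 planes x 127 x 50.
EVENT_SHAPE = (2, 127, 50)
FLOAT32_BYTES = 4


def values_per_event(shape=EVENT_SHAPE):
    """Number of float32 values in one event."""
    n = 1
    for dim in shape:
        n *= dim
    return n


def bytes_per_event(shape=EVENT_SHAPE):
    """Raw size of one event when stored as packed float32 bytes."""
    return values_per_event(shape) * FLOAT32_BYTES


def events_in_buffer(num_bytes, shape=EVENT_SHAPE):
    """Recover N from a raw float32 buffer, as the -1 in the reshape would.

    Raises ValueError if the buffer does not hold a whole number of events,
    i.e. the case where the leading dimension cannot be inferred.
    """
    per_event = bytes_per_event(shape)
    if num_bytes % per_event != 0:
        raise ValueError(
            "%d bytes is not a multiple of %d bytes per event" % (num_bytes, per_event))
    return num_bytes // per_event


if __name__ == "__main__":
    print(values_per_event())                          # 12700
    print(bytes_per_event())                           # 50800
    print(500 * bytes_per_event())                     # 25400000
    print(events_in_buffer(500 * bytes_per_event()))   # 500
```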

For the "bytes method", the relevant part of the code is:

import tensorflow as tf  # TensorFlow 1.x API


def write_to_tfrecord(data_dict, tfrecord_file):
    writer = tf.python_io.TFRecordWriter(tfrecord_file)
    features_dict = {}
    for k in data_dict.keys():
        # Each tensor is stored as a single raw-bytes value.
        features_dict[k] = tf.train.Feature(
            bytes_list=tf.train.BytesList(value=[data_dict[k]['byte_data']])
        )
    # One Example holding every tensor in data_dict.
    example = tf.train.Example(
        features=tf.train.Features(feature=features_dict)
    )
    writer.write(example.SerializeToString())
    writer.close()
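The question does not show how `byte_data` is produced. A common choice, assumed here, is `array.astype(np.float32).tobytes()`, which on a little-endian machine yields packed little-endian float32 values, the layout `tf.decode_raw(..., tf.float32)` reads by default. The sketch below reproduces that byte layout with the standard `struct` module so the encoding can be checked without NumPy or TensorFlow; `floats_to_bytes` and `bytes_to_floats` are hypothetical helpers.

```python
import struct


def floats_to_bytes(values):
    """Pack Python floats as little-endian float32 (4 bytes each)."""
    return struct.pack("<%df" % len(values), *values)


def bytes_to_floats(raw):
    """Inverse of floats_to_bytes; the buffer length must be a multiple of 4."""
    if len(raw) % 4 != 0:
        raise ValueError("buffer length %d is not a multiple of 4" % len(raw))
    return list(struct.unpack("<%df" % (len(raw) // 4), raw))


if __name__ == "__main__":
    raw = floats_to_bytes([1.5, -2.25, 0.0])
    print(len(raw))              # 12
    print(bytes_to_floats(raw))  # [1.5, -2.25, 0.0]
```

Values that are exactly representable in float32 (such as 1.5 and -2.25) survive the round trip unchanged; arbitrary Python floats are rounded to float32 precision.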

Then, for reading:

import tensorflow as tf  # TensorFlow 1.x API


def tfrecord_to_graph_ops_xtxutuvtv(filenames):
    def process_hitimes(inp, shape):
        # Reinterpret the raw bytes as a flat float32 vector.
        hitimes = tf.decode_raw(inp, tf.float32)
        # (N, 2, 127, 50) -> (N, 127, 50, 2)
        hitimes = tf.reshape(hitimes, shape)
        hitimes = tf.transpose(hitimes, [0, 2, 3, 1])
        return hitimes

    file_queue = tf.train.string_input_producer(filenames, name='file_queue')
    reader = tf.TFRecordReader()
    _, tfrecord = reader.read(file_queue)

    tfrecord_features = tf.parse_single_example(
        tfrecord,
        features={
            'hitimes-x': tf.FixedLenFeature([], tf.string),
        },
        name='data'
    )
    hitimesx = process_hitimes(
        tfrecord_features['hitimes-x'], [-1, 2, 127, 50]
    )
    return hitimesx
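To see what `reshape([-1, 2, 127, 50])` followed by `transpose([0, 2, 3, 1])` does to the decoded values, here is a pure-Python model of the same index arithmetic (row-major layout, as in NumPy and TensorFlow). It is an illustration only: the function name is hypothetical, and a tiny shape stands in for (2, 127, 50).

```python
def reshape_and_transpose(flat, c, h, w):
    """Model tf.reshape(x, [-1, c, h, w]) then tf.transpose(x, [0, 2, 3, 1]).

    Returns nested lists of shape (n, h, w, c). Input element (n, ci, hi, wi)
    sits at flat index ((n*c + ci)*h + hi)*w + wi and ends up at
    out[n][hi][wi][ci].
    """
    per_event = c * h * w
    if len(flat) % per_event != 0:
        raise ValueError("cannot infer -1: %d values, %d per event"
                         % (len(flat), per_event))
    out = []
    for n in range(len(flat) // per_event):
        event = []
        for hi in range(h):
            row = []
            for wi in range(w):
                # Gather the c channel values for this (hi, wi) position.
                row.append([flat[((n * c + ci) * h + hi) * w + wi]
                            for ci in range(c)])
            event.append(row)
        out.append(event)
    return out


if __name__ == "__main__":
    # One event with 2 channels of a 2 x 3 image.
    print(reshape_and_transpose(list(range(12)), 2, 2, 3))
    # [[[[0, 6], [1, 7], [2, 8]], [[3, 9], [4, 10], [5, 11]]]]
```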

(Normally I read and write other tensors as well, but the problem shows up even with only this one.)

For the "float method", the code looks like this:

import tensorflow as tf  # TensorFlow 1.x API


def write_to_tfrecord(data_dict, tfrecord_file):
    writer = tf.python_io.TFRecordWriter(tfrecord_file)
    features_dict = {}
    # FloatList expects a flat sequence of floats, hence flatten().
    features_dict['hitimes-x'] = tf.train.Feature(
        float_list=tf.train.FloatList(
            value=data_dict['hitimes-x']['data'].flatten()
        )
    )
    example = tf.train.Example(
        features=tf.train.Features(feature=features_dict)
    )
    writer.write(example.SerializeToString())
    writer.close()

and, when reading:

import tensorflow as tf  # TensorFlow 1.x API


def tfrecord_to_graph_ops_xtxutuvtv(filenames):
    def process_hitimes(inp, shape):
        # VarLenFeature yields a SparseTensor; densify it to a flat vector.
        hitimes = tf.sparse_tensor_to_dense(inp)
        # (N, 2, 127, 50) -> (N, 127, 50, 2)
        hitimes = tf.reshape(hitimes, shape)
        hitimes = tf.transpose(hitimes, [0, 2, 3, 1])
        return hitimes

    file_queue = tf.train.string_input_producer(filenames, name='file_queue')
    reader = tf.TFRecordReader()
    _, tfrecord = reader.read(file_queue)

    tfrecord_features = tf.parse_single_example(
        tfrecord,
        features={
            'hitimes-x': tf.VarLenFeature(tf.float32),
        },
        name='data'
    )
    hitimesx = process_hitimes(
        tfrecord_features['hitimes-x'], [-1, 2, 127, 50]
    )
    return hitimesx

The data being written are float32 NumPy ndarrays.

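I'm wondering whether this is a bug (I'm using TensorFlow 1.0), since both methods work fine for up to 500 images but break when I try to use more. I looked through the documentation for a parameter that would let the reader and writer handle larger files, but didn't find anything (besides, 500 images isn't many; I need to write tens of millions).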

Any ideas? I plan to try TensorFlow 1.2 today but haven't had a chance yet.


2017-07-12 Gabriel Perdue

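I very much doubt it has anything to do with the number of events. I'm using tfrecord files with 10M events each and everything works fine. I suggest you take one image and save it 1k times to make sure the problem isn't tied to the number 500. Then find which image makes the reader choke and see how it differs from the ones that work.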
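The commenter's suggestion (write one image many times and read it back) can be checked independently of TensorFlow's parsing code. The sketch below is a plain-Python rendering of the TFRecord on-disk framing: an 8-byte little-endian length, the masked CRC-32C of that length, the payload, and the masked CRC-32C of the payload. It is not TensorFlow's API, the function names are hypothetical, and the payload is an arbitrary bytes string rather than a serialized `tf.train.Example`. Pure-Python CRC is slow, so this is only meant for small checks; in TF 1.x the equivalent tools are `tf.python_io.TFRecordWriter` and `tf.python_io.tf_record_iterator`.

```python
import os
import struct
import tempfile


def _make_crc32c_table():
    """Lookup table for CRC-32C (Castagnoli), reflected polynomial 0x82F63B78."""
    table = []
    for i in range(256):
        crc = i
        for _ in range(8):
            crc = (crc >> 1) ^ 0x82F63B78 if crc & 1 else crc >> 1
        table.append(crc)
    return table


_CRC32C_TABLE = _make_crc32c_table()


def crc32c(data):
    crc = 0xFFFFFFFF
    for byte in data:
        crc = _CRC32C_TABLE[(crc ^ byte) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF


def masked_crc(data):
    """TFRecord stores a rotated, offset CRC rather than the raw CRC-32C."""
    crc = crc32c(data)
    return (((crc >> 15) | (crc << 17)) + 0xA282EAD8) & 0xFFFFFFFF


def write_records(path, records):
    with open(path, "wb") as f:
        for data in records:
            length = struct.pack("<Q", len(data))
            f.write(length)
            f.write(struct.pack("<I", masked_crc(length)))
            f.write(data)
            f.write(struct.pack("<I", masked_crc(data)))


def read_records(path):
    """Yield one record at a time, checking both checksums."""
    with open(path, "rb") as f:
        while True:
            length = f.read(8)
            if not length:
                return  # clean end of file
            if len(length) != 8:
                raise IOError("truncated length field")
            (n,) = struct.unpack("<Q", length)
            (len_crc,) = struct.unpack("<I", f.read(4))
            if len_crc != masked_crc(length):
                raise IOError("corrupt length checksum")
            data = f.read(n)
            (data_crc,) = struct.unpack("<I", f.read(4))
            if len(data) != n or data_crc != masked_crc(data):
                raise IOError("corrupt or truncated record")
            yield data


def repeat_one_event(payload, copies, path):
    """Write the same payload `copies` times, read it back, return the count."""
    write_records(path, [payload] * copies)
    count = 0
    for record in read_records(path):
        if record != payload:
            raise ValueError("record %d differs from the payload" % count)
        count += 1
    return count


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "repeat.tfrecord")
    print(repeat_one_event(b"stand-in for one serialized event", 1000, path))
```

Because records are framed individually, a reader can stream them one at a time; a file is only loaded "all at once" if it holds a single huge record.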

It's not event 500; I tried that. I think it's a bug in TF 1.0.

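I upgraded to TF 1.2.1 and the problem above went away (at least when using BytesList; I'm not sure which method is more idiomatic in TensorFlow, but treating everything as a BytesList of byte data is simpler for what I'm doing).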

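There is a new problem, I believe, when reading a large file (I can now write 25K events, maybe more, into a TFRecord file): TF opens the whole file at once and loads all of it into memory, which is more data than my test machine can handle. I don't blame TensorFlow directly for this, though (I need to come up with some convenient compression or chunking scheme, etc.).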
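One observation from the code in the question: both writers put all N events into a single `tf.train.Example`, so each file contains one large record that has to be read and parsed in one piece. A common alternative is one Example per event, spread across several shard files that are all passed as `filenames` to `tf.train.string_input_producer`. The sketch below only plans such shards; `shard_plan` and the `<prefix>-00000-of-00003.tfrecord` naming pattern are assumptions for illustration.

```python
def shard_plan(num_events, events_per_shard, prefix):
    """Split event indices 0..num_events-1 into consecutive shards.

    Returns a list of (filename, start, stop) tuples; events[start:stop]
    would be written, one record per event, to filename.
    """
    if events_per_shard <= 0:
        raise ValueError("events_per_shard must be positive")
    num_shards = max(1, -(-num_events // events_per_shard))  # ceiling division
    plan = []
    for i in range(num_shards):
        start = i * events_per_shard
        stop = min(start + events_per_shard, num_events)
        name = "%s-%05d-of-%05d.tfrecord" % (prefix, i, num_shards)
        plan.append((name, start, stop))
    return plan


if __name__ == "__main__":
    for name, start, stop in shard_plan(25000, 10000, "hitimes"):
        print(name, start, stop)
```

The list of filenames from the plan can then be handed to the input pipeline, so memory use is bounded by one event at a time rather than by the size of a whole file.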


2017-07-13 14:03:01
