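Fully convolutional ResNets using TF-Slim run very slowly

I am porting pixel-labelling (FCN-style) code that was originally implemented in Caffe to TensorFlow. I use the Slim implementation of ResNet (ResNet-101) with a stride of 16 px and upsample its output with an up-convolution layer to reach a final stride of 8 px. Because the input images have arbitrary sizes, batch_size = 1. The problem is that training is really slow: it processes 100 images in about 3.5 minutes, whereas my original Caffe implementation does the same in 30 seconds on the same hardware (Tesla K40m). Here is a simplified version of my code: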

import datetime as dt 

import tensorflow as tf 
import tensorflow.contrib.slim as slim 
from tensorflow.contrib.slim.nets import resnet_v1 

from MyDataset import MyDataset 
from TrainParams import TrainParams 

dataset = MyDataset() 
train_param = TrainParams() 

#tf.device('/gpu:0') 

num_classes = 15 

inputs = tf.placeholder(tf.float32, shape=[1, None, None, 3]) 

with slim.arg_scope(resnet_v1.resnet_arg_scope(False)): 
    mean = tf.constant([123.68, 116.779, 103.939], 
         dtype=tf.float32, shape=[1, 1, 1, 3], name='img_mean') 
    im_centered = inputs - mean 
    net, end_points = resnet_v1.resnet_v1_101(im_centered, 
               global_pool=False, output_stride=16) 

    pred_upconv = slim.conv2d_transpose(net, num_classes, 
             kernel_size = [3, 3], 
             stride = 2, 
             padding='SAME') 

    targets = tf.placeholder(tf.float32, shape=[1, None, None, num_classes]) 

    loss = slim.losses.sigmoid_cross_entropy(pred_upconv, targets) 


log_dir = 'logs/' 

variables_to_restore = slim.get_variables_to_restore(include=["resnet_v1"]) 
restorer = tf.train.Saver(variables_to_restore) 

with tf.Session() as sess: 

    sess.run(tf.initialize_all_variables()) 
    sess.run(tf.initialize_local_variables()) 

    restorer.restore(sess, '/path/to/ResNet-101.ckpt') 

    optimizer = tf.train.GradientDescentOptimizer(learning_rate=.001) 
    train_step = optimizer.minimize(loss) 
    t1 = dt.datetime.now() 
    for it in range(10000): 
        n1 = dt.datetime.now() 
        batch = dataset.next_batch()  # my function that prepares the training batch 
        sess.run(train_step, feed_dict={inputs: batch['inputs'], 
                                        targets: batch['targets']}) 
        n2 = dt.datetime.now() 
        time = (n2 - n1).microseconds / 1000 
        print("iteration ", it, "time", time)

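I am only just learning the framework and put this code together in two days, so I understand it may not be optimal. As you can see, I also tried to measure the actual time spent in the data preparation code and in the forward and backward passes of the network. That time is actually much smaller: summed over 100 iterations it comes to only 50 seconds, compared with the actual running time. I suspect there may be some thread/process synchronization going on that is not being measured, but I find that strange. The top command shows about 10 processes with the same name as the main process, which it has presumably spawned. I also get messages like the following: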

I tensorflow/core/common_runtime/gpu/pool_allocator.cc:245] PoolAllocator: After 1692 get requests, put_count=1316 evicted_count=1000 eviction_rate=0.759878 and unsatisfied allocation rate=0.87234 
I tensorflow/core/common_runtime/gpu/pool_allocator.cc:257] Raising pool_size_limit_ from 100 to 110

Could you point me to how I can speed this up?

Thanks.
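A note on the timing in the loop above: `timedelta.microseconds` holds only the sub-second component of an interval, not its full length, so any step longer than one second is under-reported. This may account for at least part of the gap between the summed per-iteration times and the wall-clock time. The sketch below illustrates the difference; the helper names and the 2.1 s step are hypothetical, chosen only for illustration.

```python
import datetime as dt


def elapsed_ms_component(start, end):
    """What the loop in the question computes: only the sub-second part, in ms."""
    return (end - start).microseconds / 1000.0


def elapsed_ms_total(start, end):
    """Full elapsed time in milliseconds, including whole seconds."""
    return (end - start).total_seconds() * 1000.0


# A hypothetical training step that takes 2.1 seconds.
start = dt.datetime(2016, 9, 29, 12, 0, 0)
end = start + dt.timedelta(seconds=2, milliseconds=100)

print("microseconds-based:", elapsed_ms_component(start, end))
print("total_seconds-based:", elapsed_ms_total(start, end))
```

The first value drops the two whole seconds, while the second reports the full interval.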

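UPDATE. After more research I found that "feeding" data can be slow compared with queues, so I reimplemented the code with a queue filled from a separate thread: https://gist.github.com/eldar/0ecc058670be340b92e5a1044dc8a089, but the running time is still about the same.

UPDATE2. It looks like I have figured out what the speed problem is. I train fully convolutionally, and my images have arbitrary sizes and aspect ratios. If I feed dummy random numpy tensors of a fixed size, it runs fast. If I generate input tensors of 10 predefined sizes, the first 10 iterations are slow, but then it speeds up. It looks like resizing all the tensors at every iteration is not as efficient in TensorFlow as it is in Caffe. I will file a ticket on the project's GitHub.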
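Since a small set of predefined sizes only costs a few slow iterations, one possible workaround (not something the original post does) is to pad every image up to the nearest size bucket so that the number of distinct input shapes stays small. The sketch below only computes the bucketed shapes; the function name, the bucket step of 64 and the image sizes are all assumptions made for illustration, and the padded pixels would also have to be excluded from the loss.

```python
def bucketed_shape(height, width, multiple=64):
    """Round each spatial dimension up to the next multiple of `multiple`."""
    padded_h = ((height + multiple - 1) // multiple) * multiple
    padded_w = ((width + multiple - 1) // multiple) * multiple
    return padded_h, padded_w


# Hypothetical image sizes, for illustration only.
sizes = [(375, 500), (333, 500), (480, 640), (500, 375), (360, 480)]
padded = [bucketed_shape(h, w) for h, w in sizes]

print(padded)
print(len(set(sizes)), "distinct input shapes become", len(set(padded)))
```

A larger `multiple` gives fewer distinct shapes at the price of more wasted computation on padding.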

2016-09-29 SimpleMan

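Keep in mind that this is a huge model. The "101" in resnet_v1_101 comes from the fact that it is 101 layers deep. – Julius

Not sure what you were expecting to get, though. – Julius

afaik they used several different machines to train it. – Julius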

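The problem was caused by the arbitrarily sized input images. TensorFlow has a feature called autotune: at runtime it profiles various algorithms for each specific input size and decides which one is best. In my case this was happening at every iteration.

The solution is to set the environment variable TF_CUDNN_USE_AUTOTUNE=0: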

export TF_CUDNN_USE_AUTOTUNE=0 
python myscript.py

More details in this GitHub ticket: https://github.com/tensorflow/tensorflow/issues/5048
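If exporting the variable in the shell is inconvenient, the same setting can be applied from inside the script. This is a minimal sketch that assumes TensorFlow reads the variable from the process environment no earlier than import time; setting it before the import is the conservative choice.

```python
import os

# Disable cuDNN autotuning for this process. Set this before TensorFlow is
# imported so the value is in place whenever the library reads it.
os.environ["TF_CUDNN_USE_AUTOTUNE"] = "0"

# import tensorflow as tf  # import only after the variable is set

print("TF_CUDNN_USE_AUTOTUNE =", os.environ["TF_CUDNN_USE_AUTOTUNE"])
```

The value applies only to the current process and its children, so it does not affect other jobs on the same machine.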

2016-10-19 08:44:50 SimpleMan

Code for the linked question: https://gist.github.com/eldar/0ecc058670be340b92e5a1044dc8a089

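In general, a TensorFlow ResNet implementation should not be (much) slower than a Caffe one. I have just compared the implementations in caffe/barrista (https://github.com/classner/barrista/tree/master/examples/residual-nets) and in the TensorFlow example (https://github.com/tensorflow/models/tree/master/resnet), and the difference in speed over a full training run is negligible.

I did run into a problem with the TensorFlow implementation, which is what brought me to this page. The cause was that the GitHub version I had built was not a stable one and was very slow because of intermediate development code. A git pull and a recompile solved the problem.

However, if you are reimplementing this yourself, pay attention to how the BatchNorm update ops are triggered. In the TensorFlow example this is done in resnet_model.py, l. 172. They are added directly to the "fetches" of the run operation, so they are executed in parallel and as soon as possible.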

2016-10-16 20:53:15 Chris

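Thanks for your reply! I also used a nightly build, because support for ResNets is not in the stable release. Which version did you use? Also, did you use your own dataset, and if so, how did you load the data? I suspect my data loading code may not be optimal. – SimpleMan

So I updated to version 0.11.0rc0, and I no longer see 10 other Python processes running at the same time, which is a good sign, but it is still just as slow. – SimpleMan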
