
This is possibly a duplicate of Tensorflow: How to get gradients per instance in a batch?. I am asking it anyway, because that question has no satisfactory answer and the goal here is slightly different. See also TensorFlow average gradients over several batches.

I have a very large network that I can just fit on my GPU, but the largest batch size I can feed it is 32; anything larger causes the GPU to run out of memory. I would like to use a larger batch in order to get a more accurate approximation of the gradient.

To be concrete, let's say I want to compute the gradient for a large batch of size 96 by feeding 3 batches of 32 one after another. The best way I know of is to use Optimizer.compute_gradients() and Optimizer.apply_gradients(). Here is a small example of how that can work:

import tensorflow as tf 
import numpy as np 

learn_rate = 0.1 

W_init = np.array([ [1,2,3], [4,5,6], [7,8,9] ], dtype=np.float32) 
x_init = np.array([ [11,12,13], [14,15,16], [17,18,19] ], dtype=np.float32) 

X = tf.placeholder(dtype=np.float32, name="x") 
W = tf.Variable(W_init, dtype=np.float32, name="w") 
y = tf.matmul(X, W, name="y") 
loss = tf.reduce_mean(y, name="loss") 

opt = tf.train.GradientDescentOptimizer(learn_rate) 
grad_vars_op = opt.compute_gradients(loss) 

sess = tf.Session() 
sess.run(tf.global_variables_initializer()) 

# Compute the gradients for each batch 
grads_vars1 = sess.run(grad_vars_op, feed_dict = {X: x_init[None,0]}) 
grads_vars2 = sess.run(grad_vars_op, feed_dict = {X: x_init[None,1]}) 
grads_vars3 = sess.run(grad_vars_op, feed_dict = {X: x_init[None,2]}) 

# Separate the gradients from the variables 
grads1 = [ grad for grad, var in grads_vars1 ] 
grads2 = [ grad for grad, var in grads_vars2 ] 
grads3 = [ grad for grad, var in grads_vars3 ] 
varl = [ var for grad, var in grads_vars1 ] 

# Average the gradients 
grads = [ (g1 + g2 + g3)/3 for g1, g2, g3 in zip(grads1, grads2, grads3)] 

sess.run(opt.apply_gradients(zip(grads,varl))) 

print("Weights after 1 gradient") 
print(sess.run(W)) 

Now, this is all very ugly and inefficient, since the forward pass runs on the GPU, the gradient averaging happens on the CPU, and applying the gradients then happens on the GPU again.

On top of that, this code throws an exception, because grads is a list of np.arrays, so to make it work one would have to create a tf.placeholder for every gradient.
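
For reference, here is a minimal sketch of that placeholder workaround, building on the snippet above (the names grad_phs and apply_op are illustrative, not part of the original code):

# One placeholder per gradient, so the averaged numpy gradients can be
# fed back into a single apply_gradients() op that is built only once.
grad_phs = [tf.placeholder(dtype=np.float32, shape=g.shape) for g, _ in grad_vars_op]
apply_op = opt.apply_gradients(zip(grad_phs, varl))

# Feed the averaged gradients that were computed on the CPU
sess.run(apply_op, feed_dict=dict(zip(grad_phs, grads)))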

Surely there must be a better and more efficient way of doing this? Any suggestions?

Answer


You can create copies of the trainable_variables and accumulate the batch gradients in them. Here are a few simple steps to follow:

... 
opt = tf.train.GradientDescentOptimizer(learn_rate) 
# get all trainable variables 
t_vars = tf.trainable_variables() 
# create a copy of all trainable variables with `0` as initial values 
accum_tvars = [tf.Variable(tf.zeros_like(tv.initialized_value()), trainable=False) for tv in t_vars]
# create a op to initialize all accums vars 
zero_ops = [tv.assign(tf.zeros_like(tv)) for tv in accum_tvars] 

# compute gradients for a batch 
batch_grads_vars = opt.compute_gradients(loss, t_vars) 
# collect the batch gradient into accumulated vars 
accum_ops = [accum_tvars[i].assign_add(batch_grad_var[0]) for i, batch_grad_var in enumerate(batch_grads_vars)] 

# apply accums gradients 
train_step = opt.apply_gradients([(accum_tvars[i], batch_grad_var[1]) for i, batch_grad_var in enumerate(batch_grads_vars)]) 
# train_step = opt.apply_gradients(zip(accum_tvars, zip(*batch_grads_vars)[1]))

while True: 
    # reset the accumulated gradients
    sess.run(zero_ops) 

    # number of batches for gradient accumulation 
    n_batches = 3 
    for i in range(n_batches):
        # feed one row at a time, keeping the batch dimension
        sess.run(accum_ops, feed_dict={X: x_init[None, i]})

    sess.run(train_step) 
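
One thing to note: accum_tvars ends up holding the sum of the batch gradients, not their average. If you want the average over n_batches, as asked for in the question, a possible sketch (not part of the original answer) is to scale the accumulators before applying them:

# Scale the accumulated sums down to an average before applying them,
# so the update matches one large batch rather than n_batches small ones.
n_batches = 3                      # same value as in the loop above
avg_ops = [acc.assign(acc / n_batches) for acc in accum_tvars]

# inside the training loop, after the accumulation loop:
#     sess.run(avg_ops)
#     sess.run(train_step)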

Nice solution. Using zip in the accum_ops and train_step list comprehensions, instead of enumerate and indexing, would be slightly more pythonic (and probably more readable). – lejlot
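
A minimal sketch of the zip-based variant the comment is suggesting (same behaviour, just without explicit indices):

# Accumulate and apply using zip instead of enumerate/indexing
accum_ops = [acc.assign_add(grad) for acc, (grad, _) in zip(accum_tvars, batch_grads_vars)]
train_step = opt.apply_gradients([(acc, var) for acc, (_, var) in zip(accum_tvars, batch_grads_vars)])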


Indeed a nice solution. How can I be sure that all of the ops will be executed on the GPU? – niko


The 'assign_op' depends on where the variables are defined, CPU/GPU. You can compute the rest on the GPUs. –
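
As an illustration (assuming a single GPU at '/gpu:0'; device names vary), the accumulator variables could be pinned explicitly so that the assign ops are placed on the GPU:

# Pin the accumulators to the GPU so assign_add runs there ('/gpu:0' is an assumption)
with tf.device('/gpu:0'):
    accum_tvars = [tf.Variable(tf.zeros_like(tv.initialized_value()), trainable=False)
                   for tv in t_vars]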