Using GPU vs. CPU in the TensorFlow Deep MNIST example

I am using a program that I copy-pasted from here, with a few small changes. This is my code, in which I tried to improve the training speed:

from tensorflow.examples.tutorials.mnist import input_data 
mnist = input_data.read_data_sets('MNIST_data', one_hot=True) 

import tensorflow as tf 

x = tf.placeholder(tf.float32, shape=[None, 784]) 
y_ = tf.placeholder(tf.float32, shape=[None, 10]) 
W = tf.Variable(tf.zeros([784,10])) 
b = tf.Variable(tf.zeros([10])) 

def weight_variable(shape): 
    initial = tf.truncated_normal(shape, stddev=0.1) 
    return tf.Variable(initial) 

def bias_variable(shape): 
    initial = tf.constant(0.1, shape=shape) 
    return tf.Variable(initial) 

def conv2d(x, W): 
    return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME') 

def max_pool_2x2(x): 
    return tf.nn.max_pool(x, ksize=[1, 2, 2, 1], 
         strides=[1, 2, 2, 1], padding='SAME') 

with tf.device('/gpu:0'): 
    W_conv1 = weight_variable([5, 5, 1, 32]) 
    b_conv1 = bias_variable([32]) 
    x_image = tf.reshape(x, [-1, 28, 28, 1]) 
    h_conv1 = tf.nn.relu(conv2d(x_image, W_conv1) + b_conv1) 
    h_pool1 = max_pool_2x2(h_conv1) 

    W_conv2 = weight_variable([5, 5, 32, 64]) 
    b_conv2 = bias_variable([64]) 

    h_conv2 = tf.nn.relu(conv2d(h_pool1, W_conv2) + b_conv2) 
    h_pool2 = max_pool_2x2(h_conv2) 

    W_fc1 = weight_variable([7 * 7 * 64, 1024]) 
    b_fc1 = bias_variable([1024]) 

    h_pool2_flat = tf.reshape(h_pool2, [-1, 7*7*64]) 
    h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, W_fc1) + b_fc1) 

    keep_prob = tf.placeholder(tf.float32) 
    h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob) 

    W_fc2 = weight_variable([1024, 10]) 
    b_fc2 = bias_variable([10]) 

    y_conv = tf.matmul(h_fc1_drop, W_fc2) + b_fc2 

    cross_entropy = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=y_conv)) 
    train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy) 
    correct_prediction = tf.equal(tf.argmax(y_conv, 1), tf.argmax(y_, 1)) 
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32)) 

    with tf.Session(config=tf.ConfigProto(allow_soft_placement=True, log_device_placement=True)) as sess:
        sess.run(tf.global_variables_initializer())
        for i in range(20000):
            batch = mnist.train.next_batch(50)
            if i % 100 == 0:
                train_accuracy = accuracy.eval(feed_dict={
                    x: batch[0], y_: batch[1], keep_prob: 1.0})
                print('step %d, training accuracy %g' % (i, train_accuracy))
            train_step.run(feed_dict={x: batch[0], y_: batch[1], keep_prob: 0.5})

        print('test accuracy %g' % accuracy.eval(feed_dict={
            x: mnist.test.images, y_: mnist.test.labels, keep_prob: 1.0}))

This produces the following output:

Extracting MNIST_data/train-images-idx3-ubyte.gz 
Extracting MNIST_data/train-labels-idx1-ubyte.gz 
Extracting MNIST_data/t10k-images-idx3-ubyte.gz 
Extracting MNIST_data/t10k-labels-idx1-ubyte.gz 
step 0, training accuracy 0.22 
step 100, training accuracy 0.76 
step 200, training accuracy 0.88 
...

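The problem is that there is no measurable difference in running time between the original code taken from the tutorial (i.e. without the with tf.device('/gpu:0'): on line 26) and this code (about 10 seconds per step). I successfully installed cuda-8.0 and cuDNN (after hours of failed attempts). "$ nvidia-smi" returns the following output: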

Sun Jul  2 13:57:10 2017
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 375.26                 Driver Version: 375.26                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GT 710      Off  | 0000:01:00.0     N/A |                  N/A |
| N/A   49C    P0    N/A /  N/A |    406MiB /  2000MiB |     N/A      Default |
+-------------------------------+----------------------+----------------------+


+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0                  Not Supported                                         |
+-----------------------------------------------------------------------------+

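So, the questions are:

1) Is the workload too small for the choice of CPU or GPU to make any difference?
2) Or is there some silly mistake in my implementation?

Thank you for reading the whole question.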

2017-07-02 Roofi

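It just means that the GPU is used by default when one is available. You should explicitly use the CPU to measure the difference. – user1735003

Thanks @user1735003. I tried your suggestion (replacing gpu with cpu). The result is that every step takes 5 seconds longer. It should be faster, right? Also, when I copy-paste the original code from the website and compare it with the code above, there is no observable difference. Can you tell me why? – Roofi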
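To compare the two configurations with numbers rather than impressions, a small timing helper can be wrapped around a single training step. The following is only a sketch: mean_step_time and dummy_step are hypothetical names that do not appear in the thread, and the warm-up count is an assumption (the first few steps of a session often include one-off setup cost).

```python
import time


def mean_step_time(step_fn, n_steps=20, warmup=3):
    """Return the average wall-clock seconds per call of step_fn.

    The first `warmup` calls are executed but not timed, so that
    one-off setup cost does not distort the average.
    """
    for _ in range(warmup):
        step_fn()
    start = time.perf_counter()
    for _ in range(n_steps):
        step_fn()
    return (time.perf_counter() - start) / n_steps


if __name__ == "__main__":
    # Stand-in workload (hypothetical). In the question's script this would
    # be one training step, for example:
    #   lambda: train_step.run(feed_dict={x: batch[0], y_: batch[1], keep_prob: 0.5})
    def dummy_step():
        sum(i * i for i in range(10000))

    print("mean seconds per step: %.6f" % mean_step_time(dummy_step))
```

Running the same measurement once with '/gpu:0' and once with '/cpu:0' gives two directly comparable per-step averages.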

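The fact that you can run this code without any error messages means that TensorFlow can definitely run on the GPU. The issue here is that when you run TensorFlow as is, it tries to run on the GPU by default. There are a couple of ways to force it to run on the CPU.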

1) Run it like this: CUDA_VISIBLE_DEVICES= python code.py. Note that if you do this while the code still contains with tf.device('/gpu:0'), it will break, so remove it.
2) Change with tf.device('/gpu:0') to with tf.device('/cpu:0').
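The first option can also be driven from a small launcher that runs the same script twice, once with the default environment and once with the CUDA devices hidden, and reports the elapsed time of each run. This is a minimal sketch under a few assumptions: env_without_gpu and timed_run are hypothetical helper names, "code.py" is the script name used in the answer, and the script is assumed not to pin its ops to '/gpu:0'.

```python
import os
import subprocess
import sys
import time


def env_without_gpu():
    """Copy the current environment with all CUDA devices hidden."""
    env = os.environ.copy()
    # Same effect as the shell prefix `CUDA_VISIBLE_DEVICES= python code.py`.
    env["CUDA_VISIBLE_DEVICES"] = ""
    return env


def timed_run(cmd, env=None):
    """Run cmd to completion and return (exit code, elapsed seconds)."""
    start = time.perf_counter()
    result = subprocess.run(cmd, env=env)
    return result.returncode, time.perf_counter() - start


if __name__ == "__main__":
    # "code.py" is the script name from the answer; adjust as needed.
    script = [sys.executable, "code.py"]
    print("default devices:", timed_run(script))
    print("CPU only:       ", timed_run(script, env=env_without_gpu()))
```

Setting the variable for a child process keeps the parent shell untouched, so both runs start from the same environment apart from GPU visibility.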

EDIT: Following up on the question, for more information on what allow_soft_placement and log_device_placement mean in ConfigProto, see here.

2017-07-02 15:14:03 jkschin

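Sorry for not being clear enough @jkschin, but does the statement "by default, TensorFlow will try to run on the GPU" hold true even if I don't put 'config=tf.ConfigProto(allow_soft_placement=True, log_device_placement=True)' inside the session's parentheses? – Roofi

Those parameters do not affect whether it runs on the GPU. See [here](https://stackoverflow.com/questions/44873273/what-do-the-options-in-configproto-like-allow-soft-placement-and-log-device-plac/44873274#44873274) for more information. – jkschin

Please add your latest comment to the answer (for future googlers) @jkschin – Roofi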
