2017-07-02
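Using GPU vs CPU in the TensorFlow Deep MNIST example

I am using a program that I copy-pasted from here, with a few small changes. This is my code, with which I am trying to speed up training: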

from tensorflow.examples.tutorials.mnist import input_data 
mnist = input_data.read_data_sets('MNIST_data', one_hot=True) 

import tensorflow as tf 

x = tf.placeholder(tf.float32, shape=[None, 784]) 
y_ = tf.placeholder(tf.float32, shape=[None, 10]) 
W = tf.Variable(tf.zeros([784,10])) 
b = tf.Variable(tf.zeros([10])) 

def weight_variable(shape): 
    initial = tf.truncated_normal(shape, stddev=0.1) 
    return tf.Variable(initial) 

def bias_variable(shape): 
    initial = tf.constant(0.1, shape=shape) 
    return tf.Variable(initial) 

def conv2d(x, W): 
    return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME') 

def max_pool_2x2(x): 
    return tf.nn.max_pool(x, ksize=[1, 2, 2, 1], 
         strides=[1, 2, 2, 1], padding='SAME') 

with tf.device('/gpu:0'): 
    W_conv1 = weight_variable([5, 5, 1, 32]) 
    b_conv1 = bias_variable([32]) 
    x_image = tf.reshape(x, [-1, 28, 28, 1]) 
    h_conv1 = tf.nn.relu(conv2d(x_image, W_conv1) + b_conv1) 
    h_pool1 = max_pool_2x2(h_conv1) 

    W_conv2 = weight_variable([5, 5, 32, 64]) 
    b_conv2 = bias_variable([64]) 

    h_conv2 = tf.nn.relu(conv2d(h_pool1, W_conv2) + b_conv2) 
    h_pool2 = max_pool_2x2(h_conv2) 

    W_fc1 = weight_variable([7 * 7 * 64, 1024]) 
    b_fc1 = bias_variable([1024]) 

    h_pool2_flat = tf.reshape(h_pool2, [-1, 7*7*64]) 
    h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, W_fc1) + b_fc1) 

    keep_prob = tf.placeholder(tf.float32) 
    h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob) 

    W_fc2 = weight_variable([1024, 10]) 
    b_fc2 = bias_variable([10]) 

    y_conv = tf.matmul(h_fc1_drop, W_fc2) + b_fc2 

    cross_entropy = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=y_conv)) 
    train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy) 
    correct_prediction = tf.equal(tf.argmax(y_conv, 1), tf.argmax(y_, 1)) 
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32)) 

    with tf.Session(config=tf.ConfigProto(allow_soft_placement=True, log_device_placement=True)) as sess: 
        sess.run(tf.global_variables_initializer()) 
        # Train for 20000 steps on mini-batches of 50 examples
        for i in range(20000): 
            batch = mnist.train.next_batch(50) 
            if i % 100 == 0: 
                train_accuracy = accuracy.eval(feed_dict={ 
                    x: batch[0], y_: batch[1], keep_prob: 1.0}) 
                print('step %d, training accuracy %g' % (i, train_accuracy)) 
            train_step.run(feed_dict={x: batch[0], y_: batch[1], keep_prob: 0.5}) 

        print('test accuracy %g' % accuracy.eval(feed_dict={ 
            x: mnist.test.images, y_: mnist.test.labels, keep_prob: 1.0})) 

This produces the following output:

Extracting MNIST_data/train-images-idx3-ubyte.gz 
Extracting MNIST_data/train-labels-idx1-ubyte.gz 
Extracting MNIST_data/t10k-images-idx3-ubyte.gz 
Extracting MNIST_data/t10k-labels-idx1-ubyte.gz 
step 0, training accuracy 0.22 
step 100, training accuracy 0.76 
step 200, training accuracy 0.88 
... 

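The problem is that there is no measurable difference in running time between the original code taken from the tutorial (i.e. without the with tf.device('/gpu:0'): block on line 26) and this code: about 10 seconds per step either way. I did successfully install CUDA 8.0 and cuDNN (after hours of failed attempts). "$ nvidia-smi" returns the following output: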

Sun Jul 2 13:57:10 2017
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 375.26                 Driver Version: 375.26                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GT 710      Off  | 0000:01:00.0     N/A |                  N/A |
| N/A   49C    P0    N/A /  N/A |    406MiB /  2000MiB |     N/A      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0                  Not Supported                                         |
+-----------------------------------------------------------------------------+

So, my questions are:

1) Is the workload too small for the choice of CPU or GPU to make any difference?
2) Or is there some silly mistake in my implementation?
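One way to put a number on question 1 is to count the arithmetic in one training step. The sketch below uses hypothetical helper names (conv_macs, dense_macs, forward_macs_per_example) and counts only the multiply-accumulates (MACs) of the two convolutions and two fully connected layers in the code above, taken from their shapes (5x5 kernels with SAME padding and stride 1, two 2x2 poolings, 1024 hidden units, batch size 50). It ignores biases, ReLU, pooling, dropout and the whole backward pass, so it is a lower bound on the work per step, not a benchmark.

```python
def conv_macs(h, w, k, c_in, c_out):
    # SAME padding, stride 1: the output is h x w x c_out,
    # and every output value needs k * k * c_in multiply-accumulates.
    return h * w * c_out * k * k * c_in


def dense_macs(n_in, n_out):
    # A fully connected layer needs one MAC per weight.
    return n_in * n_out


def forward_macs_per_example():
    return (conv_macs(28, 28, 5, 1, 32)           # conv1 on the 28x28 input
            + conv_macs(14, 14, 5, 32, 64)        # conv2 after the first 2x2 pool
            + dense_macs(7 * 7 * 64, 1024)        # fc1 after the second 2x2 pool
            + dense_macs(1024, 10))               # fc2 (logits)


BATCH_SIZE = 50
per_example = forward_macs_per_example()
per_batch = per_example * BATCH_SIZE
print(f"forward MACs per example: {per_example:,}")
print(f"forward MACs per batch of {BATCH_SIZE}: {per_batch:,}")
```

This prints 13,883,904 forward MACs per example and 694,195,200 per batch of 50, most of them in conv2. Whether that is "too small" for a given GPU cannot be read off the count alone; it has to be measured on the actual hardware.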

Thanks for reading the whole question.

It just means that the GPU is used by default when it is available. You should explicitly use the CPU to measure the difference. – user1735003
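To compare the two runs fairly it helps to time the training step itself rather than eyeballing the printed steps. A minimal sketch, with a hypothetical helper name seconds_per_step: it calls whatever function you give it a few times as warm-up (early steps typically include one-off setup costs), then returns the average wall-clock time of the remaining calls.

```python
import time


def seconds_per_step(step_fn, n_steps=100, warmup=10):
    """Average wall-clock seconds per call of step_fn, excluding warm-up calls."""
    if n_steps <= 0:
        raise ValueError("n_steps must be positive")
    for _ in range(warmup):
        step_fn()  # not timed: first calls may include initialisation work
    start = time.perf_counter()
    for _ in range(n_steps):
        step_fn()
    return (time.perf_counter() - start) / n_steps
```

For example, with the question's graph built once under with tf.device('/cpu:0') and once under with tf.device('/gpu:0'), you could pass lambda: train_step.run(feed_dict=feed) inside each session, where feed is a fixed dict built from a single mnist.train.next_batch(50) call. Reusing one batch keeps the Python-side data loading out of the measurement, so the two numbers reflect the device doing the computation.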

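Thanks @user1735003. I tried your suggestion (replacing gpu with cpu). The result is that every step takes 5 seconds longer, so the GPU should be faster, right? Also, when I copy-paste the original code from the website and compare it with the code above, there is no observable difference. Can you tell me why? – Roofi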

Answer

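Nothing is wrong: the fact that you can run this code shows that TensorFlow can definitely run on the GPU. The point is that when you run TensorFlow as-is, it tries to run on the GPU by default. There are a few ways to force it to run on the CPU.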

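  1. Run it like this: CUDA_VISIBLE_DEVICES= python code.py. Note that if you do this while still having with tf.device('/gpu:0'), it will break (unless allow_soft_placement=True is set in the session config), so remove it.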
  2. Change with tf.device('/gpu:0') to with tf.device('/cpu:0').
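Option 1 can also be driven from Python, for instance to launch the same script once with and once without the GPU. This is a sketch with hypothetical helper names (cpu_only_env, run_script); it relies only on the standard CUDA behaviour that an empty CUDA_VISIBLE_DEVICES hides all GPUs from the process. If you set the variable inside your own script instead, it has to happen before TensorFlow is imported.

```python
import os
import subprocess
import sys


def cpu_only_env(base_env=None):
    """Return a copy of the environment in which CUDA sees no GPUs."""
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_VISIBLE_DEVICES"] = ""  # empty device list: no GPU is visible
    return env


def run_script(script_path, cpu_only=False):
    """Run a Python script in a child process, optionally with GPUs hidden."""
    env = cpu_only_env() if cpu_only else dict(os.environ)
    return subprocess.run([sys.executable, script_path], env=env).returncode
```

For example, run_script("code.py", cpu_only=True) is the Python equivalent of CUDA_VISIBLE_DEVICES= python code.py, and run_script("code.py") runs it with the GPU visible.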

Edit, based on more information from the question: for what allow_soft_placement and log_device_placement in ConfigProto mean, see here.
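With log_device_placement=True, TensorFlow prints the device assigned to each op, which is the direct way to check whether the model really ran on the GPU. The sketch below (hypothetical helper count_ops_per_device) tallies ops per device from that log once you have captured it, for example by redirecting the process output to a file. It assumes lines of the form op_name: /job:localhost/replica:0/task:0/device:GPU:0, as in the TensorFlow GPU guide, and also accepts the older .../task:0/gpu:0 spelling; the sample log in the usage line is hand-written, not real output.

```python
import re
from collections import Counter

# Device suffix in either the newer ".../device:GPU:0" or the older ".../gpu:0" form.
DEVICE_RE = re.compile(
    r"/job:[^/\s]+/replica:\d+/task:\d+/(?:device:)?(cpu|gpu):(\d+)",
    re.IGNORECASE,
)


def count_ops_per_device(log_text):
    """Count placement-log lines per device, e.g. {'GPU:0': 120, 'CPU:0': 8}."""
    counts = Counter()
    for line in log_text.splitlines():
        if "->" in line:
            continue  # "Device mapping" lines describe hardware, not op placement
        match = DEVICE_RE.search(line)
        if match:
            counts[f"{match.group(1).upper()}:{match.group(2)}"] += 1
    return counts
```

If the result contains no GPU entry at all, the ops were placed on the CPU, for example because soft placement fell back to it.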

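Sorry for not being clear enough, @jkschin, but the statement "by default TensorFlow tries to run on the GPU" does hold even when I don't put config=tf.ConfigProto(allow_soft_placement=True, log_device_placement=True) inside the Session parentheses. – Roofi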

These arguments do not affect whether it runs on the GPU. See [here](https://stackoverflow.com/questions/44873273/what-do-the-options-in-configproto-like-allow-soft-placement-and-log-device-plac/44873274#44873274) for more information. – jkschin

@jkschin Please add your latest comment to the answer (for future googlers). – Roofi