
I wrote a TensorFlow program to try AlexNet on the MNIST dataset, but strangely the network converges very quickly and the loss barely changes. The accuracy on every batch is very low, below 0.1, as shown below:

step 0 loss 2.29801 train_accuracy 0.14 
step 100 loss 2.30258 train_accuracy 0.07 
step 200 loss 2.30258 train_accuracy 0.15 
step 300 loss 2.30258 train_accuracy 0.09 
step 400 loss 2.30258 train_accuracy 0.08 
step 500 loss 2.30258 train_accuracy 0.06 
step 600 loss 2.30258 train_accuracy 0.15 
step 700 loss 2.30258 train_accuracy 0.16 
.... 



And here is my code:

import tensorflow as tf 
from tensorflow.python import debug as tf_debug 
import numpy as np 
from tensorflow.examples.tutorials.mnist import input_data 

data_dir='./minist' 
mnist = input_data.read_data_sets(data_dir,one_hot=True) 

def conv2d(name, x, ws, bs, strides=1): 
    # convolution + bias + ReLU; weights drawn from a truncated normal 
    w = tf.Variable(tf.truncated_normal(ws,stddev=0.01)) 
    b = tf.Variable(tf.constant(0.,shape=bs)) 
    x = tf.nn.conv2d(x, w, strides=[1,strides,strides,1], padding='SAME') 
    x = tf.nn.bias_add(x,b) 
    return tf.nn.relu(x, name=name) 

def maxpool2d(name, x, k=2): 
    return tf.nn.max_pool(x, ksize=[1, k, k, 1], strides=[1, k, k, 1], 
         padding='SAME', name=name) 
def fc_op(name, x, n_out): 
    # fully connected layer; note it always ends with a ReLU, 
    # including when it is used for the final logits layer below 
    n_in = x.get_shape()[-1].value 
    w = tf.Variable(tf.truncated_normal([n_in,n_out],stddev=0.01)) 
    b = tf.Variable(tf.constant(0.,shape=[n_out])) 
    x = tf.matmul(x, w) 
    x = tf.nn.bias_add(x, b) 
    return tf.nn.relu(x, name=name) 

def alex_net(x,num_classes): 
    conv1 = conv2d('conv1', x, [11,11,1,96] , [96], strides=4) 
    pool1 = maxpool2d('pool1', conv1) 
    conv2 = conv2d('conv2', pool1, [5,5,96,256] , [256]) 
    pool2 = maxpool2d('pool2', conv2) 
    conv3 = conv2d('conv3', pool2, [3,3,256,384] , [384]) 
    conv4 = conv2d('conv4', conv3, [3,3,384,384] , [384]) 
    conv5 = conv2d('conv5', conv4, [3,3,384,256] , [256]) 
    pool5 = maxpool2d('pool5', conv5) 

    shp = pool5.get_shape() 
    flattened_shape = shp[1].value*shp[2].value*shp[3].value 
    resh = tf.reshape(pool5, shape=[-1,flattened_shape], name='resh') 

    fc1 = fc_op('fc1', resh, 4096) 
    fc2 = fc_op('fc2', fc1, 4096) 
    fc3 = fc_op('fc3', fc2, num_classes) 
    return fc3 

# ############################ arguments setting 
learning_rate = 0.01 
train_steps= 8000 
num_classes = 10 
x = tf.placeholder(shape=[None, 784],dtype=tf.float32) 
x_image = tf.reshape(x, [-1, 28, 28, 1]) 
y=tf.placeholder(shape=[None,10],dtype=tf.float32) 

output=alex_net(x_image,num_classes) 

# ################################### train 

cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=output,labels=y)) 
train_op = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost) 

# ################################### inference 
correct_pred = tf.equal(tf.argmax(output,1),tf.argmax(y,1)) 
accuracy = tf.reduce_mean(tf.cast(correct_pred,tf.float32)) 

init = tf.global_variables_initializer() 

sess = tf.InteractiveSession() 
sess.run(init) 

for i in range(train_steps): 
    xs, ys = mnist.train.next_batch(100) 
    sess.run(train_op,feed_dict={x:xs,y:ys}) 
    if i%100==0: 
        loss,train_accuracy = sess.run([cost,accuracy],feed_dict={x:xs,y:ys}) 
        print('step',i,'loss',loss,'train_accuracy',train_accuracy) 

In fact I tried this not only on MNIST but also on CIFAR-10, and it ran into the same problem.


Do you really need a ReLU on the last fully connected layer? –
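
For context, a sketch of what this comment points at: the ReLU on fc3 clamps all logits to be non-negative, which can stall softmax cross-entropy. A linear variant of the question's fc_op (the name fc_linear is made up here) would look like this:

def fc_linear(name, x, n_out): 
    # same as fc_op, but without the final ReLU, so the layer can 
    # emit negative logits for softmax_cross_entropy_with_logits 
    n_in = x.get_shape()[-1].value 
    w = tf.Variable(tf.truncated_normal([n_in,n_out],stddev=0.01)) 
    b = tf.Variable(tf.constant(0.,shape=[n_out])) 
    return tf.add(tf.matmul(x, w), b, name=name) 

# fc3 = fc_linear('fc3', fc2, num_classes) # instead of fc_op for the output layer 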

Answer


The problem is that your network has many layers, and the depth (number of filters) of each layer is very high. On top of that, you are training the network from scratch, and your dataset, MNIST (60,000 images), is very small. Moreover, each MNIST image is only 28x28x1 in size.
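
To make the size problem concrete: with 'SAME' padding each layer outputs ceil(input/stride), so a 28x28 image collapses almost immediately in this architecture. A quick back-of-the-envelope check in plain Python (no TensorFlow needed):

import math 

# spatial size of the question's network on a 28x28 MNIST image 
# ('SAME' padding: out = ceil(in/stride)) 
size = 28 
size = math.ceil(size/4) # conv1, stride 4 -> 7 
size = math.ceil(size/2) # pool1 -> 4 
size = math.ceil(size/2) # pool2 -> 2 
size = math.ceil(size/2) # pool5 -> 1 
print(size) # 1, so pool5 is 1x1x256 and only 256 values reach fc1 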

One alternative I can suggest is to retrain a pre-trained model, i.e. do transfer learning. Take a look at this AlexNet weights file. That architecture is slightly different from what you did in your code. The benefit of this approach is that it offsets the very small amount of data you have.
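
As a minimal sketch of the transfer-learning route, assuming the commonly shared bvlc_alexnet.npy weights file (a pickled NumPy dict mapping layer names to [weights, biases] pairs; the file name and layout here are assumptions, adjust them to whatever file you actually download):

import numpy as np 
import tensorflow as tf 

# assumed layout: {'conv1': [W, b], 'conv2': [W, b], ...} 
net_data = np.load('bvlc_alexnet.npy', encoding='latin1').item() 

def pretrained_conv2d(name, x, strides=1, trainable=False): 
    # initialize from the pretrained arrays instead of truncated_normal; 
    # freeze the early layers (trainable=False) and fine-tune only the classifier 
    w = tf.Variable(net_data[name][0], trainable=trainable) 
    b = tf.Variable(net_data[name][1], trainable=trainable) 
    x = tf.nn.conv2d(x, w, strides=[1,strides,strides,1], padding='SAME') 
    return tf.nn.relu(tf.nn.bias_add(x, b), name=name) 

Note that the pretrained AlexNet expects 227x227 RGB input, so the 28x28 grayscale MNIST images would first need to be resized and replicated to three channels.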

The other, better option is to reduce the number of layers and the number of filters per layer. That way you will be able to train the model from scratch, and it will also be very fast (assuming you do not have a GPU). Take a look at the LeNet-5 architecture.
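
A minimal sketch of such a smaller network, reusing the conv2d, maxpool2d and fc_op helpers from the question (the layer sizes follow the LeNet-5 spirit rather than the exact paper):

def small_net(x, num_classes): 
    # two small conv/pool stages are plenty for 28x28 digits 
    conv1 = conv2d('conv1', x, [5,5,1,32], [32]) # -> 28x28x32 
    pool1 = maxpool2d('pool1', conv1) # -> 14x14x32 
    conv2 = conv2d('conv2', pool1, [5,5,32,64], [64]) # -> 14x14x64 
    pool2 = maxpool2d('pool2', conv2) # -> 7x7x64 

    resh = tf.reshape(pool2, [-1, 7*7*64]) 
    fc1 = fc_op('fc1', resh, 512) 
    # keep the output layer linear (no ReLU) so it produces real logits 
    w = tf.Variable(tf.truncated_normal([512,num_classes],stddev=0.01)) 
    b = tf.Variable(tf.constant(0.,shape=[num_classes])) 
    return tf.matmul(fc1, w) + b 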

Hope this answer helps you.


Thank you for the answer! After reducing the number of convolutional layers and using a smaller learning rate in my network, I got better results. – SpartarG117
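
For reference, the learning-rate half of that fix is a one-line change; 1e-4 is the value the official TensorFlow MNIST tutorial uses with Adam and is a much safer default than 0.01:

train_op = tf.train.AdamOptimizer(learning_rate=1e-4).minimize(cost) 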


You're welcome. Please accept my answer and upvote it, so that it encourages me to answer more questions. – user1190882
