TensorFlow: 2-layer feed-forward neural network

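I want to implement a simple fully connected feed-forward neural network in TensorFlow (the Python 3 version). The network has 2 inputs and 1 output, and I am trying to train it to output the XOR of the two inputs. My code is as follows: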

import numpy as np 
import tensorflow as tf 

sess = tf.InteractiveSession() 

inputs = tf.placeholder(tf.float32, shape = [None, 2]) 
desired_outputs = tf.placeholder(tf.float32, shape = [None, 1]) 

weights_1 = tf.Variable(tf.zeros([2, 3])) 
biases_1 = tf.Variable(tf.zeros([1, 3])) 
layer_1_outputs = tf.nn.sigmoid(tf.matmul(inputs, weights_1) + biases_1) 

weights_2 = tf.Variable(tf.zeros([3, 1])) 
biases_2 = tf.Variable(tf.zeros([1, 1])) 
layer_2_outputs = tf.nn.sigmoid(tf.matmul(layer_1_outputs, weights_2) + biases_2) 

error_function = -tf.reduce_sum(desired_outputs * tf.log(layer_2_outputs)) 
train_step = tf.train.GradientDescentOptimizer(0.05).minimize(error_function) 

sess.run(tf.initialize_all_variables()) 

training_inputs = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]] 
training_outputs = [[0.0], [1.0], [1.0], [0.0]] 

for i in range(10000): 
    train_step.run(feed_dict = {inputs: np.array(training_inputs), desired_outputs: np.array(training_outputs)}) 

print(sess.run(layer_2_outputs, feed_dict = {inputs: np.array([[0.0, 0.0]])})) 
print(sess.run(layer_2_outputs, feed_dict = {inputs: np.array([[0.0, 1.0]])})) 
print(sess.run(layer_2_outputs, feed_dict = {inputs: np.array([[1.0, 0.0]])})) 
print(sess.run(layer_2_outputs, feed_dict = {inputs: np.array([[1.0, 1.0]])}))

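This seems simple enough, but the print statements at the end show that the network's outputs are nowhere near the desired outputs, regardless of the number of training iterations or the learning rate. Can anyone see what I am doing wrong?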

Thank you.

EDIT: I have also tried the following alternative error function:

error_function = 0.5 * tf.reduce_sum(tf.sub(layer_2_outputs, desired_outputs) * tf.sub(layer_2_outputs, desired_outputs))

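That is, the error function is the sum of squared errors. It always results in the network outputting a value of exactly 0.5, which is another indication that there is a mistake somewhere in my code.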

EDIT 2: I have found that my code works for AND and OR, but not for XOR. I am now very confused.


2016-07-25 CircuitScholar

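Your code has several problems. Below, I comment on each line to walk you through the solution.

Note: XOR is not linearly separable, so the network needs at least one hidden layer with non-zero initial weights. The code below uses two hidden layers.

N.B.: comment lines starting with # [!] mark the places where your code went wrong.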

import numpy as np 
import tensorflow as tf 

sess = tf.InteractiveSession() 

# a batch of inputs, 2 values each
inputs = tf.placeholder(tf.float32, shape=[None, 2])

# a batch of outputs, 1 value each
desired_outputs = tf.placeholder(tf.float32, shape=[None, 1]) 

# [!] define the number of hidden units in the first layer 
HIDDEN_UNITS = 4 

# connect 2 inputs to HIDDEN_UNITS hidden units
# [!] Initialize weights with random numbers, to make the network learn 
weights_1 = tf.Variable(tf.truncated_normal([2, HIDDEN_UNITS])) 

# [!] The biases are single values per hidden unit 
biases_1 = tf.Variable(tf.zeros([HIDDEN_UNITS])) 

# connect 2 inputs to every hidden unit. Add bias 
layer_1_outputs = tf.nn.sigmoid(tf.matmul(inputs, weights_1) + biases_1) 

# [!] The problem with XOR is that the function is not linearly separable
# [!] An MLP (multi-layer perceptron) can learn to separate points that are not linearly
# separable (you can think of it as learning hypercurves, not only hyperplanes)
# [!] Let's add a new layer and change layer 2 to output more than 1 value

# connect first hidden units to 2 hidden units in the second hidden layer 
weights_2 = tf.Variable(tf.truncated_normal([HIDDEN_UNITS, 2])) 
# [!] Same as above: one bias value per unit
biases_2 = tf.Variable(tf.zeros([2])) 

# connect the hidden units to the second hidden layer 
layer_2_outputs = tf.nn.sigmoid(
    tf.matmul(layer_1_outputs, weights_2) + biases_2) 

# [!] create the new layer 
weights_3 = tf.Variable(tf.truncated_normal([2, 1])) 
biases_3 = tf.Variable(tf.zeros([1])) 

logits = tf.nn.sigmoid(tf.matmul(layer_2_outputs, weights_3) + biases_3) 

# [!] The error function you chose suits a multiclass classification task, not XOR.
error_function = 0.5 * tf.reduce_sum(tf.sub(logits, desired_outputs) * tf.sub(logits, desired_outputs)) 

train_step = tf.train.GradientDescentOptimizer(0.05).minimize(error_function) 

sess.run(tf.initialize_all_variables()) 

training_inputs = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]] 

training_outputs = [[0.0], [1.0], [1.0], [0.0]] 

for i in range(20000): 
    _, loss = sess.run([train_step, error_function], 
         feed_dict={inputs: np.array(training_inputs), 
            desired_outputs: np.array(training_outputs)}) 
    print(loss) 

print(sess.run(logits, feed_dict={inputs: np.array([[0.0, 0.0]])})) 
print(sess.run(logits, feed_dict={inputs: np.array([[0.0, 1.0]])})) 
print(sess.run(logits, feed_dict={inputs: np.array([[1.0, 0.0]])})) 
print(sess.run(logits, feed_dict={inputs: np.array([[1.0, 1.0]])}))

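I increased the number of training iterations to make sure the network converges regardless of the random initial values.

Output after 20000 training iterations: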

[[ 0.01759939]] 
[[ 0.97418505]] 
[[ 0.97734243]] 
[[ 0.0310041]]

It looks pretty good.
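The remark in the code about initializing the weights with random numbers can be checked by hand. The following is a minimal NumPy sketch, not TensorFlow code: it rebuilds the question's 2-3-1 sigmoid network with all-zero parameters and computes the gradients of the sum-of-squares error on the XOR data. The helper name `sigmoid` and the variable names are assumptions made for this illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# XOR training data, as in the question.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
Y = np.array([[0.0], [1.0], [1.0], [0.0]])

# All-zero initialization, as in the question (2 inputs -> 3 hidden -> 1 output).
W1, b1 = np.zeros((2, 3)), np.zeros((1, 3))
W2, b2 = np.zeros((3, 1)), np.zeros((1, 1))

# Forward pass: every pre-activation is 0, so every activation is 0.5.
H = sigmoid(X @ W1 + b1)
P = sigmoid(H @ W2 + b2)

# Backward pass for loss = 0.5 * sum((P - Y) ** 2).
dZ2 = (P - Y) * P * (1.0 - P)
grad_W2 = H.T @ dZ2
grad_b2 = dZ2.sum(axis=0, keepdims=True)
dZ1 = (dZ2 @ W2.T) * H * (1.0 - H)
grad_W1 = X.T @ dZ1
grad_b1 = dZ1.sum(axis=0, keepdims=True)

print("outputs:", P.ravel())
print("largest gradient magnitude:",
      max(np.abs(g).max() for g in (grad_W1, grad_b1, grad_W2, grad_b2)))
```

Under these assumptions every gradient is zero, because the two positive and two negative XOR errors cancel, so gradient descent never leaves the starting point and the output stays at 0.5. For AND or OR targets the errors do not cancel, but the three hidden units still receive identical updates and remain copies of each other; that behaves like a single hidden unit, which is enough for those linearly separable functions but not for XOR.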


2016-07-25 20:04:03 nessuno

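Thank you very much for the thorough answer. I understand the changes you made. However, I am trying to perform the XOR operation, not OR, so my target outputs were not actually wrong; the target outputs are [[0.0], [1.0], [1.0], [0.0]]. With your code I still cannot get the network to perform XOR. Can you offer any help? – CircuitScholar

I have updated my answer. – nessuno

Thanks. I was actually able to reach the goal with only 2 layers. Your idea of initializing the weights to non-zero values is what made my code work. – CircuitScholar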
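The last comment reports that a single hidden layer is enough once the weights start from non-zero values. A minimal NumPy sketch of that setup follows. It is not the asker's code: the function name `train_xor`, the seed, the learning rate of 0.5 and the step count are all assumptions chosen for illustration, and whether a particular run reaches a clean XOR depends on the random start.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_xor(hidden_units=3, steps=10000, learning_rate=0.5, seed=0):
    """Full-batch gradient descent on a 2 -> hidden_units -> 1 sigmoid network."""
    rng = np.random.default_rng(seed)
    X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
    Y = np.array([[0.0], [1.0], [1.0], [0.0]])

    # Random (non-zero) weights break the symmetry between hidden units.
    W1 = rng.normal(size=(2, hidden_units))
    b1 = np.zeros((1, hidden_units))
    W2 = rng.normal(size=(hidden_units, 1))
    b2 = np.zeros((1, 1))

    losses = []
    for _ in range(steps):
        # Forward pass.
        H = sigmoid(X @ W1 + b1)
        P = sigmoid(H @ W2 + b2)
        losses.append(0.5 * np.sum((P - Y) ** 2))

        # Backward pass (gradients computed before any parameter changes).
        dZ2 = (P - Y) * P * (1.0 - P)
        dZ1 = (dZ2 @ W2.T) * H * (1.0 - H)

        # Gradient descent update.
        W2 -= learning_rate * (H.T @ dZ2)
        b2 -= learning_rate * dZ2.sum(axis=0, keepdims=True)
        W1 -= learning_rate * (X.T @ dZ1)
        b1 -= learning_rate * dZ1.sum(axis=0, keepdims=True)

    outputs = sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2)
    return outputs, losses

outputs, losses = train_xor()
print("first loss:", losses[0], "last loss:", losses[-1])
print(outputs)
```

If a run stalls, trying another seed or more hidden units is a reasonable first step before adding layers.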

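Your implementation looks correct. Here are a few things you can try:

Change tf.nn.sigmoid to another non-linear activation function
Use a smaller learning rate (1e-3 to 1e-5)
Use more layers
Follow the XOR neural network architecture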
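To make the architecture point concrete, here is a sketch of one standard construction showing that a 2-2-1 sigmoid network can represent XOR at all. The weights are hand-picked illustrative values, not learned ones, and this is not necessarily the construction the linked post describes; the names `xor_net`, `h_or` and `h_and` are assumptions.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def xor_net(x1, x2):
    # Hidden unit 1 approximates OR: close to 1 if at least one input is 1.
    h_or = sigmoid(20.0 * x1 + 20.0 * x2 - 10.0)
    # Hidden unit 2 approximates AND: close to 1 only if both inputs are 1.
    h_and = sigmoid(20.0 * x1 + 20.0 * x2 - 30.0)
    # Output approximates "OR and not AND", which is XOR.
    return sigmoid(20.0 * h_or - 20.0 * h_and - 10.0)

for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x1, x2, round(xor_net(x1, x2)))
```

Since such weights exist, a failure to learn XOR with one hidden layer points at training (initialization, loss, learning rate) rather than at the capacity of the network.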


2016-07-25 19:28:43 ahaque

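I tried your suggestions, without success. I would add that after training, all inputs produce very similar outputs (i.e. 00, 01, 10 and 11 all make the network output ~0.77). Since this is a simple fully connected network, more layers would not add any capability or accuracy in this case, so I would like to avoid that. I have also implemented this exact network in MATLAB before and it worked fine, so I am sure I have simply made a mistake in my code. – CircuitScholar

In 'error_function', don't multiply by the desired outputs directly; subtract the values instead. Also try converting it to a Euclidean loss. Alternatively, you could treat this as a classification problem rather than a regression problem. – ahaque

I tried a more conventional error function (see the edit to my original post) based on the difference between target and output rather than cross-entropy. I still get incorrect behavior, though. – CircuitScholar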
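The discussion of error functions above can be illustrated numerically. The sketch below compares the loss from the question, which multiplies the log-output by the target, with the full binary cross-entropy. The function names and the two candidate output vectors (`always_on`, `near_xor`) are made up for this comparison; they are not values from the thread.

```python
import numpy as np

def one_sided_loss(p, y):
    # Form used in the question: rows with y == 0 contribute nothing.
    return -np.sum(y * np.log(p))

def binary_cross_entropy(p, y):
    # Adds the missing term that penalizes high outputs when y == 0.
    return -np.sum(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))

y = np.array([[0.0], [1.0], [1.0], [0.0]])                 # XOR targets
always_on = np.full((4, 1), 0.99)                          # says "1" for every input
near_xor = np.array([[0.02], [0.97], [0.97], [0.03]])      # roughly correct XOR

for name, loss in [("one-sided", one_sided_loss),
                   ("binary cross-entropy", binary_cross_entropy)]:
    print(name, "always_on:", loss(always_on, y), "near_xor:", loss(near_xor, y))
```

The one-sided form scores the "always 1" outputs better than the roughly correct ones, since nothing pushes outputs down when the target is 0. That may contribute to all four inputs drifting toward similar high values, although the zero initialization discussed in the accepted answer is a separate problem that changing the loss alone does not fix.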
