I'm trying to learn TensorFlow by working through their tutorial and making small modifications as I go. I've hit a problem where a tiny change to the code causes the output to become nan: TensorFlow returns nan for what should be a simple calculation.

Their original code is this:

import numpy as np 
import tensorflow as tf 

# Model parameters 
W = tf.Variable([.3], dtype=tf.float32) 
b = tf.Variable([-.3], dtype=tf.float32) 
# Model input and output 
x = tf.placeholder(tf.float32) 
linear_model = W * x + b 
y = tf.placeholder(tf.float32) 
# loss 
loss = tf.reduce_sum(tf.square(linear_model - y)) # sum of the squares 
# optimizer 
optimizer = tf.train.GradientDescentOptimizer(0.01) 
train = optimizer.minimize(loss) 
# training data 
x_train = [1,2,3,4] 
y_train = [0,-1,-2,-3] 
# training loop 
init = tf.global_variables_initializer() 
sess = tf.Session() 
sess.run(init) # reset values to wrong 
for i in range(1000): 
    sess.run(train, {x:x_train, y:y_train}) 

# evaluate training accuracy 
curr_W, curr_b, curr_loss = sess.run([W, b, loss], {x:x_train, y:y_train}) 
print("W: %s b: %s loss: %s"%(curr_W, curr_b, curr_loss)) 

The output of this is:

>python linreg2.py 
2017-07-22 22:19:41.409167: W c:\tf_jenkins\home\workspace\release-win\m\windows\py\35\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE instructions, but these are available on your machine and could speed up CPU computations. 
2017-07-22 22:19:41.409311: W c:\tf_jenkins\home\workspace\release-win\m\windows\py\35\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE2 instructions, but these are available on your machine and could speed up CPU computations. 
2017-07-22 22:19:41.412452: W c:\tf_jenkins\home\workspace\release-win\m\windows\py\35\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE3 instructions, but these are available on your machine and could speed up CPU computations. 
2017-07-22 22:19:41.412556: W c:\tf_jenkins\home\workspace\release-win\m\windows\py\35\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations. 
2017-07-22 22:19:41.412683: W c:\tf_jenkins\home\workspace\release-win\m\windows\py\35\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations. 
2017-07-22 22:19:41.412826: W c:\tf_jenkins\home\workspace\release-win\m\windows\py\35\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations. 
2017-07-22 22:19:41.412958: W c:\tf_jenkins\home\workspace\release-win\m\windows\py\35\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations. 
2017-07-22 22:19:41.413086: W c:\tf_jenkins\home\workspace\release-win\m\windows\py\35\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations. 
W: [-0.9999969] b: [ 0.99999082] loss: 5.69997e-11 

Note all the messages I get every time I run it; they appear because I installed with pip rather than compiling it myself. Still, it gets the correct output, with W = -1 and b = 1.

I modified the code to the following, only extending the x_train and y_train variables:

import numpy as np 
import tensorflow as tf 

# Model parameters 
W = tf.Variable([.3], dtype=tf.float32) 
b = tf.Variable([-.3], dtype=tf.float32) 
# Model input and output 
x = tf.placeholder(tf.float32) 
linear_model = W * x + b 
y = tf.placeholder(tf.float32) 
# loss 
loss = tf.reduce_sum(tf.square(linear_model - y)) # sum of the squares 
# optimizer 
optimizer = tf.train.GradientDescentOptimizer(0.01) 
train = optimizer.minimize(loss) 
# training data 
x_train = [1,2,3,4,5,6,7] 
y_train = [0,-1,-2,-3,-4,-5,-6] 
# training loop 
init = tf.global_variables_initializer() 
sess = tf.Session() 
sess.run(init) # reset values to wrong 
for i in range(1000): 
    sess.run(train, {x:x_train, y:y_train}) 

# evaluate training accuracy 
curr_W, curr_b, curr_loss = sess.run([W, b, loss], {x:x_train, y:y_train}) 
print("W: %s b: %s loss: %s"%(curr_W, curr_b, curr_loss)) 

This is the output of the new code:

2017-07-22 22:23:13.129983: W c:\tf_jenkins\home\workspace\release-win\m\windows\py\35\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE instructions, but these are available on your machine and could speed up CPU computations. 
2017-07-22 22:23:13.130125: W c:\tf_jenkins\home\workspace\release-win\m\windows\py\35\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE2 instructions, but these are available on your machine and could speed up CPU computations. 
2017-07-22 22:23:13.130853: W c:\tf_jenkins\home\workspace\release-win\m\windows\py\35\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE3 instructions, but these are available on your machine and could speed up CPU computations. 
2017-07-22 22:23:13.130986: W c:\tf_jenkins\home\workspace\release-win\m\windows\py\35\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations. 
2017-07-22 22:23:13.131126: W c:\tf_jenkins\home\workspace\release-win\m\windows\py\35\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations. 
2017-07-22 22:23:13.131234: W c:\tf_jenkins\home\workspace\release-win\m\windows\py\35\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations. 
2017-07-22 22:23:13.132178: W c:\tf_jenkins\home\workspace\release-win\m\windows\py\35\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations. 
2017-07-22 22:23:13.132874: W c:\tf_jenkins\home\workspace\release-win\m\windows\py\35\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations. 
W: [ nan] b: [ nan] loss: nan 

I really don't see why extending the training data should cause this to happen. Is there something I'm missing?

Also, I'm completely unsure how to debug things in TF, e.g. printing values progressively as it goes through the loop and updates the variables; just printing the variables doesn't seem to work. I'd like to know, so I can debug these things for myself in the future!

Answer

Welcome to the wonderful world of hyperparameter tuning. You can try the following. First, instead of only producing some output at the end, you can also print some output inside your for loop, which could then become:

for i in range(1000): 
    curr_W, curr_b, curr_loss,_ = sess.run([W, b, loss, train], {x:x_train, y:y_train}) 
    print("Iteration %d W: %s b: %s loss: %s"%(i, curr_W, curr_b, curr_loss)) 

If you run this, the output looks like:

Iteration 0 W: [-2.61199999] b: [-0.84599996] loss: 153.79 
Iteration 1 W: [ 2.93535995] b: [ 0.31516004] loss: 554.292 
Iteration 2 W: [-7.70013809] b: [-1.79276371] loss: 2020.55 
Iteration 3 W: [ 12.6241951] b: [ 2.35030031] loss: 7387.32 
Iteration 4 W: [-26.27972031] b: [-5.46829081] loss: 27029.6 
Iteration 5 W: [ 48.12573624] b: [ 9.59391212] loss: 98918.8 
Iteration 6 W: [-94.23892212] b: [-19.11964607] loss: 362027.0 
Iteration 7 W: [ 178.09707642] b: [ 35.9108963] loss: 1.32498e+06 
Iteration 8 W: [-342.92483521] b: [-69.27098846] loss: 4.84928e+06 
Iteration 9 W: [ 653.81640625] b: [ 132.04486084] loss: 1.77479e+07 
Iteration 10 W: [-1253.05480957] b: [-252.99859619] loss: 6.49554e+07 
... 
Iteration 60 W: [ -1.52910250e+17] b: [ -3.08788499e+16] loss: 9.6847e+35 
Iteration 61 W: [ 2.92530566e+17] b: [ 5.90739251e+16] loss: 3.54451e+36 
Iteration 62 W: [ -5.59636369e+17] b: [ -1.13013526e+17] loss: 1.29725e+37 
Iteration 63 W: [ 1.07063302e+18] b: [ 2.16204754e+17] loss: 4.74782e+37 
Iteration 64 W: [ -2.04821397e+18] b: [ -4.13618407e+17] loss: 1.73766e+38 
Iteration 65 W: [ 3.91841178e+18] b: [ 7.91287870e+17] loss: inf 
Iteration 66 W: [ -7.49626247e+18] b: [ -1.51380280e+18] loss: inf 
Iteration 67 W: [ 1.43410016e+19] b: [ 2.89603611e+18] loss: inf 
Iteration 68 W: [ -2.74355815e+19] b: [ -5.54036982e+18] loss: inf 
Iteration 69 W: [ 5.24866609e+19] b: [ 1.05992074e+19] loss: inf 
... 
Iteration 126 W: [ -6.01072457e+35] b: [ -1.21381189e+35] loss: inf 
Iteration 127 W: [ 1.14990384e+36] b: [ 2.32212753e+35] loss: inf 
Iteration 128 W: [ -2.19986564e+36] b: [ -4.44243161e+35] loss: inf 
Iteration 129 W: [ inf] b: [ 8.49875587e+35] loss: inf 
Iteration 130 W: [ nan] b: [-inf] loss: inf 
Iteration 131 W: [ nan] b: [ nan] loss: nan 
Iteration 132 W: [ nan] b: [ nan] loss: nan 

By now you should be able to see that the values for W and b are being updated aggressively, and that instead of decreasing, your loss is actually increasing and approaching infinity quite quickly. This in turn means your learning rate is way too large. If you divide the learning rate by 10 and set it to 0.001, the final result is:

W: [-0.97952145] b: [ 0.8985914] loss: 0.0144026 

That loss shows your model has not converged yet (also look at the output of the earlier iterations; ideally you would create a plot of the loss). A next experiment, with the learning rate set to 0.05, gives:

W: [-0.99999958] b: [ 0.99999791] loss: 6.48015e-12 
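
To make these experiments less ad hoc, you can sweep several candidate learning rates in a loop. This is a minimal sketch, assuming TensorFlow 1.x as in the question; the candidate rates below are arbitrary choices:

import tensorflow as tf

x_train = [1, 2, 3, 4, 5, 6, 7]
y_train = [0, -1, -2, -3, -4, -5, -6]

for lr in [0.1, 0.05, 0.01, 0.001]:
    tf.reset_default_graph()  # start each run from a fresh graph
    W = tf.Variable([.3], dtype=tf.float32)
    b = tf.Variable([-.3], dtype=tf.float32)
    x = tf.placeholder(tf.float32)
    y = tf.placeholder(tf.float32)
    loss = tf.reduce_sum(tf.square(W * x + b - y))
    train = tf.train.GradientDescentOptimizer(lr).minimize(loss)
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        for _ in range(1000):
            sess.run(train, {x: x_train, y: y_train})
        print("lr: %s loss: %s" % (lr, sess.run(loss, {x: x_train, y: y_train})))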

So, to conclude:

  • Try pulling intermediate results out of sess.run() (or calling eval() on some tensors) to see how the model is learning; a small sketch follows this list.
  • Hyperparameter tuning for fun and profit.
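
As a minimal illustration of the eval() variant (a sketch assuming the sess, loss, x, y, x_train and y_train names from the scripts above):

# For a single tensor, eval() is equivalent to passing it to sess.run()
current_loss = loss.eval({x: x_train, y: y_train}, session=sess)
print("current loss:", current_loss)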

Note: at this point you are still using "plain" gradient descent with a fixed learning rate, but there are also optimizers that adjust the learning rate automatically. The choice of optimizer (and its parameters) is yet another hyperparameter.
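
As one example (a sketch, not from the answer; it reuses the loss defined above), swapping in an adaptive optimizer is a two-line change:

# Hypothetical replacement for the fixed-rate GradientDescentOptimizer;
# Adam adapts its effective per-parameter step size as training proceeds.
optimizer = tf.train.AdamOptimizer(learning_rate=0.01)
train = optimizer.minimize(loss)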

Thanks so much for your help! Now that I know how to print intermediate values, I feel much better about finding my way. Still, this learning-rate business seems somewhat arbitrary. I know the learning rate relates to the step size, but I don't really understand why it spirals out of control and the loss reaches infinity just because the step is too large. For this function the gradient is just a constant everywhere, so why does that happen? –
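
A minimal numeric sketch of why that happens (not part of the original thread): for a squared loss the gradient is proportional to the current error, not constant, so a too-large step overshoots to a larger error and the next gradient is bigger still. In one dimension, on f(w) = w**2:

# Gradient descent on f(w) = w**2, whose gradient 2*w grows with w.
# Each update is w <- w*(1 - 2*lr); for lr > 1 that factor has
# magnitude > 1, so |w| (and the loss) diverges geometrically.
w, lr = 1.0, 1.1
for i in range(6):
    w = w - lr * 2 * w  # equals -1.2 * w for lr = 1.1
    print(i, w)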
