2016-12-30

TensorFlow loss is nan while training an RNN. I am running an RNN with a single GRU cell, and when I run it I get the following stack trace:

Traceback (most recent call last): 
    File "language_model_test.py", line 15, in <module> 
    test_model() 
    File "language_model_test.py", line 12, in test_model 
    model.train(random_data, s) 
    File "/home/language_model/language_model.py", line 120, in train 
    train_pp = self._run_epoch(data, sess, inputs, rnn_ouputs, loss, trainOp, verbose) 
    File "/home/language_model/language_model.py", line 92, in _run_epoch 
    loss, _= sess.run([loss, trainOp], feed_dict=feed) 
    File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 767, in run 
    run_metadata_ptr) 
    File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 952, in _run 
    fetch_handler = _FetchHandler(self._graph, fetches, feed_dict_string) 
    File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 408, in __init__ 
    self._fetch_mapper = _FetchMapper.for_fetch(fetches) 
    File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 230, in for_fetch 
    return _ListFetchMapper(fetch) 
    File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 337, in __init__ 
    self._mappers = [_FetchMapper.for_fetch(fetch) for fetch in fetches] 
    File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 238, in for_fetch 
    return _ElementFetchMapper(fetches, contraction_fn) 
    File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 271, in __init__ 
    % (fetch, type(fetch), str(e))) 
TypeError: Fetch argument nan has invalid type <type 'numpy.float32'>, must be a string or Tensor. (Can not convert a float32 into a Tensor or Operation.) 

The step that computes the loss seems to be the problem:
def train(self, data, session=tf.Session(), verbose=10):

    print "initializing model"
    self._add_placeholders()
    inputs = self._add_embedding()
    rnn_ouputs, _ = self._run_rnn(inputs)
    outputs = self._projection_layer(rnn_ouputs)
    loss = self._compute_loss(outputs)
    trainOp = self._add_train_step(loss)
    start = tf.initialize_all_variables()
    saver = tf.train.Saver()

    with session as sess:
        sess.run(start)

        for epoch in xrange(self._max_epochs):
            train_pp = self._run_epoch(data, sess, inputs, rnn_ouputs, loss, trainOp, verbose)
            print "Training perplexity for batch {} - {}".format(epoch, train_pp)

Here is _run_epoch, the code where the loss keeps coming back as nan:

def _run_epoch(self, data, session, inputs, rnn_ouputs, loss, trainOp, verbose=10):
    with session.as_default() as sess:
        total_steps = sum(1 for x in data_iterator(data, self._batch_size, self._max_steps))
        train_loss = []
        for step, (x, y, l) in enumerate(data_iterator(data, self._batch_size, self._max_steps)):
            print "step - {0}".format(step)
            feed = {
                self.input_placeholder: x,
                self.label_placeholder: y,
                self.sequence_length: l,
                self._dropout_placeholder: self._dropout,
            }
            loss, _ = sess.run([loss, trainOp], feed_dict=feed)
            print "loss - {0}".format(loss)
            train_loss.append(loss)
            if verbose and step % verbose == 0:
                sys.stdout.write('\r{}/{} : pp = {}'.format(step, total_steps, np.exp(np.mean(train_loss))))
                sys.stdout.flush()
            if verbose:
                sys.stdout.write('\r')

        return np.exp(np.mean(train_loss))

This is produced when I test my code using random_data = np.random.normal(0, 100, size=[42068, 46]) as my data, which is meant to simulate word ids being passed as input. The rest of my code can be found in the following gist

Edit: here is the test suite I run that produces this issue:

def test_model():
    model = Language_model(vocab=range(0, 101))
    s = tf.Session()
    # 1 more than step size to accommodate for the <eos> token at the end
    random_data = np.random.normal(0, 100, size=[42068, 46])
    # file = "./data/ptb.test.txt"
    print "Fitting started"
    model.train(random_data, s)

if __name__ == "__main__":
    test_model()

If I feed random_data into other language models, they also output nan for the cost. My understanding is that when a value is passed in through the feed dict, TensorFlow should take the numeric value and retrieve the embedding vector corresponding to that id, so I don't understand why random_data causes nan in those models as well.

Answer


There are a couple of problems with the code above. Let's start with this line:

random_data = np.random.normal(0, 100, size=[42068, 46]) 

np.random.normal(...) does not produce integer values; it produces floating-point values. Let's try the call above, but with a manageable size:

>>> np.random.normal(0, 100, size=[5]) 
array([-53.12407229, 39.57335574, -98.25406749, 90.81471139, -41.05069646]) 

No machine-learning algorithm can learn from these as-is: they are meant to be inputs to an embedding lookup, yet they are floating-point values, and some of them are negative.
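To see why this breaks, here is a minimal sketch in plain NumPy (hypothetical sizes, not the original model): an embedding lookup is essentially integer row indexing into an embedding matrix of shape (vocab_size, embed_dim), and float ids are simply not valid indices.

```python
import numpy as np

vocab_size, embed_dim = 101, 8
embeddings = np.random.randn(vocab_size, embed_dim)

int_ids = np.random.randint(0, vocab_size, size=5)   # valid word ids
print(embeddings[int_ids].shape)                     # one row per id: (5, 8)

float_ids = np.random.normal(0, 100, size=5)         # floats, some negative
try:
    embeddings[float_ids]                            # floats cannot index rows
except IndexError as e:
    print("lookup failed: %s" % e)
```

TensorFlow's embedding lookup behaves analogously: it expects integer ids in [0, vocab_size), which is why out-of-range or non-integer inputs corrupt training.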

What you really want is the following:

random_data = np.random.randint(0, 101, size=...) 

Checking its output, we get:

>>> np.random.randint(0, 100, size=[5]) 
array([27, 47, 33, 12, 24]) 

Next, the following lines are actually creating a subtle problem:

def _run_epoch(self, data, session, inputs, rnn_ouputs, loss, train, verbose=10):
    with session.as_default() as sess:
        total_steps = sum(1 for x in data_iterator(data, self._batch_size, self._max_steps))
        train_loss = []
        for step, (x, y, l) in enumerate(data_iterator(data, self._batch_size, self._max_steps)):
            print "step - {0}".format(step)
            feed = {
                self.input_placeholder: x,
                self.label_placeholder: y,
                self.sequence_length: l,
                self._dropout_placeholder: self._dropout,
            }
            loss, _ = sess.run([loss, train], feed_dict=feed)
            print "loss - {0}".format(loss)
            train_loss.append(loss)
            if verbose and step % verbose == 0:
                sys.stdout.write('\r{}/{} : pp = {}'.format(step, total_steps, np.exp(np.mean(train_loss))))
                sys.stdout.flush()
            if verbose:
                sys.stdout.write('\r')

        return np.exp(np.mean(train_loss))

loss is both a function parameter and a local variable: the first time sess.run executes, loss is rebound to the returned numpy float, so it is no longer a Tensor and cannot be fetched in the session on the next step. That is exactly the TypeError about numpy.float32 in the stack trace above. Giving the fetched value its own name avoids the problem.
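The rebinding can be reproduced without TensorFlow at all. Below is a minimal sketch with stand-in names (Tensor and fake_run are hypothetical, for illustration only): the buggy version rebinds the loss parameter to the fetched float on the first step and fails on the second, while the fixed version uses a separate name for the fetched value.

```python
class Tensor(object):
    """Stand-in for a TensorFlow tensor (hypothetical, illustration only)."""
    pass

def fake_run(fetch):
    # Mimics Session.run: refuses any fetch that is not a Tensor,
    # just like the TypeError in the stack trace above.
    if not isinstance(fetch, Tensor):
        raise TypeError("Fetch argument %r has invalid type %s" % (fetch, type(fetch)))
    return 0.5  # pretend this is the evaluated loss value

def run_epoch_buggy(loss, steps=2):
    for step in range(steps):
        loss = fake_run(loss)  # step 0 rebinds `loss` to a float...
    return loss                # ...so step 1 raises TypeError

def run_epoch_fixed(loss, steps=2):
    for step in range(steps):
        batch_loss = fake_run(loss)  # `loss` keeps naming the Tensor
    return batch_loss
```

run_epoch_buggy(Tensor()) raises the TypeError on its second step, while run_epoch_fixed(Tensor()) completes; the same one-line rename of the fetched value fixes _run_epoch above.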