
Convolutional neural network: how to train it? (unsupervised)

I am trying to implement a CNN to play a game, using Python with Theano/Lasagne. I have built the network and am now trying to work out how to train it.

So right now I have a mini-batch of 32 states, the action taken in each of those states, and the expected reward for each of those actions.
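
For reference, the "expected reward" described here is the usual Q-learning target: the observed reward, plus (for non-terminal transitions) the discounted maximum predicted reward of the next state. A minimal sketch of that computation, with illustrative names and a hypothetical discount of 0.99:

import numpy as np

def q_target(reward, terminal, next_q_values, discount=0.99):
    # terminal transition: there is no future reward to add
    if terminal:
        return reward
    # otherwise bootstrap from the best predicted reward in the next state
    return reward + discount * np.max(next_q_values)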

How do I now train the network so that it learns that taking these actions in these states leads to these rewards?

Edit: to clarify my question.

Here is my full code: http://pastebin.com/zY8w98Ng and the snake game it imports: http://pastebin.com/fgGCabzR

This is the bit I am having trouble with:

def _train(self): 
    # Prepare Theano variables for inputs and targets 
    input_var = T.tensor4('inputs') 
    target_var = T.ivector('targets') 
    states = T.tensor4('states') 
    print "sampling mini batch..." 
    # sample a mini_batch to train on 
    mini_batch = random.sample(self._observations, self.MINI_BATCH_SIZE) 
    # get the batch variables 
    previous_states = [d[self.OBS_LAST_STATE_INDEX] for d in mini_batch] 
    actions = [d[self.OBS_ACTION_INDEX] for d in mini_batch] 
    rewards = [d[self.OBS_REWARD_INDEX] for d in mini_batch] 
    current_states = np.array([d[self.OBS_CURRENT_STATE_INDEX] for d in mini_batch]) 
    agents_expected_reward = [] 
    # print np.rollaxis(current_states, 3, 1).shape 
    print "compiling current states..." 
    current_states = np.rollaxis(current_states, 3, 1) 
    current_states = theano.compile.sharedvalue.shared(current_states) 

    print "getting network output from current states..." 
    agents_reward_per_action = lasagne.layers.get_output(self._output_layer, current_states) 


    print "rewards adding..." 
    for i in range(len(mini_batch)):
        if mini_batch[i][self.OBS_TERMINAL_INDEX]:
            # this was a terminal frame, so there is no future reward to discount
            agents_expected_reward.append(rewards[i])
        else:
            agents_expected_reward.append(
                rewards[i] + self.FUTURE_REWARD_DISCOUNT * np.max(agents_reward_per_action[i].eval()))

    # figure out how to train the model (self._output_layer) with previous_states,
    # actions and agents_expected_reward

I want to update the model using previous_states, actions and agents_expected_reward so that it learns that these actions in these states lead to these rewards.
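
One common way to wire this up (a sketch only; the action-indexing trick, the squared-error objective and plain SGD are my assumptions, and it assumes the stored actions are integer indices; self._output_layer, self.LEARN_RATE and the mini-batch lists are reused from the code above) is to regress the network's predicted reward for the taken action onto the expected reward:

import numpy as np
import theano
import theano.tensor as T
import lasagne

# symbolic inputs: batch of states, index of the action taken in each state,
# and the target (expected) reward for that action
states_var = T.tensor4('states')
actions_var = T.ivector('actions')
targets_var = T.vector('targets')

# network output: one predicted reward per possible action, per state
q_values = lasagne.layers.get_output(self._output_layer, states_var)

# select the predicted reward of the action that was actually taken
taken_q = q_values[T.arange(actions_var.shape[0]), actions_var]

# mean squared error between prediction and expected reward
loss = T.mean(lasagne.objectives.squared_error(taken_q, targets_var))

params = lasagne.layers.get_all_params(self._output_layer, trainable=True)
updates = lasagne.updates.sgd(loss, params, learning_rate=self.LEARN_RATE)

train_fn = theano.function([states_var, actions_var, targets_var], loss,
                           updates=updates, allow_input_downcast=True)

# one gradient step on the sampled mini-batch
batch_loss = train_fn(np.rollaxis(np.array(previous_states), 3, 1),
                      np.array(actions, dtype='int32'),
                      np.array(agents_expected_reward, dtype=theano.config.floatX))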

I was hoping it would look something like this:

train_model = theano.function(inputs=[input_var],
                              outputs=self._output_layer,
                              givens={
                                  states: previous_states,
                                  rewards: agents_expected_reward,
                                  expected_rewards: agents_expected_reward})

I just don't understand how the givens would affect the model, since I never specified them when building the network. I couldn't find an explanation in the Theano or Lasagne documentation.
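
As a minimal, self-contained illustration of what givens does (unrelated to the network above): it substitutes a symbolic variable with another expression or shared variable at compile time, so the compiled function no longer needs that variable as an input; it does not change how the network itself was built.

import theano
import theano.tensor as T

x = T.scalar('x')
data = theano.shared(3.0)                        # value substituted in place of x
f = theano.function([], x * 2, givens={x: data})
print(f())                                       # 6.0 -- x was replaced by `data`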

So, how do I update the model/network so that it "learns"?

If anything is still unclear, please comment on what information is missing. I have been trying to figure this out for days.

Answer


After reading through the documentation I finally found the answer; I had been looking in the wrong place before.

network = self._output_layer
prediction = lasagne.layers.get_output(network)
# loss between the network's predictions and the targets
loss = lasagne.objectives.categorical_crossentropy(prediction, target_var)
loss = loss.mean()

# SGD updates for all trainable parameters of the network
params = lasagne.layers.get_all_params(network, trainable=True)
updates = lasagne.updates.sgd(loss, params, self.LEARN_RATE)
givens = {
    states: current_states,
    expected: agents_expected_reward,
    real_rewards: rewards
}
train_fn = theano.function([input_var, target_var], loss,
                           updates=updates, on_unused_input='warn',
                           givens=givens,
                           allow_input_downcast=True)
train_fn(current_states, agents_expected_reward)
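
A note on the design choice (my own suggestion, not part of the original answer): categorical_crossentropy treats the outputs as class probabilities, whereas the targets here are continuous expected rewards, so a squared-error objective may be a more natural fit:

loss = lasagne.objectives.squared_error(prediction, target_var)
loss = loss.mean()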