2017-09-04 34 views
4

I have a fit() function that uses the ModelCheckpoint() callback to save the model if it is better than any previous model; with save_weights_only = False, it saves the entire model. This should let me resume training at a later date by using load_model(). How do I preserve metric values across training sessions in Keras?

Unfortunately, somewhere in the save()/load_model() round trip, the best metric value is not preserved — for example, val_loss is reset to inf. This means that when training resumes, ModelCheckpoint() will always save the model after the first epoch, even though that model is almost always worse than the previous session's champion.

I have determined that, before resuming training, I can set ModelCheckpoint()'s current best value, like so:

myCheckpoint = ModelCheckpoint(...) 
myCheckpoint.best = bestValueSoFar 

Obviously, I could monitor the values I need, write them to a file, and read them back in when I resume, but given that I'm new to Keras, I wonder whether I'm missing something obvious.

+1

If your question has been answered, you should mark the most helpful response as the 'Answer' so it is no longer listed as an open question. – FlashTek

+0

I can't do that until tomorrow, but thanks for the reminder. – MadOverlord

Answers

3

I ended up quickly writing my own callback to track the best training values so that I can reload them. It looks like this:

# State monitor callback. Tracks how well we are doing and writes
# some state to a JSON file. This lets us resume training seamlessly.
#
# ModelState.state is:
#
# { "epoch_count": nnnn,
#   "best_values": { dictionary with keys for each log value },
#   "best_epoch":  { dictionary with keys for each log value }
# }

import json
import os

from keras import callbacks

class ModelState(callbacks.Callback):

    def __init__(self, state_path):
        self.state_path = state_path

        if os.path.isfile(state_path):
            print('Loading existing .json state')
            with open(state_path, 'r') as f:
                self.state = json.load(f)
        else:
            self.state = {'epoch_count': 0,
                          'best_values': {},
                          'best_epoch': {}}

    def on_train_begin(self, logs=None):
        print('Training commences...')

    def on_epoch_end(self, epoch, logs=None):
        logs = logs or {}

        # Currently, for everything we track, lower is better

        for k in logs:
            if k not in self.state['best_values'] or logs[k] < self.state['best_values'][k]:
                self.state['best_values'][k] = float(logs[k])
                self.state['best_epoch'][k] = self.state['epoch_count']

        with open(self.state_path, 'w') as f:
            json.dump(self.state, f, indent=4)
        print('Completed epoch', self.state['epoch_count'])

        self.state['epoch_count'] += 1

Then, in the fit() function, something like this:

# Set up the model state, reading in prior results info if available 

model_state = ModelState(path_to_state_file) 

# Checkpoint the model if we get a better result 

model_checkpoint = callbacks.ModelCheckpoint(path_to_model_file, 
              monitor='val_loss', 
              save_best_only=True, 
              verbose=1, 
              mode='min', 
              save_weights_only=False) 


# If we have trained previously, set up the model checkpoint so it won't save 
# until it finds something better. Otherwise, it would always save the results 
# of the first epoch. 

if 'val_loss' in model_state.state['best_values']: 
    model_checkpoint.best = model_state.state['best_values']['val_loss'] 

callback_list = [model_checkpoint, 
       model_state] 

# Offset epoch counts if we are resuming training. If you don't do 
# this, only (epochs - initial_epoch) epochs will be done. 

initial_epoch = model_state.state['epoch_count'] 
epochs += initial_epoch 

# .fit() or .fit_generator, etc. goes here. 
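The epoch bookkeeping above is simple arithmetic; as a sketch (the helper name is mine, and the fit() call is shown only as a comment since it depends on your model and data):

```python
def resume_epoch_args(epoch_count, additional_epochs):
    """Compute (initial_epoch, epochs) for model.fit() when resuming.

    Keras runs (epochs - initial_epoch) epochs, so to train
    additional_epochs more we must offset both values.
    """
    initial_epoch = epoch_count
    epochs = initial_epoch + additional_epochs
    return initial_epoch, epochs

# Example: 10 epochs already completed, train 5 more:
#   initial_epoch, epochs = resume_epoch_args(10, 5)   # (10, 15)
#
# model.fit(x_train, y_train,
#           epochs=epochs,
#           initial_epoch=initial_epoch,
#           callbacks=callback_list)
```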
2

I don't think you have to store the metric values yourself. There was a feature-request about something very similar on the Keras project, but it has been closed. Maybe you can try the solution that came up there. In Keras' philosophy, storing the metrics is not considered very useful, because you only save the model — meaning the architecture and the weights of each layer — not the history or anything else.

The simplest approach would be to create a kind of metafile that contains the model's metric values and the name of the model itself. You could then load the metafile, get the best metric values and the name of the model that produced them, load that model again, and resume training.

+1

Thanks for pointing out the feature request. I ended up doing something similar; see the code in my answer. – MadOverlord