h2o max_runtime_seconds - doesn't seem to have any effect?

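I'm trying to use max_runtime_secs, but either it is hard to understand how it is supposed to work or, more likely I think, there is some kind of bug.

I have been testing with random forest, and it never seems to reduce the runtime.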

import h2o
from h2o.estimators import H2ORandomForestEstimator

h2o.init()

# Cover type data: https://archive.ics.uci.edu/ml/machine-learning-databases/covtype/
df = h2o.import_file('covtype.csv')
for i in df.names:
    df[i] = df[i].asfactor()
df.types  # just showing everything is categorical

train, test = df.split_frame(ratios=[0.75], seed=2017)

response = 'C55'
xvars = train.drop([response]).col_names

mymodel = H2ORandomForestEstimator(
    nfolds=10,
    max_runtime_secs=30,
    stopping_rounds=5,
    ntrees=500
)

mymodel.train(
    x=xvars,
    y=response,
    validation_frame=test,
    training_frame=train)
# does not finish remotely close to <30 seconds
mymodel.actual_params  # a property, so no call parentheses

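Note that the max_runtime_secs parameter does not appear to be saved; it stays at 0. I am using the "bleeding edge" version of H2O (currently ~3.13) with Python.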
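One way to make the "stays at 0" observation explicit is to compare the stored value against the requested one. The sketch below is only illustrative: `runtime_limit_recorded` is a hypothetical helper, and it assumes the model's actual parameters are available as a plain mapping from parameter name to value. The two dictionaries are stand-ins, not output from a real H2O run.

```python
def runtime_limit_recorded(actual_params, requested_secs):
    """Return True if the stored max_runtime_secs equals the requested limit.

    `actual_params` is assumed to be a dict of parameter name -> value.
    A missing or None entry is treated as 0 (no limit recorded).
    """
    stored = actual_params.get("max_runtime_secs", 0) or 0
    return float(stored) == float(requested_secs)


# Stand-in dictionaries for illustration only (not real H2O output).
dropped = {"ntrees": 500, "max_runtime_secs": 0}   # limit lost on the way to the backend
kept = {"ntrees": 500, "max_runtime_secs": 30}     # limit stored as requested

print(runtime_limit_recorded(dropped, 30))
print(runtime_limit_recorded(kept, 30))
```

With the model from the question, the first argument would be the dictionary of the trained model's actual parameters and the second would be 30.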


2017-07-17 jack

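My guess is that 'max_runtime_secs' refers to the maximum number of seconds allotted to building each tree. So if you have 'ntrees = 100', the longest this model could take to build is 100 trees x 90 seconds x 5 folds, or 45,000 seconds.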

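If you post a fully reproducible example, it is more likely that someone will try to help you debug the problem: https://stackoverflow.com/help/mcve You could even copy the one from the documentation: http://docs.h2o.ai/h2o/latest-stable/h2o-docs/data-science/algo-params/max_runtime_secs.html Adding some code that times the training, using the timeit module or something similar, would also help show whether this is actually a bug or not.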
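The timing suggested in this comment could look roughly like the sketch below. It uses `time.perf_counter` from the standard library rather than `timeit`, since only a single run needs to be measured. `timed_call` and `fake_train` are hypothetical names introduced for illustration; `fake_train` is a stand-in so the snippet runs without an H2O cluster.

```python
import time


def timed_call(fn, *args, **kwargs):
    """Call fn(*args, **kwargs) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed


def fake_train(seconds):
    """Stand-in for a training call: just sleeps for the given time."""
    time.sleep(seconds)
    return "done"


result, elapsed = timed_call(fake_train, 0.05)
print(f"training took {elapsed:.2f} s")
```

Applied to the question's code, the call would be something like `timed_call(mymodel.train, x=xvars, y=response, training_frame=train, validation_frame=test)`, and the elapsed time could then be compared with the 30-second limit.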

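@Scratch'N'Purr That is a good guess, but no: the documentation states that it is the maximum runtime for the whole model (not for each tree). So the 'max_runtime_secs' variable should enforce a 90-second time limit in the code above.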
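To make the gap between the two readings concrete, here is the arithmetic from the comments as a small sketch. The numbers (100 trees, 90 seconds, 5 folds) are the ones used in the comments, not the values in the question's current code, and the variable names are chosen only for illustration.

```python
ntrees = 100
max_runtime_secs = 90
nfolds = 5

# Reading 1 (the guess): the limit applies to every tree in every fold.
per_tree_bound = ntrees * max_runtime_secs * nfolds

# Reading 2 (per the documentation, as cited above): the limit applies to the whole model.
whole_model_bound = max_runtime_secs

print(per_tree_bound)     # 45000
print(whole_model_bound)  # 90
```

Under the second reading, a run that takes far longer than 90 seconds cannot be explained by the per-tree arithmetic and points to the limit not being applied.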

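I have confirmed that this is a bug in the Python API (the max_runtime_secs code works on the backend and in the R client). I opened a ticket here, and I hope it will be fixed in the next release.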


2017-07-18 21:26:26

OK Erin, thanks. I had mistyped "seconds". I have updated the question above with an example that uses the cover type dataset. Thanks! – jack
