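R H2O connection (memory) issue
I am trying to use h2o to run optimization grids for two algorithms (random forest and gbm) on different parts of a dataset. My code looks like this: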
for (...)
{
  # read data

  # setup h2o cluster
  h2o <- h2o.init(ip = "localhost", port = 54321, nthreads = detectCores()-1)
  gbm.grid <- h2o.grid("gbm", grid_id = "gbm.grid",
                       x = names(td.train.h2o)[!names(td.train.h2o) %like% segment_binary],
                       y = segment_binary,
                       seed = 42, distribution = "bernoulli",
                       training_frame = td.train.h2o, validation_frame = td.train.hyper.h2o,
                       hyper_params = hyper_params, search_criteria = search_criteria)
  # shutdown h2o
  h2o.shutdown(prompt = FALSE)

  # setup h2o cluster
  h2o <- h2o.init(ip = "localhost", port = 54321, nthreads = detectCores()-1)
  rf.grid <- h2o.grid("randomForest", grid_id = "rf.grid",
                      x = names(td.train.h2o)[!names(td.train.h2o) %like% segment_binary],
                      y = segment_binary,
                      seed = 42, distribution = "bernoulli",
                      training_frame = td.train.h2o, validation_frame = td.train.hyper.h2o,
                      hyper_params = hyper_params, search_criteria = search_criteria)
  # shutdown h2o
  h2o.shutdown(prompt = FALSE)
}
The problem is that when I run this inside a for loop, I get the following error:
Error in .h2o.doSafeREST(h2oRestApiVersion = h2oRestApiVersion, urlSuffix = urlSuffix, :
Unexpected CURL error: Failed to connect to localhost port 54321: Connection refused
PS: I am using the lines
# shutdown h2o
h2o.shutdown(prompt = FALSE)
# setup h2o cluster
h2o <- h2o.init(ip = "localhost", port = 54321, nthreads = detectCores()-1)
so that I "reset" h2o and do not run out of memory.
I have also read "R H2O - Memory management", but it is not clear to me how it works.
UPDATE
Following Matteusz's comment, I now call init outside the for loop and use h2o.removeAll() inside the for loop. So now my code looks like this:
h2o <- h2o.init(ip = "localhost", port = 54321, nthreads = detectCores()-1)
for (...)
{
  # read data

  gbm.grid <- h2o.grid("gbm", grid_id = "gbm.grid",
                       x = names(td.train.h2o)[!names(td.train.h2o) %like% segment_binary],
                       y = segment_binary,
                       seed = 42, distribution = "bernoulli",
                       training_frame = td.train.h2o, validation_frame = td.train.hyper.h2o,
                       hyper_params = hyper_params, search_criteria = search_criteria)
  h2o.removeAll()

  rf.grid <- h2o.grid("randomForest", grid_id = "rf.grid",
                      x = names(td.train.h2o)[!names(td.train.h2o) %like% segment_binary],
                      y = segment_binary,
                      seed = 42, distribution = "bernoulli",
                      training_frame = td.train.h2o, validation_frame = td.train.hyper.h2o,
                      hyper_params = hyper_params, search_criteria = search_criteria)
  h2o.removeAll()
}
This seems to work, but now I get this error (?) in the grid optimization for random forest.
Any ideas what this might be?
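One thing worth checking in the updated loop: h2o.removeAll() deletes every object on the cluster, frames included, so any frame uploaded before the first call is gone by the time the second grid starts. Whether that is the cause of the error here is not known, since the error text is not shown. The sketch below keeps the training frame alive between the two grids. It uses the built-in iris data as a stand-in for the unseen td.train.h2o, tiny made-up hyperparameter lists, and assumes an h2o version whose h2o.removeAll() accepts the retained_elements argument.

```r
library(h2o)
h2o.init(nthreads = -1)

# Stand-in data: the question's own frames are not shown, so iris is used here.
train <- as.h2o(iris, destination_frame = "train")
x <- setdiff(names(train), "Species")

gbm.grid <- h2o.grid("gbm", grid_id = "gbm.grid", x = x, y = "Species",
                     training_frame = train, seed = 42, ntrees = 10,
                     hyper_params = list(max_depth = c(2, 3)))

# Copy what is needed into plain R objects before deleting anything on the cluster.
gbm.summary <- as.data.frame(h2o.getGrid("gbm.grid")@summary_table)

# Free the GBM models but keep the training frame for the next grid.
# Assumption: retained_elements is available in the installed h2o version.
h2o.removeAll(retained_elements = list(train))

rf.grid <- h2o.grid("randomForest", grid_id = "rf.grid", x = x, y = "Species",
                    training_frame = train, seed = 42, ntrees = 10,
                    hyper_params = list(max_depth = c(2, 3)))
rf.summary <- as.data.frame(h2o.getGrid("rf.grid")@summary_table)
```

On older h2o versions without retained_elements, re-uploading the data with as.h2o() after each h2o.removeAll() call should have the same effect, at the cost of an extra transfer.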
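So should I put 'init' inside something like 'while(h2o.clusterIsUp())'? – quant
You should first run 'h2o.clusterIsUp()' inside a while loop (preferably with a 'sleep' inside the loop), and then run 'h2o.init' after the loop. But as I mentioned, that is wasteful: you don't need to start/stop the node every time. –
Please see the update – quant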
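The polling approach described in the comments can be sketched roughly as follows, for the case where a full restart really is wanted. restart_h2o and cluster_is_up are hypothetical helper names, not part of the h2o package, and the 60 second timeout is an arbitrary choice. Note that a restart also discards every frame on the cluster, so the data has to be uploaded again afterwards.

```r
library(h2o)

# Hypothetical helper: TRUE if the last-used H2O connection still responds.
# h2o.getConnection() raises an error if h2o.init() was never called, hence the tryCatch.
cluster_is_up <- function() {
  tryCatch(h2o.clusterIsUp(), error = function(e) FALSE)
}

# Hypothetical helper: shut the cluster down, wait until it is gone, then start a new one.
restart_h2o <- function(timeout_secs = 60, ...) {
  if (cluster_is_up()) {
    h2o.shutdown(prompt = FALSE)
    deadline <- Sys.time() + timeout_secs
    # Poll instead of calling h2o.init() immediately, which may hit a half-closed node.
    while (cluster_is_up()) {
      if (Sys.time() > deadline) stop("H2O cluster did not shut down in time")
      Sys.sleep(1)
    }
  }
  h2o.init(...)
}

h2o <- restart_h2o(ip = "localhost", port = 54321,
                   nthreads = parallel::detectCores() - 1)
```

As the comment points out, restarting on every iteration is wasteful; starting the cluster once and clearing objects between runs is usually enough.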