Why do I get different outputs from the same input with scikit-learn's GradientBoostingRegressor?

For example:

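from sklearn import ensemble

# Two regressors built from exactly the same hyperparameters (no random_state set)
params = {'n_estimators': 200, 'max_depth': 4, 'subsample': 1, 'learning_rate': 0.1}
boost = ensemble.GradientBoostingRegressor(**params)
ghostBoost = ensemble.GradientBoostingRegressor(**params)

...

# Both models are fitted on the same training data
boost.fit(x, y)
ghostBoost.fit(x, y)

...

# Both models predict on the same features
predictionA = boost.predict(features)
predictionB = ghostBoost.predict(features)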

boost and ghostBoost are exactly the same, yet predictionA is not equal to predictionB. Why does this happen?

Source

2014-02-18 Shane

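Try fixing the random_state constructor parameter of both models to the same value. The decision tree building process is randomized, because each node considers max_features features sampled at random (~~with replacement~~ without replacement).

Edit: the features are sampled without replacement. When max_features=None (the default), all features are evaluated, but the order in which they are evaluated changes. That ordering can have an impact when max_depth is not None and the target variable has non-unique values that lead to ties for the best feature split.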
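A minimal sketch of the suggestion above, not the asker's actual setup: the data comes from make_regression as a stand-in for the question's x, y and features, n_estimators is reduced to keep the run short, and max_features=3 is an assumed value chosen only to make the per-node feature sampling explicit. With the same integer random_state passed to both constructors, the two models should produce identical predictions.

```python
import numpy as np
from sklearn import ensemble
from sklearn.datasets import make_regression

# Synthetic stand-in for the question's x, y and features (assumption).
x, y = make_regression(n_samples=200, n_features=10, noise=0.1, random_state=0)
features = x[:20]

params = {
    "n_estimators": 50,
    "max_depth": 4,
    "subsample": 1.0,
    "learning_rate": 0.1,
    "max_features": 3,  # assumed value; forces random feature sampling per node
    "random_state": 1,  # same seed for both models
}

# Fit two independent models with identical parameters and the same seed.
boost = ensemble.GradientBoostingRegressor(**params).fit(x, y)
ghost_boost = ensemble.GradientBoostingRegressor(**params).fit(x, y)

prediction_a = boost.predict(features)
prediction_b = ghost_boost.predict(features)

# With a shared seed the tree-building randomness is reproducible.
same = np.array_equal(prediction_a, prediction_b)
print("identical predictions:", same)
```

Removing the "random_state" entry from params is a quick way to check whether the two models diverge on your own data.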

Source

2014-02-18 07:52:38 ogrisel

Thanks a lot! I forced 'random_state' = 1 and now all the results are consistent. Will this affect performance in any way? – Shane

No, it should not affect performance. – ogrisel

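I just noticed that when the order of the input samples changes, the results are different, although they should be the same. Could you also take a look at http://stackoverflow.com/questions/22170677/how-... ? – Shane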
