2016-05-12

xgboost xgb.dump tree coefficient question: how can I use the xgboost R tree dump to calculate or reproduce predictions?

In particular, I want to know how the probability calculation would differ from the answer provided if eta = 0.1 or 0.01.

I want to use the tree dump to make predictions.

My code is:

# Define train label and feature frames/matrix
y <- train_data$esc_ind
train_data <- as.matrix(train_data)
trainX <- as.matrix(train_data[, -1])
param <- list("objective" = "binary:logistic",
       "eval_metric" = "logloss",
       "eta" = 0.5,
       "max_depth" = 2,
       "colsample_bytree" = 0.8,
       "subsample" = 0.8, # 0.75
       "alpha" = 1)

# Train XGBoost
bst <- xgboost(param = param, data = trainX, label = y, nrounds = 2)

trainX1 <- data.frame(trainX)
# genFMap is a user-defined helper that writes the feature-map file
mpg.fmap <- genFMap(trainX1, "xgboost.fmap")
xgb.save(bst, "xgboost.model")
xgb.dump(bst, "xgboost.model_6.txt", with.stats = TRUE, fmap = "xgboost.fmap")

What the trees look like:

booster[0] 
0:[order.1<12.2496] yes=1,no=2,missing=2,gain=1359.61,cover=7215.25 
    1:[access.1<0.196687] yes=3,no=4,missing=4,gain=3.19685,cover=103.25 
     3:leaf=-0,cover=1 
     4:leaf=0.898305,cover=102.25 
    2:[team<6.46722] yes=5,no=6,missing=6,gain=753.317,cover=7112 
     5:leaf=0.893333,cover=55.25 
     6:leaf=-0.943396,cover=7056.75 
booster[1] 
0:[issu.1<6.4512] yes=1,no=2,missing=2,gain=794.308,cover=5836.81 
    1:[team<3.23361] yes=3,no=4,missing=4,gain=18.6294,cover=67.9586 
     3:leaf=0.609363,cover=21.4575 
     4:leaf=1.28181,cover=46.5012 
    2:[case<6.74709] yes=5,no=6,missing=6,gain=508.34,cover=5768.85 
     5:leaf=1.15253,cover=39.2126 
     6:leaf=-0.629773,cover=5729.64 
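To score an example by hand, follow each booster's yes/no/missing branches down to a leaf. Below is a minimal sketch in R of hand-traversing booster[0] above; the feature values are hypothetical, chosen only to illustrate the paths:

```r
# Minimal sketch: hand-traversal of booster[0] from the dump above.
# Feature values are hypothetical, for illustration only.
x <- list(order.1 = 15.0, team = 3.0, access.1 = NA)

score_booster0 <- function(x) {
  # node 0: order.1 < 12.2496? yes=1, no=2, missing=2
  if (is.na(x$order.1) || !(x$order.1 < 12.2496)) {
    # node 2: team < 6.46722? yes=5, no=6, missing=6
    if (is.na(x$team) || !(x$team < 6.46722)) {
      -0.943396                                   # leaf 6
    } else {
      0.893333                                    # leaf 5
    }
  } else {
    # node 1: access.1 < 0.196687? yes=3, no=4, missing=4
    if (is.na(x$access.1) || !(x$access.1 < 0.196687)) {
      0.898305                                    # leaf 4
    } else {
      0                                           # leaf 3 (-0)
    }
  }
}

score_booster0(x)  # -> 0.893333 (order.1 >= 12.2496, team < 6.46722)
```

Repeating this for booster[1] and summing the two leaf scores gives the raw margin for that example.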

Will the coefficient xgboost applies to all tree leaf scores be 1 even when the chosen eta is less than 1?


Please check my answer at the link below - it may be useful - http://stackoverflow.com/questions/39858916/xgboost-how-to-get-probabilities-of-class-from-xgb-dump-multisoftprob-objecti/40632862#40632862 – Run2

Answer


Actually, this is something practical that I had overlooked earlier.

Using the tree structure above, one can find the probability for each training example.

The parameter list was:

param <- list("objective" = "binary:logistic",
       "eval_metric" = "logloss",
       "eta" = 0.5,
       "max_depth" = 2,
       "colsample_bytree" = 0.8,
       "subsample" = 0.8,
       "alpha" = 1)

For example, for booster[0], leaf 3 (score -0), the probability will be exp(-0)/(1 + exp(-0)).

And for booster[0], leaf 3 plus booster[1], leaf 3, the probability will be exp(0 + 0.609363)/(1 + exp(0 + 0.609363)).

And so on, as more and more iterations are added.
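The two calculations above can be sketched in R as follows. Note that the leaf values in the dump already include the eta shrinkage, so no extra multiplication by eta is needed; choosing eta = 0.1 or 0.01 changes the dumped leaf values themselves, not this formula.

```r
# Sketch of the probability calculation described above.
sigmoid <- function(margin) exp(margin) / (1 + exp(margin))

# booster[0], leaf 3 only (score -0):
p1 <- sigmoid(-0)             # -> 0.5

# booster[0] leaf 3 + booster[1] leaf 3:
p2 <- sigmoid(0 + 0.609363)   # -> ~0.6478
```

This also relies on the default base_score = 0.5 for binary:logistic, whose logit is 0 and therefore drops out of the margin sum.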

I matched these values against R's predicted probabilities; they differ at around 10^(-7), which is probably due to floating-point truncation of the leaf quality scores.

This answer can give a production-level solution when boosted trees trained in R are used for prediction in a different environment.

Any comments on this would be highly appreciated.