2017-07-17

I have been looking for a way to get the most relevant features after fitting a decision tree with tree.DecisionTreeClassifier, but without success. The link below discusses querying feature_importances_; however, that attribute is not recognized on my tree.DecisionTreeClassifier, and I cannot find a separate DecisionTreeClassifier module either. Can someone help me with this task? feature_importances_ is not recognized as an output of tree.DecisionTreeClassifier.

How to interpret decision trees' graph results and find most informative features?

Answer


I recently found a solution. Here is the relevant part of my code:

# Assumes X_train, Y_train, X_test, Y_test are already defined.
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import RandomizedSearchCV, KFold

seed = 7
kfold = KFold(n_splits=10, shuffle=True, random_state=seed)
DTC = DecisionTreeClassifier
parameters = {'max_depth': range(3, 10), 'max_leaf_nodes': range(10, 30),
              'criterion': ['gini'], 'splitter': ['best']}  # 'max_features': range(10, 100)
dt = RandomizedSearchCV(DTC(random_state=seed), parameters, n_jobs=10, cv=kfold)  # min_samples_leaf=10
fit_dt = dt.fit(X_train, Y_train)
print(dir(fit_dt))
tree_model = dt.best_estimator_
print(dt.best_score_, dt.best_params_, dt.error_score)  # dt.cv_results_
print('best estimator')
print(fit_dt.best_estimator_)

# feature_importances_ lives on the fitted best estimator, not on the search object
features = tree_model.feature_importances_
print(features)

# indices of the twelve most important features, highest first
rank = np.argsort(features)[::-1]
print(rank[:12])
print(sorted(features, reverse=True))  # importances in decreasing order

# Print mean/std test scores for every sampled parameter setting
means = dt.cv_results_['mean_test_score']
stds = dt.cv_results_['std_test_score']
for mean, std, params in zip(means, stds, dt.cv_results_['params']):
    print("%0.3f (+/-%0.03f) for %r" % (mean, std * 2, params))

print('Best score: {}'.format(dt.best_score_))
print('Best params: {}'.format(dt.best_params_))

print('Accuracy of DT classifier on training set: {:.2f}'.format(dt.score(X_train, Y_train)))
print('Accuracy of DT classifier on test set: {:.2f}'.format(dt.score(X_test, Y_test)))

predictions = dt.predict(X_test)
print(np.column_stack((Y_test, np.round(predictions))))

Check it out and see whether you can apply it to your data.
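To isolate the original confusion: feature_importances_ is only created by fit(), so an unfitted DecisionTreeClassifier (or the class itself) has no such attribute. Here is a minimal, self-contained sketch using a synthetic dataset from make_classification (the data and parameters are illustrative assumptions, not from the question):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Synthetic data: 200 samples, 6 features, 3 of them informative
X, y = make_classification(n_samples=200, n_features=6, n_informative=3,
                           random_state=7)

clf = DecisionTreeClassifier(random_state=7)
# hasattr(clf, 'feature_importances_') is False at this point:
# the attribute only exists on a fitted estimator.
clf.fit(X, y)

importances = clf.feature_importances_   # one value per column, sums to 1
ranking = np.argsort(importances)[::-1]  # most informative feature first
for idx in ranking:
    print('feature %d: %.3f' % (idx, importances[idx]))
```

The same applies to the RandomizedSearchCV code above: the attribute is read from the fitted best_estimator_, never from the search object or the bare class.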