2017-06-13 49 views
1

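matplotlib: plotting feature importances with feature names

R has pre-built functions for plotting the feature importances of a random forest model, but Python seems to lack an equivalent. I looked for a way to do this in matplotlib.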

model.feature_importances_ gives me the following:

array([ 2.32421835e-03, 7.21472336e-04, 2.70491223e-03, 
     3.34521084e-03, 4.19443238e-03, 1.50108737e-03, 
     3.29160540e-03, 4.82320256e-01, 3.14117333e-03]) 

I then plot it with the following calls:

>> pyplot.bar(range(len(model.feature_importances_)), model.feature_importances_) 
>> pyplot.show() 

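This gives me a bar plot, but I would like one with feature labels, drawn horizontally and sorted by importance. I have also been exploring seaborn, but could not find a way to do it there.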

+1

You are looking for 'barh' (horizontal bar plot). Pass the feature names to 'tick_label'. – DyZ
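A minimal sketch of what this comment suggests, using the importances printed in the question. The feature names here are hypothetical placeholders (the question does not give them), so substitute your own column names.

```python
import numpy as np
import matplotlib.pyplot as plt

# Importances copied from the output shown in the question.
importances = np.array([2.32421835e-03, 7.21472336e-04, 2.70491223e-03,
                        3.34521084e-03, 4.19443238e-03, 1.50108737e-03,
                        3.29160540e-03, 4.82320256e-01, 3.14117333e-03])

# Hypothetical feature names; replace with the real ones.
feature_names = [f"feature_{i}" for i in range(len(importances))]

# argsort is ascending, and barh draws position 0 at the bottom,
# so the most important feature ends up at the top of the plot.
order = np.argsort(importances)
sorted_names = [feature_names[i] for i in order]

fig, ax = plt.subplots()
bars = ax.barh(range(len(order)), importances[order],
               tick_label=sorted_names, align="center")
ax.set_xlabel("Importance")
fig.tight_layout()
```

Call plt.show() afterwards to display the figure. The names are reordered with the same index array as the values, so each label stays attached to its own bar.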

Answers

4

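Not entirely sure what you are looking for. Here is an example derived from here. As noted in the comments: if you want custom feature labels, you can replace indices with a list of labels (ordered the same way as indices) on the line plt.yticks(range(X.shape[1]), indices).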

import numpy as np 
import matplotlib.pyplot as plt 

from sklearn.datasets import make_classification 
from sklearn.ensemble import ExtraTreesClassifier 

# Build a classification task using 3 informative features 
X, y = make_classification(n_samples=1000,
                           n_features=10,
                           n_informative=3,
                           n_redundant=0,
                           n_repeated=0,
                           n_classes=2,
                           random_state=0,
                           shuffle=False)

# Build a forest and compute the feature importances
forest = ExtraTreesClassifier(n_estimators=250,
                              random_state=0)

forest.fit(X, y) 
importances = forest.feature_importances_ 
std = np.std([tree.feature_importances_ for tree in forest.estimators_],
             axis=0)
indices = np.argsort(importances) 

# Plot the feature importances of the forest 
plt.figure() 
plt.title("Feature importances") 
plt.barh(range(X.shape[1]), importances[indices],
         color="r", xerr=std[indices], align="center")
# If you want to define your own labels, 
# change indices to a list of labels on the following line. 
plt.yticks(range(X.shape[1]), indices) 
plt.ylim([-1, X.shape[1]]) 
plt.show() 
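One thing to watch when swapping indices for your own labels: the bars are drawn in sorted order, so the labels have to be reordered with the same indices array. A small sketch of that step follows. The helper name plot_sorted_importances and the demo values and labels are made up for illustration and are not from the example above.

```python
import numpy as np
import matplotlib.pyplot as plt


def plot_sorted_importances(importances, std, labels):
    """Draw a horizontal bar plot sorted by importance; return the labels in plot order."""
    importances = np.asarray(importances)
    std = np.asarray(std)
    indices = np.argsort(importances)  # ascending order
    positions = range(len(indices))

    # Reorder the labels with the same indices so each tick matches its bar.
    sorted_labels = [labels[i] for i in indices]

    plt.figure()
    plt.title("Feature importances")
    plt.barh(positions, importances[indices],
             color="r", xerr=std[indices], align="center")
    plt.yticks(positions, sorted_labels)
    plt.ylim([-1, len(indices)])
    return sorted_labels


# Placeholder numbers and names, only to show the call.
demo_labels = plot_sorted_importances([0.2, 0.5, 0.3],
                                      [0.02, 0.05, 0.03],
                                      ["age", "income", "height"])
```

With the forest from the example you would call it as plot_sorted_importances(importances, std, my_labels), where my_labels is your own list in the original column order, and then plt.show().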


0

This is fairly easy with the pandas plot() method.

import pandas as pd

# X is assumed to be the pandas DataFrame the model was trained on.
feat_importances = pd.Series(model.feature_importances_, index=X.columns)
feat_importances = feat_importances.nlargest(20)
feat_importances.plot(kind='barh')

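This approach also shows the variable names on the y-axis for free: we plot the top 20 most important features of our sklearn random forest model, which was trained on X.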
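A possible variation if you want the largest bar on top: nlargest returns values in descending order and barh draws the first row at the bottom, so an extra sort_values() flips the plot. The sketch below reuses the importances from the question, and the column names are hypothetical stand-ins for X.columns.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Importances copied from the output shown in the question.
importances = np.array([2.32421835e-03, 7.21472336e-04, 2.70491223e-03,
                        3.34521084e-03, 4.19443238e-03, 1.50108737e-03,
                        3.29160540e-03, 4.82320256e-01, 3.14117333e-03])

# Hypothetical stand-in for X.columns.
columns = [f"feature_{i}" for i in range(len(importances))]

feat_importances = pd.Series(importances, index=columns)

# Keep the 5 largest, then sort ascending so the largest is drawn at the top.
top = feat_importances.nlargest(5).sort_values()

ax = top.plot(kind="barh")
ax.set_xlabel("Importance")
```

Call plt.show() afterwards to display the figure.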