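How to identify variable names in feature selection
I am trying to identify the variable names chosen by feature selection. I have a dataset like this: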
import pandas as pd
df = pd.DataFrame({'a': [1, 0, 1, 0, 1, 0, 1, 0, 1, 0],
                   'b': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'bar'],
                   'c': ['foo', 'bar', 'bar', 'foo', 'foo', 'bar', 'bar', 'foo', 'foo', 'bar'],
                   'd': ['d', 'd', 'b', 'a', 'd', 'd', 'a', 'b', 'd', 'a']})
I split it into features and target, then one-hot encode the features:
X, y = df.iloc[:, 1:], df.iloc[:, [0]]  # .ix was removed in pandas 1.0; .iloc selects by position
X_dummy = pd.get_dummies(X)
and then:
from sklearn.feature_selection import SelectKBest
from sklearn.feature_selection import chi2
X_new = SelectKBest(chi2, k=4).fit_transform(X_dummy, y)
X_new
array([[0, 1, 0, 1],
[1, 0, 0, 1],
[0, 1, 0, 0],
[1, 0, 1, 0],
[0, 1, 0, 1],
[1, 0, 0, 1],
[0, 1, 1, 0],
[1, 0, 0, 0],
[0, 1, 0, 1],
[1, 0, 1, 0]], dtype=uint8)
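I get the array, but I would like to know which variables (b, c, or d, or their dummy columns) should be included in the model. How can I find this out? Thanks!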
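
One possible way to recover the names, sketched here under a few assumptions: fit the selector first instead of calling fit_transform directly, then use its get_support() boolean mask to index the columns of X_dummy. The names selector, mask, selected_columns and scores are illustrative only, and dtype=int is passed to get_dummies just to keep the dummy columns numeric across pandas versions.

```python
import pandas as pd
from sklearn.feature_selection import SelectKBest, chi2

df = pd.DataFrame({
    'a': [1, 0, 1, 0, 1, 0, 1, 0, 1, 0],
    'b': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'bar'],
    'c': ['foo', 'bar', 'bar', 'foo', 'foo', 'bar', 'bar', 'foo', 'foo', 'bar'],
    'd': ['d', 'd', 'b', 'a', 'd', 'd', 'a', 'b', 'd', 'a'],
})

# Features are columns b, c, d; the target is column a.
X, y = df.iloc[:, 1:], df.iloc[:, 0]
X_dummy = pd.get_dummies(X, dtype=int)

# Keep the fitted selector so it can be inspected afterwards.
selector = SelectKBest(chi2, k=4).fit(X_dummy, y)

# get_support() returns a boolean mask aligned with X_dummy.columns.
mask = selector.get_support()
selected_columns = X_dummy.columns[mask].tolist()
print(selected_columns)

# Optional: per-column chi2 scores, labelled with the dummy column names.
scores = pd.Series(selector.scores_, index=X_dummy.columns)
print(scores.sort_values(ascending=False))

# The reduced matrix, as a DataFrame that keeps the column names.
X_new = X_dummy.loc[:, mask]
```

Each selected name has the form variable_value (the get_dummies default), so the part before the underscore tells you which original variable the dummy column came from.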