Cross-validation with the Orange Python library

I am trying to do cross-validation with the Python package Orange. The library looks great, but I have run into a few problems.

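Versions: I am using Python 2.7 and Orange 2.7.8. My task is simple. I want to validate a model with cross-validation that includes (1) discretization of the numeric attributes and (2) feature selection. As you know, it is important to do both discretization and feature selection inside the cross-validation loop. In other words, in each cross-validation fold, (1) discretize using only the training data and apply the same cut points to the test data, and (2) select the important features from the training data only and use those same features on the test data.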
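Here is a minimal plain-Python sketch of this per-fold rule, with no Orange involved. In each fold, the cut point is computed from the training part and then applied unchanged to the test part. The median cut in `fit_cut` is a hypothetical stand-in for a real discretizer such as entropy-based discretization. All helper names are illustrative only.

```python
def kfold_indices(n, k):
    """Split range(n) into k contiguous folds (no shuffling, for clarity)."""
    folds, start = [], 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def fit_cut(train_values):
    """Hypothetical stand-in for a real discretizer: cut at the training median."""
    s = sorted(train_values)
    mid = len(s) // 2
    return s[mid] if len(s) % 2 else (s[mid - 1] + s[mid]) / 2.0


def apply_cut(values, cut):
    """Map numeric values to discrete labels using an already-fitted cut point."""
    return ["low" if v <= cut else "high" for v in values]


def cross_validate(xs, k):
    """Per fold: fit the cut on the training part, reuse it on the test part."""
    out = []
    for test_idx in kfold_indices(len(xs), k):
        held_out = set(test_idx)
        train = [x for i, x in enumerate(xs) if i not in held_out]
        test = [xs[i] for i in test_idx]
        cut = fit_cut(train)                     # never sees the test values
        out.append((cut, apply_cut(test, cut)))  # same cut applied to test data
    return out


if __name__ == "__main__":
    ages = [22, 35, 47, 51, 19, 63, 30, 44]
    for cut, labels in cross_validate(ages, 4):
        print(cut, labels)
```

Feature selection follows the same pattern. Score the features on `train` only, then keep the same chosen columns for `test`.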

After studying Orange, I wrote the script below.

import Orange, orange, orngDisc, orngTest, orngStat, orngFSS

data = Orange.data.Table("test.tab")  # has both numeric and discrete features

nb = Orange.classification.bayes.NaiveLearner()
dBayes = orngDisc.DiscretizedLearner(nb, method=Orange.feature.discretization.Entropy(), name="disc nb")

# feature selection (three most important features by information gain)
fss = orngFSS.FilterBestN(n=3, measure=Orange.feature.scoring.InfoGain())
fBayes = orngFSS.FilteredLearner(dBayes, filter=fss, name="nb & fss")

learners = [nb, dBayes, fBayes]
results = orngTest.crossValidation(learners, data, folds=10, storeClassifiers=1, storeExamples=1)

# average number of attributes each learner's classifiers used over the 10 folds
natt = [0.0] * len(learners)
for fold in range(10):
    for i in range(len(learners)):
        natt[i] += len(results.classifiers[fold][i].domain.attributes)
natt = [n / 10.0 for n in natt]

# print accuracy for the three models (no errors in this block!)
print "\nLearner   Accuracy #Atts"
for i in range(len(learners)):
    print "%-15s %5.3f  %5.2f" % (learners[i].name, orngStat.CA(results)[i], natt[i])

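In short, the dataset (data in the code) contains both numeric and discrete features. Inside the cross-validation procedure I want to do discretization (entropy-based) and then feature selection (the top 3 features by information gain).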
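The two per-fold steps named here can be sketched in plain Python. The first is an entropy-based binary cut, the basic step behind entropy discretization. This simplified version makes only one cut per feature and has no MDL stopping rule. The second step ranks the now-discrete features by information gain and keeps the top n. This is an illustrative sketch, not Orange's implementation, and all names in it are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = float(len(labels))
    return -sum((c / n) * math.log(c / n, 2) for c in Counter(labels).values())


def best_entropy_cut(values, labels):
    """One binary cut point minimizing the weighted class entropy.
    Simplified: a real entropy discretizer may add more cuts and an MDL stop."""
    pairs = sorted(zip(values, labels))
    best_cut, best_h = None, float("inf")
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot cut between two equal values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        h = (len(left) * entropy(left) + len(right) * entropy(right)) / len(pairs)
        if h < best_h:
            best_cut, best_h = (pairs[i - 1][0] + pairs[i][0]) / 2.0, h
    return best_cut


def info_gain(feature, labels):
    """Information gain of a discrete feature with respect to the class."""
    n = float(len(labels))
    remainder = 0.0
    for value in set(feature):
        subset = [y for x, y in zip(feature, labels) if x == value]
        remainder += len(subset) / n * entropy(subset)
    return entropy(labels) - remainder


def select_top_n(columns, labels, n):
    """Rank discrete columns (name -> values) by information gain; keep the top n."""
    scores = {name: info_gain(col, labels) for name, col in columns.items()}
    return sorted(scores, key=lambda name: -scores[name])[:n]


if __name__ == "__main__":
    y = ["no", "no", "no", "yes", "yes", "yes"]
    age = [20, 25, 30, 50, 55, 60]  # numeric: must be discretized before scoring
    cut = best_entropy_cut(age, y)  # in CV this is fitted on the training fold
    columns = {
        "age": ["low" if a <= cut else "high" for a in age],
        "sex": ["m", "f", "m", "f", "m", "f"],
        "city": ["a", "a", "a", "b", "b", "a"],
    }
    print(cut, select_top_n(columns, y, 2))
```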

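However, I get an error saying that something went wrong while computing the information gain of the numeric features. I suspect feature selection is being run before discretization. I think a small modification is needed, but there aren't many Orange examples on the web... and I don't have a clear idea of what to change.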
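The suspicion about ordering can be illustrated with toy wrappers. This assumes Orange's wrapper learners each preprocess the training data and then call the learner they wrap, which is how `FilteredLearner` and `DiscretizedLearner` read. Under that assumption the outermost wrapper runs first. So in `FilteredLearner(dBayes, filter=fss)`, the filter would see the raw numeric columns before `dBayes` discretizes them. The classes below are hypothetical stand-ins, not Orange classes. They only show how nesting order decides which step sees the data first.

```python
NUMERIC = (int, float)


class BaseLearner(object):
    """Hypothetical innermost learner: reports the columns it was trained on."""
    def __call__(self, columns):
        return sorted(columns)


class DiscretizingLearner(object):
    """Hypothetical wrapper: bin numeric columns, then delegate to `base`."""
    def __init__(self, base):
        self.base = base

    def __call__(self, columns):
        binned = {}
        for name, col in columns.items():
            if all(isinstance(v, NUMERIC) for v in col):
                cut = sorted(col)[len(col) // 2]  # crude stand-in for entropy cuts
                col = ["high" if v >= cut else "low" for v in col]
            binned[name] = col
        return self.base(binned)


class FilteringLearner(object):
    """Hypothetical wrapper: drop single-valued columns, then delegate to `base`.
    Like an information-gain filter, it rejects numeric data."""
    def __init__(self, base):
        self.base = base

    def __call__(self, columns):
        for name, col in columns.items():
            if any(isinstance(v, NUMERIC) for v in col):
                raise ValueError("cannot score continuous feature %r" % name)
        kept = {name: col for name, col in columns.items() if len(set(col)) > 1}
        return self.base(kept)


if __name__ == "__main__":
    data = {"age": [22, 35, 47, 51], "sex": ["m", "f", "m", "f"], "const": ["x"] * 4}
    # Nesting as in the question: the filter is outermost, so it runs first
    # and receives the raw numeric column.
    try:
        FilteringLearner(DiscretizingLearner(BaseLearner()))(data)
    except ValueError as err:
        print("error:", err)
    # Swapped nesting: discretize first, then filter.
    print(DiscretizingLearner(FilteringLearner(BaseLearner()))(data))
```

By analogy, nesting the filter inside the discretizer would discretize first, for example by wrapping `FilteredLearner(nb, filter=fss)` in `DiscretizedLearner`. That variant is untested here.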

Could you suggest how to modify it? Thanks.

2016-12-07 Minsoo Choy

I'm afraid you can't use orngFSS.FilterBestN(n=3, measure=Orange.feature.scoring.InfoGain()), because some of the features are continuous. The scoring method feature.scoring.InfoGain checks whether the features are discrete; see here.

I have two suggestions:

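1. Use a classification tree learner and take the three features at the top of the tree. If a feature is continuous, the tree discretizes it itself with a test such as "A > 0.1".
2. Discretize the features manually. For example, if age is a feature, mark it as 'D' (discrete) and Orange will treat it as a discrete feature. I think that will work.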
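The first suggestion can be sketched without Orange. The sketch grows a small entropy-based tree. Numeric features get tests of the form `A > t` and discrete features get `A == v` tests. It keeps the first three distinct features the tree splits on, nearest the root first. This is a plain-Python stand-in for a real tree learner, not Orange's implementation. The breadth-first ranking is just one reading of "the top three features of the tree". To stay consistent with the question, it would be run on the training part of each fold only.

```python
import math
from collections import Counter, deque


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = float(len(labels))
    return -sum((c / n) * math.log(c / n, 2) for c in Counter(labels).values())


def goes_left(value, op, t):
    """Numeric test 'A > t' or discrete test 'A == v'."""
    return value > t if op == "gt" else value == t


def best_split(rows, labels, features):
    """Return the (feature, op, threshold) test with the lowest weighted child
    entropy, or None if no test reduces the node's entropy."""
    best, best_h = None, entropy(labels)
    n = float(len(labels))
    for f in features:
        values = sorted(set(r[f] for r in rows))
        if isinstance(values[0], (int, float)):
            tests = [("gt", (a + b) / 2.0) for a, b in zip(values, values[1:])]
        else:
            tests = [("eq", v) for v in values]
        for op, t in tests:
            left = [y for r, y in zip(rows, labels) if goes_left(r[f], op, t)]
            right = [y for r, y in zip(rows, labels) if not goes_left(r[f], op, t)]
            if not left or not right:
                continue  # the test does not split this node
            h = (len(left) * entropy(left) + len(right) * entropy(right)) / n
            if h < best_h - 1e-12:
                best, best_h = (f, op, t), h
    return best


def top_tree_features(rows, labels, features, n=3, max_depth=3):
    """Grow the tree breadth-first and return the first n distinct features
    used in its splits (features nearer the root come first)."""
    chosen = []
    queue = deque([(list(range(len(rows))), 0)])
    while queue and len(chosen) < n:
        idx, depth = queue.popleft()
        if depth >= max_depth:
            continue
        split = best_split([rows[i] for i in idx], [labels[i] for i in idx], features)
        if split is None:
            continue  # pure node or no useful test
        f, op, t = split
        if f not in chosen:
            chosen.append(f)
        queue.append(([i for i in idx if goes_left(rows[i][f], op, t)], depth + 1))
        queue.append(([i for i in idx if not goes_left(rows[i][f], op, t)], depth + 1))
    return chosen


if __name__ == "__main__":
    rows = [
        {"age": 20, "sex": "m", "noise": "x"},
        {"age": 30, "sex": "f", "noise": "x"},
        {"age": 40, "sex": "m", "noise": "x"},
        {"age": 50, "sex": "m", "noise": "x"},
        {"age": 60, "sex": "f", "noise": "x"},
        {"age": 70, "sex": "m", "noise": "x"},
    ]
    labels = ["no", "no", "no", "yes", "no", "yes"]
    print(top_tree_features(rows, labels, ["age", "sex", "noise"]))
```

The constant `noise` column never yields a useful test, so it is never selected, even with `n=3`.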

2017-01-10 14:33:21 Zealseeker
