2016-01-28

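Return the best decision tree from cross-validation in Matlab

What is the proper way to find the model with the least error from a cross-validated fit in Matlab? My goal is to show the error rate of the best cross-validated decision tree as a function of the size of the test data, and I have the following code: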

chess = csvread(filename); 
predictors = chess(:,1:6); 
class = chess(:,7); 

cvpart = cvpartition(class,'holdout', 0.3); 
Xtrain = predictors(training(cvpart),:); 
Ytrain = class(training(cvpart),:); 
Xtest = predictors(test(cvpart),:); 
Ytest = class(test(cvpart),:); 

numElements = numel(training(cvpart)); 
trainErrorGrowing = zeros(numElements,1); 
testErrorGrowing = zeros(numElements,1); 

for n = 100:numElements 
    data = datasample(training(cvpart), n); 
    dataX = predictors(data,:); 
    dataY = class(data,:); 

    % Fit the decision tree 
    tree = fitctree(dataX, dataY, 'AlgorithmForCategorical', 'PullLeft', 'CrossVal', 'on'); 

    % Loop to find the model with the least error 
    kfoldError = 100; 
    bestTree = tree.Trained{1}; 
    for i = 1:10 
     err = loss(tree.Trained{i}, Xtrain, Ytrain); 
     if err < kfoldError 
      kfoldError = err; 
      bestTree = tree.Trained{i}; 
     end 
    end 
    trainErrorGrowing(n) = loss(bestTree,Xtest,Ytest,'Subtrees','all'); % Training Error 
    testErrorGrowing(n) = loss(bestTree,Xtest,Ytest,'Subtrees','all'); % Testing Error 
end 

plot(numElements,testErrorGrowing); 

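It is important that the data reserved for the final test is not used in any way to train the tree. However, when I try to run this code, I get an error on the line

err = loss(tree.Trained{i}, Xtrain, Ytrain); 

The error is:

Error using classreg.learning.internal.classCount 
You passed an unknown class '1' of type double. 

I tried casting the iterator to int8 and to char, but got the same error both times. Is there a simpler way to find the decision tree with the least error, or at least a way to reference the individual trained trees?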

Answer

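Suppose you perform 10-fold cross-validation while learning the model. You can then use the kfoldLoss function to get the loss for each CV fold, and pick the trained model with the least CV loss, as follows: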

modelLosses = kfoldLoss(tree,'mode','individual'); 

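The code above gives you a vector of length 10 (10 CV error values), provided you did 10-fold cross-validation during learning. Assuming the trained model with the least CV error is the k-th one, you would then use: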

testSetPredictions = predict(tree.Trained{k}, testSetFeatures);
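
To tie the steps above together, here is a minimal sketch of the whole selection, assuming the Statistics and Machine Learning Toolbox is available. The synthetic two-class data and the names X, Y, cvTree, bestTree and testLoss are placeholders invented for illustration; they are not from the question, where the predictors and class columns of the chess data would take their place.

```matlab
% Synthetic two-class data stands in for the chess data (assumption).
rng(0);                                   % make the example reproducible
X = [randn(200,2); randn(200,2) + 2];     % 400 observations, 2 predictors
Y = [ones(200,1); 2*ones(200,1)];         % class labels 1 and 2

% Hold out 30% of the data; it is not used for fitting or model selection.
cvpart = cvpartition(Y, 'HoldOut', 0.3);
Xtrain = X(training(cvpart), :);
Ytrain = Y(training(cvpart));
Xtest  = X(test(cvpart), :);
Ytest  = Y(test(cvpart));

% 10-fold cross-validated tree, fitted on the training portion only.
cvTree = fitctree(Xtrain, Ytrain, 'CrossVal', 'on');

% One loss value per fold instead of the average over folds.
modelLosses = kfoldLoss(cvTree, 'Mode', 'individual');

% Index k of the fold model with the smallest cross-validation loss.
[minLoss, k] = min(modelLosses);
bestTree = cvTree.Trained{k};

% Evaluate the selected tree on the held-out test set.
testSetPredictions = predict(bestTree, Xtest);
testLoss = loss(bestTree, Xtest, Ytest);
fprintf('Best fold: %d, CV loss: %.3f, test loss: %.3f\n', k, minLoss, testLoss);
```

Each entry of cvTree.Trained was fitted on nine of the ten folds, and its entry in modelLosses is computed on the remaining fold, so the held-out test set plays no part in choosing k.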