How do I test data against a decision tree model in R?

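I built a decision tree from training data using the rpart package in R. Now I have more data, and I want to run it through the tree to check the model. Logically/iteratively, I want to do the following: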

for each datapoint in new data 
    run point thru decision tree, branching as appropriate 
    examine how tree classifies the data point 
    determine if the datapoint is a true positive or false positive

How can I do this in R?

Source

2013-10-27 bernie2436

Use the 'predict()' function: http://stat.ethz.ch/R-manual/R-devel/library/rpart/html/predict.rpart.html – David

To be able to use it, I assume you have split your data into a training subset and a test set.
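If you have not made that split yet, here is a minimal sketch of one way to do it. It assumes the `kyphosis` data frame bundled with rpart as a stand-in for your own data, and the 30% hold-out fraction and the seed are arbitrary choices for illustration only.

```r
library(rpart)  # also provides the kyphosis example data used as a stand-in here

set.seed(42)  # fixed seed so the random split is reproducible

n <- nrow(kyphosis)
test_idx <- sample(n, size = round(0.3 * n))  # hold out roughly 30% of the rows

traindata <- kyphosis[-test_idx, ]  # rows used to fit the tree
testdata  <- kyphosis[test_idx, ]   # rows kept aside for checking the model
```

Every row ends up in exactly one of the two sets, so the model is later checked only on rows it never saw during fitting.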

To create the training model you can use:

library(rpart)
model <- rpart(y ~ ., data = traindata, method = "class", minbucket = 5)  # you have presumably done this already

Apply it to the test set:

pred <- predict(model, newdata = testdata, type = "class")  # type = "class" returns labels, not a probability matrix

You then get a vector of predictions.

In your test set you also have the "true" answers. Let's say they are in the last column.

Simply comparing the two gives the result:

pred == testdata[, last]  # 'last' is the column index of 'y', e.g. ncol(testdata)

Where the elements are equal you get TRUE; where you get FALSE, the prediction was wrong.

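pred == 1 & testdata[, last] == 1  # TRUE marks a true positive: predicted and actual are both 1
pred == 1 & testdata[, last] == 0  # TRUE marks a false positive: predicted 1, actual 0
pred == testdata[, last]           # TRUE marks a correct prediction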
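The same comparison can be summarised as a confusion matrix with base R's `table()`. This is a minimal sketch under stated assumptions: `kyphosis` (bundled with rpart) stands in for your data, "present" is treated as the positive class, and the 25-row test sample is arbitrary.

```r
library(rpart)

# Assumption: kyphosis stands in for your data; "present" is the positive class.
set.seed(1)
test_idx  <- sample(nrow(kyphosis), 25)
traindata <- kyphosis[-test_idx, ]
testdata  <- kyphosis[test_idx, ]

model  <- rpart(Kyphosis ~ ., data = traindata, method = "class")
pred   <- predict(model, newdata = testdata, type = "class")
actual <- testdata$Kyphosis

# Cross-tabulate predicted labels against the true labels
conf_mat <- table(predicted = pred, actual = actual)
print(conf_mat)

tp <- sum(pred == "present" & actual == "present")  # true positives
fp <- sum(pred == "present" & actual == "absent")   # false positives
fn <- sum(pred == "absent"  & actual == "present")  # false negatives
tn <- sum(pred == "absent"  & actual == "absent")   # true negatives
```

The four counts add up to the number of test rows, and each one matches a cell of `conf_mat`.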

It may be interesting to see what fraction you got right:

mean(pred == testdata[ , last]) # here TRUE will count as a 1, and FALSE as 0

Source

2013-10-27 16:58:53 PascalVKooten

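Since this answer was written, the 'rpart' library has presumably changed. I had to use the following to make it work: 'pred <- predict(model, newdata = testdata, type = "class")' (otherwise you get a full probability matrix). – kynan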
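The difference described in this comment can be checked directly. A minimal sketch, assuming the `kyphosis` data bundled with rpart as a stand-in; it predicts on the training rows only to show the shape of the two return values, not to evaluate the model.

```r
library(rpart)

# Assumption: kyphosis stands in for your data.
model <- rpart(Kyphosis ~ ., data = kyphosis, method = "class")

# For a classification tree, omitting 'type' returns class probabilities
prob <- predict(model, newdata = kyphosis)

# type = "class" returns one predicted label per row instead
cls <- predict(model, newdata = kyphosis, type = "class")

dim(prob)    # one row per observation, one column per class
head(cls)    # a factor of predicted labels
```

Only the `type = "class"` result can be compared element by element with the true labels.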
