2014-07-17 16 views
9

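Error in confusionMatrix when producing a confusion matrix: the data and reference factors must have the same number of levels (R, caret)

I have trained a tree model with caret in R. Now I want to produce a confusion matrix, but I keep getting the following error: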

Error in confusionMatrix.default(predictionsTree, testdata$catgeory) : the data and reference factors must have the same number of levels

prob <- 0.5 #Specify class split 
singleSplit <- createDataPartition(modellingData2$category, p=prob, 
            times=1, list=FALSE) 
cvControl <- trainControl(method="repeatedcv", number=10, repeats=5) 
traindata <- modellingData2[singleSplit,] 
testdata <- modellingData2[-singleSplit,] 
treeFit <- train(traindata$category~., data=traindata, 
       trControl=cvControl, method="rpart", tuneLength=10) 
predictionsTree <- predict(treeFit, testdata) 
confusionMatrix(predictionsTree, testdata$catgeory) 

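The levels are the same on both objects, so I can't figure out what the problem is. Their structure and levels are shown below. They should be identical. Any help would be greatly appreciated, as this is driving me crazy!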

> str(predictionsTree) 
Factor w/ 30 levels "16-Merchant Service Charge",..: 28 22 22 22 22 6 6 6 6 6 ... 
> str(testdata$category) 
Factor w/ 30 levels "16-Merchant Service Charge",..: 30 30 7 7 7 7 7 30 7 7 ... 

> levels(predictionsTree) 
[1] "16-Merchant Service Charge" "17-Unpaid Cheque Fee"   "18-Gov. Stamp Duty"   "Misc"       "26-Standard Transfer Charge" 
[6] "29-Bank Giro Credit"   "3-Cheques Debit"    "32-Standing Order - Debit" "33-Inter Branch Payment"  "34-International"    
[11] "35-Point of Sale"    "39-Direct Debits Received" "4-Notified Bank Fees"   "40-Cash Lodged"    "42-International Receipts" 
[16] "46-Direct Debits Paid"  "56-Credit Card Receipts"  "57-Inter Branch"    "58-Unpaid Items"    "59-Inter Company Transfers" 
[21] "6-Notified Interest Credited" "61-Domestic"     "64-Charge Refund"    "66-Inter Company Transfers" "67-Suppliers"     
[26] "68-Payroll"     "69-Domestic"     "73-Credit Card Payments"  "82-CHAPS Fee"     "Uncategorised" 

> levels(testdata$category) 
[1] "16-Merchant Service Charge" "17-Unpaid Cheque Fee"   "18-Gov. Stamp Duty"   "Misc"       "26-Standard Transfer Charge" 
[6] "29-Bank Giro Credit"   "3-Cheques Debit"    "32-Standing Order - Debit" "33-Inter Branch Payment"  "34-International"    
[11] "35-Point of Sale"    "39-Direct Debits Received" "4-Notified Bank Fees"   "40-Cash Lodged"    "42-International Receipts" 
[16] "46-Direct Debits Paid"  "56-Credit Card Receipts"  "57-Inter Branch"    "58-Unpaid Items"    "59-Inter Company Transfers" 
[21] "6-Notified Interest Credited" "61-Domestic"     "64-Charge Refund"    "66-Inter Company Transfers" "67-Suppliers"     
[26] "68-Payroll"     "69-Domestic"     "73-Credit Card Payments"  "82-CHAPS Fee"     "Uncategorised"  
+0

In your error, 'category' is spelled 'catgeory'. If that is not the problem, what is the output of 'identical(levels(predictionsTree), levels(testdata$category))'? – fxi

+0

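Hi, thank you. I fixed the silly typo... doh! I ran the identical() call and it outputs [1] TRUE. Now I get the following error when I run the confusionMatrix function: Error in table(data, reference, dnn = dnn, ...) : all arguments must have the same length – user2987739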

+0

Check for another misspelled 'catgeory', compare 'length(testdata$category)' with 'length(predictionsTree)', and look at the summary of both vectors. For a simple confusion matrix you only need: 'table(predictionsTree, testdata$category)' – fxi
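The length check suggested in this comment can be sketched on toy data. The data frame `toy` and the vector `preds` below are hypothetical stand-ins for `testdata` and `predictionsTree`; the sketch assumes predictions came back only for rows without missing values.

```r
# Hypothetical stand-in for testdata: row 3 has a missing predictor.
toy <- data.frame(
  category = factor(c("a", "b", "a", "b")),
  amount   = c(10, 20, NA, 40)
)

# Hypothetical stand-in for predictionsTree: one prediction per complete row.
preds <- factor(c("a", "b", "b"), levels = levels(toy$category))

# Compare the two lengths; a mismatch is what makes table() fail.
n_ref  <- length(toy$category)
n_pred <- length(preds)

# Count rows with at least one NA; this often explains the gap.
n_incomplete <- sum(!complete.cases(toy))

cat("reference:", n_ref,
    "predictions:", n_pred,
    "incomplete rows:", n_incomplete, "\n")
```

If the difference between the two lengths equals the number of incomplete rows, missing values in the test data are the likely cause.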

Answers

1

Maybe your model never predicts one of the factor levels. Use the table() function instead of confusionMatrix() to see whether that is the problem.
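A minimal sketch of that check in base R, using made-up vectors `pred` and `ref` (hypothetical, not the asker's data). It shows a prediction factor that lacks one level and one way to re-level it so both factors share the same level set. In the question itself, identical() on the levels returned TRUE, so there this check rules the cause out rather than fixing anything.

```r
# Hypothetical reference labels and predictions; "z" is never predicted.
ref  <- factor(c("x", "y", "z", "x"))
pred <- factor(c("x", "y", "y", "x"))

# The level counts differ, which is what confusionMatrix() complains about.
n_pred_levels <- nlevels(pred)
n_ref_levels  <- nlevels(ref)

# Re-declare pred with the reference level set; values are unchanged.
pred_aligned <- factor(pred, levels = levels(ref))

# table() now yields a square matrix with an all-zero row for "z".
tab <- table(pred_aligned, ref)
print(tab)
```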

+1

You could have added this as a comment. –

-2

There may be missing values in the test data. Add the following line before 'predictionsTree <- predict(treeFit, testdata)' to remove the NAs. I had the same error, and now it works for me.

testdata <- testdata[complete.cases(testdata),] 
0

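The length problem you are running into is probably due to NAs in the training set. Either drop the incomplete cases, or impute so that you have no missing values.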

0

Try specifying na.pass for the na.action option:

predictionsTree <- predict(treeFit, testdata, na.action = na.pass)
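To see why the `na.action` setting changes the number of predictions, here is a sketch that uses base R's `model.frame()` as a stand-in for what a formula-based `predict()` does with new data. The data frame `toy` is hypothetical, and whether `predict.train` handles a given model exactly this way is an assumption worth checking against the caret documentation.

```r
# Hypothetical test data: row 3 has a missing predictor.
toy <- data.frame(
  category = factor(c("a", "b", "a", "b")),
  amount   = c(10, 20, NA, 40)
)

# na.omit silently drops the incomplete row before prediction.
mf_omit <- model.frame(category ~ amount, data = toy, na.action = na.omit)

# na.pass keeps every row, NAs included.
mf_pass <- model.frame(category ~ amount, data = toy, na.action = na.pass)

cat("rows with na.omit:", nrow(mf_omit),
    "rows with na.pass:", nrow(mf_pass), "\n")
```

With na.pass the row count matches the reference vector, but the underlying model then has to cope with NA inputs itself. rpart can generally do so through surrogate splits; other methods may return NA predictions or fail.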
0

I had the same problem. I went ahead and changed the data right after reading in the data file, like so:

data = na.omit(data)

Thanks to everyone for the pointers!
