Error in confusionMatrix: the data and reference factors must have the same number of levels (R caret)

I have trained a tree model with R caret. I am now trying to generate a confusion matrix and keep getting the following error:

Error in confusionMatrix.default(predictionsTree, testdata$catgeory) : the data and reference factors must have the same number of levels

prob <- 0.5 #Specify class split 
singleSplit <- createDataPartition(modellingData2$category, p=prob, 
            times=1, list=FALSE) 
cvControl <- trainControl(method="repeatedcv", number=10, repeats=5) 
traindata <- modellingData2[singleSplit,] 
testdata <- modellingData2[-singleSplit,] 
treeFit <- train(traindata$category~., data=traindata, 
       trControl=cvControl, method="rpart", tuneLength=10) 
predictionsTree <- predict(treeFit, testdata) 
confusionMatrix(predictionsTree, testdata$catgeory)

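The levels are the same on both objects, and I can't figure out what the problem is. Their structure and levels are given below. They should be the same. Any help would be greatly appreciated, as this is driving me crazy!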

> str(predictionsTree) 
Factor w/ 30 levels "16-Merchant Service Charge",..: 28 22 22 22 22 6 6 6 6 6 ... 
> str(testdata$category) 
Factor w/ 30 levels "16-Merchant Service Charge",..: 30 30 7 7 7 7 7 30 7 7 ... 

> levels(predictionsTree) 
[1] "16-Merchant Service Charge" "17-Unpaid Cheque Fee"   "18-Gov. Stamp Duty"   "Misc"       "26-Standard Transfer Charge" 
[6] "29-Bank Giro Credit"   "3-Cheques Debit"    "32-Standing Order - Debit" "33-Inter Branch Payment"  "34-International"    
[11] "35-Point of Sale"    "39-Direct Debits Received" "4-Notified Bank Fees"   "40-Cash Lodged"    "42-International Receipts" 
[16] "46-Direct Debits Paid"  "56-Credit Card Receipts"  "57-Inter Branch"    "58-Unpaid Items"    "59-Inter Company Transfers" 
[21] "6-Notified Interest Credited" "61-Domestic"     "64-Charge Refund"    "66-Inter Company Transfers" "67-Suppliers"     
[26] "68-Payroll"     "69-Domestic"     "73-Credit Card Payments"  "82-CHAPS Fee"     "Uncategorised" 

> levels(testdata$category) 
[1] "16-Merchant Service Charge" "17-Unpaid Cheque Fee"   "18-Gov. Stamp Duty"   "Misc"       "26-Standard Transfer Charge" 
[6] "29-Bank Giro Credit"   "3-Cheques Debit"    "32-Standing Order - Debit" "33-Inter Branch Payment"  "34-International"    
[11] "35-Point of Sale"    "39-Direct Debits Received" "4-Notified Bank Fees"   "40-Cash Lodged"    "42-International Receipts" 
[16] "46-Direct Debits Paid"  "56-Credit Card Receipts"  "57-Inter Branch"    "58-Unpaid Items"    "59-Inter Company Transfers" 
[21] "6-Notified Interest Credited" "61-Domestic"     "64-Charge Refund"    "66-Inter Company Transfers" "67-Suppliers"     
[26] "68-Payroll"     "69-Domestic"     "73-Credit Card Payments"  "82-CHAPS Fee"     "Uncategorised"

Source

2014-07-17 user2987739

In your error, 'category' is spelled 'catgeory'. If that is not the problem, what is the output of `identical(levels(predictionsTree), levels(testdata$category))`? – fxi
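To illustrate the point about the misspelling, here is a minimal sketch of what `$` returns for a column name that does not exist. The data frame `toy` is hypothetical and is not the asker's data; only the column name mirrors the question.

```r
# Hypothetical toy data; only the column name mirrors the question.
toy <- data.frame(category = factor(c("a", "b", "a")), amount = c(1, 2, 3))

misspelled <- toy$catgeory   # no such column: `$` returns NULL without an error
correct    <- toy$category

is.null(misspelled)          # TRUE
length(misspelled)           # 0
nlevels(factor(misspelled))  # 0, whereas nlevels(correct) is 2

# An explicit name check fails loudly where `$` does not.
stopifnot("category" %in% names(toy))
```

A reference with zero levels compared against predictions with 30 levels is a plausible source of the "same number of levels" message, although the exact check inside `confusionMatrix` may differ between caret versions.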

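Hi, thank you, I fixed the silly spelling mistake... doh! I ran the identical() call and it outputs [1] TRUE. Now I get the following error when I run the confusionMatrix function: Error in table(data, reference, dnn = dnn, ...) : all arguments must have the same length – user2987739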

Check for another misspelled 'catgeory', compare `length(testdata$category)` with `length(predictionsTree)`, and look at the summary of both vectors. If you only need a simple confusion matrix: `table(predictionsTree, testdata$category)` – fxi

Maybe your model did not predict one of the factor levels. Use the table() function instead of confusionMatrix() to see whether that is the problem.
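The checks suggested in this thread can be sketched in base R as follows. The vectors `pred` and `ref` are hypothetical stand-ins for `predictionsTree` and `testdata$category`; in this toy case the predictions never contain level "c", so that level is missing from `pred`.

```r
# Hypothetical reference labels with three levels.
ref  <- factor(c("a", "b", "c", "a", "b"), levels = c("a", "b", "c"))
# Stand-in predictions built from a character vector, so the unused level "c" is lost.
pred <- factor(c("a", "b", "a", "a", "b"))

identical(levels(pred), levels(ref))   # FALSE: "c" is missing from pred
length(pred) == length(ref)            # TRUE

# Re-declare pred with the reference levels so both factors line up.
pred_aligned <- factor(pred, levels = levels(ref))

# Plain cross-tabulation: rows are predictions, columns are the reference.
cm <- table(predicted = pred_aligned, reference = ref)
cm
```

An all-zero row in `cm` marks a level that was never predicted. Note that in the question itself both factors already report 30 levels, so this check mainly helps to rule the cause out.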

Source

2014-10-31 05:36:44 Red

You could have added this as a comment. –


There may be missing values in the test data. Add the following line before `predictionsTree <- predict(treeFit, testdata)` to remove the NAs. I had the same error, and now it works for me.

testdata <- testdata[complete.cases(testdata),]
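A small illustration of what that line does, assuming a hypothetical data frame `toy_test` with one missing predictor value. The point is that the reference column is then taken from the filtered frame, so it has the same number of rows as the data passed to `predict`.

```r
# Hypothetical test set: the second row has a missing predictor.
toy_test <- data.frame(
  amount   = c(10, NA, 30, 40),
  category = factor(c("a", "b", "a", "b"))
)

complete.cases(toy_test)     # TRUE FALSE TRUE TRUE

# Keep only rows without any NA, as in the line above.
toy_test <- toy_test[complete.cases(toy_test), ]

nrow(toy_test)               # 3
length(toy_test$category)    # 3
```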

Source

2015-01-11 07:12:01 EaswerC

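The length problem you are running into is probably due to NAs in the training set. Either drop the incomplete cases, or impute so that you have no missing values.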

Source

2015-05-21 21:06:38 orange1

Try specifying na.pass for the na.action option:

predictionsTree <- predict(treeFit, testdata,na.action = na.pass)
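To see why `na.action` changes the length of the prediction vector, here is a sketch that uses base R's `lm` as a stand-in for the `rpart` model. That substitution is an assumption made so the example runs without caret; `predict.lm` accepts an `na.action` argument, and whether `predict` on a caret `train` object behaves the same way should be checked against the caret documentation. The data frames `train_df` and `new_df` are hypothetical.

```r
set.seed(1)
# Hypothetical training data and new data with one missing predictor value.
train_df <- data.frame(x = 1:10, y = 2 * (1:10) + rnorm(10))
new_df   <- data.frame(x = c(1, NA, 3))

fit <- lm(y ~ x, data = train_df)

# na.omit drops the incomplete row, so the result is shorter than new_df.
p_omit <- predict(fit, newdata = new_df, na.action = na.omit)
# na.pass keeps the row and returns NA for it, preserving the length.
p_pass <- predict(fit, newdata = new_df, na.action = na.pass)

length(p_omit)   # 2
length(p_pass)   # 3
```

With `na.pass` the predictions stay aligned row by row with the test set, at the cost of `NA` entries that still have to be handled before tabulating.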

Source

2015-11-12 03:02:11 aristotll

I had the same issue, and fixed it by changing the data right after reading the data file, like so:

data = na.omit(data)

Thanks, everyone, for the pointers!

Source

2015-11-21 18:54:00 Alicia
