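Subscript out of bounds error in the randomForest predict function

I am using a random forest for prediction and I get the following error on the predict(fit, test_feature) line. Can anyone help me get past this? I followed the same steps on another dataset and no error occurred, but here I get the error.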

Error: Error in x[, vname, drop = FALSE] : subscript out of bounds 

library(caret)         # createDataPartition
library(randomForest)  # randomForest, predict

# Hold out 20% of the rows; column 487 is the label
training_index <- createDataPartition(shufflled[, 487], p = 0.8, times = 1)
training_index <- unlist(training_index)

train_set <- shufflled[training_index, ]
test_set <- shufflled[-training_index, ]

accuracies <- c()
k <- 10
n <- floor(nrow(train_set) / k)  # rows per fold

for (i in 1:k) {
    # Row range of fold i
    sub1 <- (i - 1) * n + 1
    sub2 <- i * n
    subset <- sub1:sub2
    train <- train_set[-subset, ]
    test <- train_set[subset, ]
    test_feature <- test[, -487]

    True_Label <- as.factor(test[, 487])
    fit <- randomForest(x = train[, -487], y = as.factor(train[, 487]))

    prediction <- predict(fit, test_feature) # The error line
    correctlabel <- prediction == True_Label
    t <- table(prediction, True_Label)
}


2017-07-15 Najme Rastegar

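Your question is not very clear, but I'll try to help. First examine your data to see how the predictors and the outcome are distributed. You may find that some predictor or outcome levels are heavily skewed, or that some of them are very rare. I got this error when I tried to predict a very rare outcome with a heavily tuned random forest: some predictor levels were not actually present in the training data. As a result, a factor level shows up in the test data that the model, having never seen it during training, treats as out of bounds.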
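A minimal sketch of that check, assuming the categorical predictors are stored as factor columns. The data frames, the column names and the helper unseen_levels are hypothetical and only illustrate the idea; they are not taken from the question's data.

```r
# Toy data (hypothetical): "green" occurs in the test fold but not in training
train <- data.frame(colour = factor(c("red", "blue", "red")),
                    size   = c(1.0, 2.5, 3.1))
test  <- data.frame(colour = factor(c("blue", "green")),
                    size   = c(0.7, 1.9))

# For each factor column of train, list the values present in test but absent from train
unseen_levels <- function(train, test) {
  factor_cols <- names(train)[vapply(train, is.factor, logical(1))]
  out <- lapply(factor_cols, function(col) {
    setdiff(unique(as.character(test[[col]])),
            unique(as.character(train[[col]])))
  })
  names(out) <- factor_cols
  out[lengths(out) > 0]  # keep only the columns that have unseen values
}

unseen <- unseen_levels(train, test)
print(unseen)
```

Running the same comparison on train and test inside each fold of the loop would show whether any fold ends up with a level the model has not seen.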

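Alternatively, check the variable names before calling predict() to make sure they match. Without your data file, it is hard to say why your first example worked. For example, you could try: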

names(test) <- names(train)
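Overwriting the names relabels columns purely by position, so it may be worth looking at the mismatch first. The failing expression x[, vname, drop = FALSE] selects columns by name, so a training-time name that is absent from the new data is one plausible trigger. Below is a small base R sketch; train_x and test_x are hypothetical stand-ins for train[, -487] and test_feature.

```r
# Toy stand-ins (hypothetical) for the training predictors and the new data
train_x <- data.frame(a = 1:3, b = 4:6, c = 7:9)
test_x  <- data.frame(a = 1:2, b = 3:4, d = 5:6)

# Names used in training but absent at prediction time
missing_in_test <- setdiff(names(train_x), names(test_x))
# Names that exist only in the new data
extra_in_test <- setdiff(names(test_x), names(train_x))

print(missing_in_test)
print(extra_in_test)
```

If both results are empty, the column names agree and the cause probably lies elsewhere.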


2017-07-15 15:51:29

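I have checked my train and test datasets. When I run unique(label) on the train and test data, I get the same labels. I also checked the variable names, but I still get the same error.

I ran into a similar problem a few weeks ago.

To solve it, you can do this:

df$label <- factor(df$label)

Instead of as.factor, try using just the generic factor function. Also, try naming the label variable first.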
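One difference between the two functions that may be behind this advice (an assumption, since the answer does not give a reason): as.factor() returns an existing factor unchanged, including levels that no longer occur, whereas factor() rebuilds the levels from the values actually present. The vector below is hypothetical.

```r
labels <- factor(c("a", "b", "c", "a"))
fold <- labels[labels != "c"]  # subsetting a factor keeps all of its levels

# as.factor() on a factor returns it as-is, so the unused level "c" is still there
kept <- levels(as.factor(fold))
# factor() recomputes the levels, so the unused level "c" is dropped
dropped <- levels(factor(fold))

print(kept)
print(dropped)
```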


2017-07-15 16:48:52 Loncar
