2012-06-11

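Adding a probability vector back into the original data frame in R

I was able to successfully run an RF (random forest) model using some R code I was given. The code is below, along with a snippet of my data.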

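The only problem is that the code is written so that it outputs just a vector of probabilities, without any of the data from the original test data set, called "testset". So now I am trying to figure out how to output my probabilities together with the original data frame, since I have not been able to find a solution online. In other words, I want the probabilities to be another column in the data set, right after my FLSAStat column, so that I can write everything out to a single CSV file.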

Here is what I have:

##################################################### 
# 1. SETUP DATA 
##################################################### 
mydata <- read.csv("train_test.csv", header=TRUE) 
> colnames(testset) 
[1] "train"   "Target"   "ApptCode"  "Directorate"   "New_Discipline" "Series"   "Adjusted.Age" 
[8] "Adj.Service"  "Adj.Age.Service" "HiEducLv"  "Gender"   "RetCd"   "FLSAStat" 
> head(testset) 
train Target ApptCode    Directorate    New_Discipline Series Adjusted.Age Adj.Service Adj.Age.Service HiEducLv Gender 
5909  0  NA  IN     Business Math Computer Science IT  PSTS  54.44   10   64.44 Bachelor Male 
5910  0  NA  IN    Computation Math Computer Science IT PSTS  51.51   15   66.51 Bachelor Male 
5911  0  NA  IN Physical and Life Sciences     Physics PSTS  40.45   5   45.45  PHD Male 
5912  0  NA  IN Weapons and Complex Integ     Physics PSTS  62.21   35   97.21  PHD Male 
5913  0  NA  IN Weapons and Complex Integ     Physics PSTS  45.65   15   60.65  PHD Male 
5914  0  NA  FX Physical and Life Sciences     Physics PSTS  36.13   5   41.12  PHD Male 
    RetCd FLSAStat 
5909 TCP2  E 
5910 TCP2  E 
5911 TCP2  E 
5912 TCP2  E 
5913 TCP1  E 
5914 TCP2  E  

#create train and test sets 
trainset = mydata[mydata$train == 1,] 
testset = mydata[mydata$train == 0,] 
#eliminate unwanted columns from train set 
trainset$train = NULL 
##################################################### 
# 2. set the formula 
##################################################### 
theTarget <- "Target" 
theFormula <- as.formula(paste("as.factor(",theTarget, ") ~ . ")) 
theFormula1 <- as.formula(paste(theTarget," ~ . ")) 
trainTarget = trainset[,which(names(trainset)==theTarget)] 
testTarget = testset[,which(names(testset)==theTarget)] 

##################################################### 
# Random Forest 
##################################################### 
library(randomForest) 
what <- "Random Forest" 
FOREST_model <- randomForest(theFormula, data=trainset, ntree=500) 
train_pred <- predict(FOREST_model, trainset, type="prob")[,2] 
test_pred <- predict(FOREST_model, testset, type="prob")[,2] 
display_results() 
testID <- testset$case_id 
predictions <- test_pred 
submit_file = cbind(testID,predictions) 
write.csv(submit_file, file="RANDOM4.csv", row.names = FALSE) 

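I think the problem is that I am missing an extra line of code that merges the prediction vector back into the test set. My guess is that it would go just before the third-to-last line of code.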


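Hi! Would you mind reviewing [this question](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) and revising yours? If you can provide a reproducible example of the starting data set, the probabilities output by the model, and how the two should be joined together, it will be much easier for us to help you. Instead of, or in addition to, that, take a look at `cbind()`, `rbind()`, `merge()`, or `match()` to do what you need. The first two combine objects by row or column, while the last two are roughly equivalent to SQL joins. – Chase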
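As a rough illustration of the difference between the functions named in this comment, here is a small sketch. The data frames, the `case_id` key, and the `prob` values are all hypothetical and exist only for this example; they are not taken from the question's data.

```r
# Hypothetical toy data: five test rows with an id, and one score per id.
testset <- data.frame(case_id = c(101, 102, 103, 104, 105),
                      Gender  = c("Male", "Female", "Male", "Male", "Female"))
scores  <- data.frame(case_id = c(105, 103, 101, 104, 102),   # shuffled order
                      prob    = c(0.50, 0.30, 0.10, 0.40, 0.20))

# cbind() pairs rows purely by position, so it is only correct
# when both objects are already in the same row order (not the case here).
by_position <- cbind(testset, prob = scores$prob)

# merge() joins on a key column, like a SQL join, regardless of row order.
by_key <- merge(testset, scores, by = "case_id")

# match() returns lookup positions, so the result keeps testset's own order.
testset$prob <- scores$prob[match(testset$case_id, scores$case_id)]
```

When the predictions come straight from `predict()` on the same data frame, the rows are already aligned and positional binding is enough; a key-based join matters only when the two objects may be ordered differently.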

Answer


Just add the column to your data frame like this:

testset$Predictions <- test_pred 
write.csv(testset, file="RANDOM4.csv", row.names = FALSE)
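For anyone who wants to try this without the original data, here is a minimal, self-contained sketch of the same idea. The toy data frame, its values, and the output file name are invented for illustration, and a logistic regression (`glm`) stands in for the random forest so that the example runs with base R only. The point it relies on is that `predict()` returns one value per row of the data it is given, in the same order, so the vector can be assigned directly as a new column.

```r
# Toy data standing in for train_test.csv (all values are invented).
set.seed(42)
mydata <- data.frame(
  train        = rep(c(1, 0), times = c(40, 10)),
  Adjusted.Age = round(runif(50, 25, 65), 2),
  Adj.Service  = sample(seq(5, 35, by = 5), 50, replace = TRUE)
)
mydata$Target <- rbinom(50, 1, 0.4)
mydata$Target[mydata$train == 0] <- NA   # test rows have no known target

trainset <- mydata[mydata$train == 1, ]
testset  <- mydata[mydata$train == 0, ]

# Stand-in model (assumption): logistic regression instead of randomForest.
fit <- glm(Target ~ Adjusted.Age + Adj.Service,
           data = trainset, family = binomial)
test_pred <- predict(fit, newdata = testset, type = "response")

# One prediction per test row, so direct assignment lines up row by row.
stopifnot(length(test_pred) == nrow(testset))
testset$Predictions <- test_pred

# Write the test set plus the new column to a CSV in a temporary directory.
out_file <- file.path(tempdir(), "predictions_example.csv")
write.csv(testset, file = out_file, row.names = FALSE)
```

The length check before the assignment is a cheap safeguard: if the model dropped rows (for example, because of missing predictor values), the assignment would otherwise fail or misalign.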