How can I optimize code that computes the accuracy of multiple predictive models in R?

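I have a function that fits my models, and I call it inside cross-validation. The result is a data frame called results that holds the class (label) and each model's prediction for every iteration: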

head(results)
    iteration class ksvm rf
65          1     4    4  4
306         1     2    2  2
300         1     4    4  4
385         1     2    2  2
431         1     2    2  2
205         1     4    4  4

(The row index can be ignored; it comes from the data being sampled.)

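Since I use 5-fold cross-validation, I have 5 iterations of predictions, in this case from ksvm and rf. (The model names are stored in a variable called algorithms.)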

After that, I compute the accuracy like this:

library(dplyr)

results %>% 
    group_by(iteration) %>% 
    summarise(acc_ksvm = sum(ksvm == class)/n(), acc_rf = sum(rf == class)/n())

Output:

  iteration  acc_ksvm    acc_rf
      (int)     (dbl)     (dbl)
1         1 0.9603175 0.9603175
2         2 0.9760000 0.9680000
3         3 0.9603175 0.9523810
4         4 0.9840000 0.9920000
5         5 0.9444444 0.9523810

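Question: is there a way to optimize this? I will eventually add more models, and I would like to just pass the algorithms variable to a function and compute the accuracy of all models, without manually writing summarise(acc_ksvm = sum(ksvm == class)/n(), acc_rf = sum(rf == class)/n()) for each model.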

Can this be done with apply? Or do I have to change the way my data frame is built so that I can group by model?
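For the apply route, here is a minimal base-R sketch. It assumes results has an iteration column, a class column, and one prediction column per entry in algorithms; the function name accuracy_by_iteration and the toy data are hypothetical, made up only so the example runs. It uses mean(pred == class), which equals sum(pred == class)/n because the mean of a logical vector is the share of TRUE values.

```r
# Hypothetical helper: per-iteration accuracy for every model column
# named in `algorithms`, using only base R (split + lapply/vapply).
accuracy_by_iteration <- function(results, algorithms) {
  # One data frame per cross-validation iteration
  per_iter <- lapply(split(results, results$iteration), function(d) {
    # mean() of a logical vector = share of correct predictions
    vapply(algorithms, function(a) mean(d[[a]] == d$class), numeric(1))
  })
  out <- data.frame(
    iteration = as.integer(names(per_iter)),
    do.call(rbind, per_iter),
    row.names = NULL
  )
  names(out) <- c("iteration", paste0("acc_", algorithms))
  out
}

# Toy stand-in for the real `results` (values are made up)
results <- data.frame(
  iteration = c(1, 1, 1, 2, 2, 2),
  class     = c(4, 2, 4, 2, 2, 4),
  ksvm      = c(4, 2, 2, 2, 2, 4),
  rf        = c(4, 2, 4, 2, 4, 4)
)
algorithms <- c("ksvm", "rf")

acc <- accuracy_by_iteration(results, algorithms)
print(acc)
```

Adding a model then only means adding its column name to algorithms.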

Thanks!


2016-10-01 Saul Garcia

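What are you trying to optimize? Speed? This already seems like a fairly elegant solution. If all you have to do is add models to the 'algorithms' vector, I think your 'dplyr' code above does the job well, assuming your data isn't *huge* and you aren't testing many parameters on many models. – blacksite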

You're right, maybe I should have written *automate* instead of *optimize*. –

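Because sum(ksvm == class)/n() is really the group mean of TRUE matches between an algorithm column and class, first consider turning the algorithm columns into logical columns (TRUE/FALSE matches), then use dplyr's summarise_each on all the other columns: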

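library(dplyr)

# Placeholder names; in practice the prediction columns, e.g. c("ksvm", "rf")
algorithms <- c("alg1", "alg2", "alg3", "alg4", "alg5")

# Overwrite each prediction column with TRUE/FALSE: does it match class?
results[algorithms] <- sapply(algorithms, function(i) {
  results[[i]] == results$class
})

# Mean of a logical column per iteration = accuracy
summarydf <- results[c("iteration", algorithms)] %>%
  group_by(iteration) %>%
  summarise_each(funs(mean)) %>%
  setNames(c("iteration", paste0("acc_", algorithms)))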
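In dplyr 1.0.0 and later, summarise_each() and funs() are deprecated; a sketch of the same idea with across() follows, assuming the same results layout (the toy data is made up). Unlike the code above, it compares each column to class inside summarise(), so the original prediction columns are left untouched.

```r
library(dplyr)

# Toy stand-in for `results` (values are made up)
results <- data.frame(
  iteration = c(1, 1, 1, 2, 2, 2),
  class     = c(4, 2, 4, 2, 2, 4),
  ksvm      = c(4, 2, 2, 2, 2, 4),
  rf        = c(4, 2, 4, 2, 4, 4)
)
algorithms <- c("ksvm", "rf")

# across() applies the same function to every column named in `algorithms`;
# .names builds the output names acc_ksvm, acc_rf, ...
summarydf <- results %>%
  group_by(iteration) %>%
  summarise(across(all_of(algorithms), ~ mean(.x == class),
                   .names = "acc_{.col}"))

print(summarydf)
```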


2016-10-02 05:21:36 Parfait

This is really interesting, thank you! –
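The other option raised in the question, restructuring the data so you can group by model, can be sketched with tidyr's pivot_longer() (assuming tidyr 1.0.0 or later and dplyr 1.0.0 or later; the toy data is made up). Once each row is one (observation, model) pair, the accuracy summary needs no per-model terms at all.

```r
library(dplyr)
library(tidyr)

# Toy stand-in for `results` (values are made up); one column per model
results <- data.frame(
  iteration = c(1, 1, 1, 2, 2, 2),
  class     = c(4, 2, 4, 2, 2, 4),
  ksvm      = c(4, 2, 2, 2, 2, 4),
  rf        = c(4, 2, 4, 2, 4, 4)
)
algorithms <- c("ksvm", "rf")

# Long format: one row per (observation, model) pair
long <- results %>%
  pivot_longer(cols = all_of(algorithms),
               names_to = "model", values_to = "pred")

# Accuracy per iteration and model
acc_long <- long %>%
  group_by(iteration, model) %>%
  summarise(accuracy = mean(pred == class), .groups = "drop")

# Optional: back to one column per model, like the question's output
acc_wide <- acc_long %>%
  pivot_wider(names_from = model, values_from = accuracy,
              names_prefix = "acc_")

print(acc_wide)
```

The long table is also handy for comparing models across folds, for example by grouping on model alone to average accuracy over the 5 iterations.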
