2017-07-22 49 views
0

我试图找到具有最低AIC的模型。模型从两个for循环返回,可以组合列。我无法使用最低的AIC来创建函数返回模型。下面的代码演示,我卡住了:R中最低AIC的寻找模型(从for循环返回)

rm(list = ls()) 

data <- iris 

data <- data[data$Species %in% c("setosa", "virginica"),] 

data$Species = ifelse(data$Species == 'virginica', 0, 1) 

mod_headers <- names(data[1:ncol(data)-1]) 

f <- function(mod_headers){ 
    for(i in 1:length(mod_headers)){ 
    tab <- combn(mod_headers,i) 
    for(j in 1:ncol(tab)){ 
     tab_new <- c(tab[,j]) 
     mod_tab_new <- c(tab_new, "Species") 
     model <- glm(Species ~., data=data[c(mod_tab_new)], family = binomial(link = "logit")) 
    } 
    } 
    best_model <- model[which(AIC(model)[order(AIC(model))][1])] 
    print(best_model) 
} 

f(mod_headers) 

有什么建议?谢谢!

+0

你应该阅读http://r4ds.had.co.nz/many-models.html –

+0

https://www.rdocumentation.org/packages/MuMIn/versions/1.15.6/topics/dredge – Roland

回答

0

glm()使用迭代重新加权最小二乘算法。该算法达到迭代的最大数量,汇聚之前 - 改变这个参数在你的情况可以帮助:

glm(Species ~., data=data[mod_tab_new], family = binomial(link = "logit"), control = list(maxit = 50)) 

有使用which另一个问题,我有一个if取代它的​​每个模型拟合后比较最低AIC至今。不过,我认为有比这更好的解决方案。

f <- function(mod_headers){ 
    lowest_aic <- Inf  # added 
    best_model <- NULL # added 

    for(i in 1:length(mod_headers)){ 
    tab <- combn(mod_headers,i) 
    for(j in 1:ncol(tab)){ 
     tab_new <- tab[, j] 
     mod_tab_new <- c(tab_new, "Species") 
     model <- glm(Species ~., data=data[mod_tab_new], family = binomial(link = "logit"), control = list(maxit = 50)) 
     if(AIC(model) < lowest_aic){ # added 
     lowest_aic <- AIC(model) # added 
     best_model <- model  # added 
     } 
    } 
    } 
    return(best_model) 
} 
+0

这实际上是工作,似乎是最有效的找到我的最佳模型与一堆cols。谢谢! – New2coding

0

我换成你的for循环使用向量化的替代品

library(tidyverse) 
library(iterators) 
# Column names you want to use in glm model, saved as list 
whichcols <- Reduce("c", map(1:length(mod_headers), ~lapply(iter(combn(mod_headers,.x), by="col"),function(y) c(y)))) 

# glm model results using selected column names, saved as list 
models <- map(1:length(whichcols), ~glm(Species ~., data=data[c(whichcols[[.x]], "Species")], family = binomial(link = "logit"))) 

# selects model with lowest AIC 
best <- models[[which.min(sapply(1:length(models),function(x)AIC(models[[x]])))]] 

输出

Call: glm(formula = Species ~ ., family = binomial(link = "logit"), 
data = data[c(whichcols[[.x]], "Species")]) 

Coefficients: 
(Intercept) Petal.Length 
     55.40  -17.17 

Degrees of Freedom: 99 Total (i.e. Null); 98 Residual 
Null Deviance:  138.6 
Residual Deviance: 1.208e-09 AIC: 4 
0

使用你的循环,只是把所有的车型在一个列表。 然后计算所有这些模型的AIC。 最后用最小AIC返回模型。

f <- function(mod_headers) { 

    models <- list() 
    k <- 1 
    for (i in 1:length(mod_headers)) { 
    tab <- combn(mod_headers, i) 
    for(j in 1:ncol(tab)) { 
     mod_tab_new <- c(tab[, j], "Species") 
     models[[k]] <- glm(Species ~ ., data = data[mod_tab_new], 
         family = binomial(link = "logit")) 
     k <- k + 1 
    } 
    } 

    models[[which.min(sapply(models, AIC))]] 
}