2015-10-14 65 views

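R: Using a pasted formula in sapply

I am trying to identify correlated explanatory variables and eliminate them. I use sapply to apply a regression to each variable I am interested in, then manually remove the variables with VIF > 10. However, when I try to automate this so that I can run several VIF checks quickly, I cannot get my regression script to run with a formula object built by paste() from the names I want to keep. Example below: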

regressiondata <- data.frame(matrix(ncol=9,nrow=100,runif(900,1,100))) 
colnames(regressiondata) <- c("indep1","indep2","indep3","indep4","var1","var2","var3","var4","var5") 
vifs1_model <- sapply(regressiondata[,indep_variables],function(x) vif(lm(x~var1+var2+var3+var4+var5, 
                     data = regressiondata, 
                     na.action=na.exclude))) 
vifs1 <- rowMeans(vifs1_model) 
formula_variables <- paste(names(vifs1),collapse="+") 
final_model <- t(round(sapply(regressiondata[,indep_variables], 
      function(x) lm(x ~ formula_variables,data=regressiondata,na.action=na.exclude)$coef),2)) 
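When I run "final_model", I get this error:

Error in t(round(sapply(regressiondata[, indep_variables], function(x) lm(x ~ : error in evaluating the argument 'x' in selecting a method for function 't': Error in model.frame.default(formula = x ~ formula_variables, data = regressiondata, : variable lengths differ (found for 'formula_variables')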

Answers


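I think you have a couple of problems:

  1. You are using sapply over the data frame, when it looks like you just want to sapply over a vector of the independent variable names.
  2. In your last nested call to lm, you seem to be mixing expressions and strings.

Here is how I walked through it. I have added a few missing objects on some lines that I think you left out of your code.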
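To see why the original call fails, here is a minimal sketch using base R only. The data mimics the layout in the question, and the three predictors pasted together are chosen arbitrarily for illustration. Inside a formula, `formula_variables` is treated as a variable name, and the object it refers to is a single string rather than a column of 100 values.

```r
# Toy data with the same column layout as the question (values are random).
set.seed(1)
regressiondata <- data.frame(matrix(runif(900, 1, 100), nrow = 100, ncol = 9))
colnames(regressiondata) <- c("indep1", "indep2", "indep3", "indep4",
                              "var1", "var2", "var3", "var4", "var5")

# One string holding several predictor names (arbitrary choice for this sketch).
formula_variables <- paste(c("var1", "var2", "var3"), collapse = "+")

# Here formula_variables is looked up as a variable. It has length 1,
# while indep1 has length 100, so building the model frame fails.
bad <- try(lm(indep1 ~ formula_variables, data = regressiondata), silent = TRUE)
inherits(bad, "try-error")

# Pasting the response name in as well yields one complete formula string,
# which as.formula() turns into a real formula.
good <- lm(as.formula(paste("indep1 ~", formula_variables)),
           data = regressiondata)
names(coef(good))
```

The failing call is wrapped in `try()` only so that the script keeps running; the error it captures is the "variable lengths differ" message from the question.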

library(car) # for vif()
regressiondata <- data.frame(matrix(ncol=9,nrow=100,runif(900,1,100))) 
colnames(regressiondata) <- c("indep1", 
           "indep2", 
           "indep3", 
           "indep4", 
           "var1", 
           "var2", 
           "var3", 
           "var4", 
           "var5") 

indep_variables <- names(regressiondata)[1:4] # object did not exist 

I broke out the anonymous function for clarity:

f1 <- function(x) { 
    vif(lm(x~var1+var2+var3+var4+var5, 
     data = regressiondata, 
     na.action=na.exclude)) 
} 

Now your regressions:

vifs1_model <- sapply(regressiondata[,indep_variables], f1) 
vifs1 <- rowMeans(vifs1_model) 
formula_variables <- paste(names(vifs1),collapse="+") 

I gave the function that pulls the coefficients a name, and I hand lm the whole formula as a character vector (a string):

getCoefs <- function(x) { 
    lm(paste(x, "~", formula_variables), data=regressiondata, 
    na.action=na.exclude)$coef 
} 
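As an alternative to pasting the formula together by hand, base R's `reformulate()` builds a formula from a character vector of term labels and a response name. The following is only a sketch under assumptions: `get_coefs`, `predictors`, and `responses` are names introduced here for illustration and are not part of the original code.

```r
# Toy data with the same column layout as the question (values are random).
set.seed(1)
regressiondata <- data.frame(matrix(runif(900, 1, 100), nrow = 100, ncol = 9))
colnames(regressiondata) <- c("indep1", "indep2", "indep3", "indep4",
                              "var1", "var2", "var3", "var4", "var5")

predictors <- c("var1", "var2", "var3", "var4", "var5")  # right-hand side terms
responses  <- c("indep1", "indep2", "indep3", "indep4")  # left-hand side names

# Hypothetical helper: fit one model per response name and return its coefficients.
get_coefs <- function(response) {
  f <- reformulate(termlabels = predictors, response = response)
  coef(lm(f, data = regressiondata, na.action = na.exclude))
}

# One row per response, one column per coefficient, rounded to 2 decimals.
coefs <- t(round(sapply(responses, get_coefs), 2))
dim(coefs)
```

Passing the predictors as a vector avoids the intermediate "+"-separated string, which may be convenient if the set of kept variables changes between runs.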

Now just sapply over the vector of names, then transpose and round:

final_model <- sapply(indep_variables, getCoefs) 
final_model <- t(round(final_model ,2)) 
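The question mentions dropping variables with VIF > 10, a step that could be added before the formula string is pasted together. A possible sketch in base R follows, assuming only numeric predictors and no missing values: in that case the VIF of a predictor is 1 / (1 - R^2), where R^2 comes from regressing that predictor on the others, so it does not depend on the response. `vif_one`, `predictors`, and `keep` are hypothetical names used for illustration, and the threshold of 10 is the one from the question.

```r
# Toy data with the same column layout as the question (values are random).
set.seed(1)
regressiondata <- data.frame(matrix(runif(900, 1, 100), nrow = 100, ncol = 9))
colnames(regressiondata) <- c("indep1", "indep2", "indep3", "indep4",
                              "var1", "var2", "var3", "var4", "var5")

predictors <- c("var1", "var2", "var3", "var4", "var5")

# Hypothetical helper: VIF of one predictor, computed as 1 / (1 - R^2)
# from the regression of that predictor on all remaining predictors.
vif_one <- function(name, data, all_names) {
  others <- setdiff(all_names, name)
  fit <- lm(reformulate(others, response = name), data = data)
  1 / (1 - summary(fit)$r.squared)
}

vifs <- sapply(predictors, vif_one,
               data = regressiondata, all_names = predictors)

# Keep only the predictors at or below the threshold, then build the string.
keep <- names(vifs)[vifs <= 10]
formula_variables <- paste(keep, collapse = "+")
```

With the random data used here, every VIF should be close to 1, so nothing is dropped; with real data, `keep` could end up empty, which would need handling before fitting.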

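Here is a way of doing it with dplyr. Most of the work is done by the sub_regression function, which runs the regression, filters the independent variables by VIF, and then redoes the regression.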

library(dplyr) 
library(tidyr) 
library(magrittr) 
library(car) 

sub_regression = function(sub_data_frame) 
    lm(independent_value ~ var1+var2+var3+var4+var5, 
    data = sub_data_frame , 
    na.action="na.exclude") %>% 
    vif %>% 
    Filter(function(x) x <= 10, .) %>% 
    names %>% 
    paste(collapse = " + ") %>% 
    paste("independent_value ~ ", .) %>% 
    as.formula %>% 
    lm(. , sub_data_frame, na.action="na.exclude") %>% 
    coefficients %>% 
    round(3) %>% 
    as.list %>% 
    data.frame(check.names = FALSE) 

matrix(ncol=9,nrow=100,runif(900,1,100)) %>% 
    data.frame %>% 
    setNames(c("indep1","indep2","indep3","indep4","var1","var2","var3","var4","var5")) %>% 
    gather(independent_variable, independent_value, 
     indep1, indep2, indep3, indep4) %>% 
    group_by(independent_variable) %>% 
    do(sub_regression(.))