2017-04-04

Automating regressions with specific dependent and independent variables

MVE: let this be the dataset:

data <- data.frame(year = rep(seq(1966,2015,1), 8), 
       county = c(rep('prva', 50), rep('druga', 50), rep('treća', 50), rep('četvrta', 50), 
          rep('peta', 50), rep('šesta', 50), rep('sedma', 50), rep('osma', 50)), 
       crime1 = runif(400), crime2 = runif(400), crime3 = runif(400), 
       uvar1 = runif(400), uvar2 = runif(400), uvar3 = runif(400), 
       var1 = runif(400), var2 = runif(400), var3 = runif(400), var4 = runif(400), var5 = runif(400)) 

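Suppose crime1, crime2 and crime3 are the specific dependent variables, uvar1, uvar2 and uvar3 are the specific independent variables, and var1, var2, etc. are the other covariates. What I want to do is automate the regressions.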

That is, I want to get the results of this code:

plm(log(crime1) ~ log(uvar1) + log(var1) + log(var2) + log(var3) + log(var4), model = 'within', effect = 'twoways', data = data) 

plm(log(crime2) ~ log(uvar2) + log(var1) + log(var2) + log(var3) + log(var4), model = 'within', effect = 'twoways', data = data) 

and so on, but without writing 20 lines of code for each estimated model.

Looking at similar questions, this is as far as I got:

crime <- c('crime1', 'crime2', 'crime3') 
plm.results <- lapply(data[, crime], function(y) plm(y ~ var1 + var2 + var3 + var4, 
                model = 'within', effect ='twoways', data = data)) 

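This certainly helps with my dependent variables, but I can't figure out how to include the specific independent variables in these estimations. To clarify once more: I want uvar1 in the first regression, but not in the rest.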

Answer

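The formula function is useful when creating multiple sets of models. You can build in the variation by combining formula with paste0, and use lapply to iterate over the indices 1 to 3.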
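As a minimal illustration of that core idea in isolation (base R only; the variable names are taken from the question, and the short covariate list is just for the example), formula() parses a pasted string into a formula object that plm() or lm() can accept:

```r
# Build the pieces of one model specification as character strings
depVar   <- "log(crime1)"
indepVar <- "log(uvar1)"
covars   <- "log(var1)+log(var2)"

# formula() parses the combined string into a formula object
f <- formula(paste0(depVar, " ~ ", indepVar, "+", covars))

# The variables the formula refers to, in order of appearance
all.vars(f)  # "crime1" "uvar1" "var1" "var2"
```

Because the string is assembled with paste0, swapping "1" for "2" or "3" in depVar and indepVar is all it takes to get the other specifications.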

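library(plm)

# remember to call set.seed() before sampling from distributions
# (i.e. before the runif() calls that build the data), so results are reproducible
set.seed(123)

# regDF is the question's data frame
regDF <- data

# a helper function to create "log(var)" from "var"
fn_appendLog = function(x) {
  paste0("log(", x, ")")
}

modelList = lapply(1:3, function(i) {

  # shared covariates: every column whose name starts with "v"
  indepVars2 = Reduce(function(x, y) paste(x, y, sep = "+"),
                      lapply(colnames(regDF)[grepl("^v", colnames(regDF))], fn_appendLog))

  #> indepVars2
  #[1] "log(var1)+log(var2)+log(var3)+log(var4)+log(var5)"

  # model-specific independent variable: uvar1, uvar2 or uvar3
  indepVars1 = fn_appendLog(paste0("uvar", i))

  # matching dependent variable: crime1, crime2 or crime3
  depVar = fn_appendLog(paste0("crime", i))

  formulaVar = formula(paste0(depVar, " ~ ", indepVars1, "+", indepVars2))

  #> formulaVar
  #log(crime1) ~ log(uvar1) + log(var1) + log(var2) + log(var3) + log(var4) + log(var5)

  # fit the two-way within (fixed-effects) model and return it
  plm(formulaVar, model = 'within', effect = 'twoways', data = regDF)
})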
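If you want to inspect the specifications before estimating anything, the formulas can also be built on their own. This is a base-R sketch, assuming the shared covariates are var1 to var4 as in the question (the grepl("^v") pattern above also picks up var5); it uses paste(..., collapse = "+") in place of Reduce, which yields the same string. The name formulaList is hypothetical.

```r
# Shared covariates, as listed in the question (var1 to var4)
covars <- paste0("log(var", 1:4, ")")

formulaList <- lapply(1:3, function(i) {
  # model-specific independent variable uvar<i> goes first on the right-hand side
  rhs <- paste(c(paste0("log(uvar", i, ")"), covars), collapse = "+")
  # matching dependent variable crime<i> on the left-hand side
  formula(paste0("log(crime", i, ") ~ ", rhs))
})

# The second model uses uvar2 only; uvar1 does not appear
all.vars(formulaList[[2]])  # "crime2" "uvar2" "var1" "var2" "var3" "var4"
```

Each element of formulaList can then be passed to plm() with the same model, effect and data arguments used in the answer.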

Summary:

summary(modelList[[1]]) 

#> summary(modelList[[1]]) 
#Twoways effects Within Model 
# 
#Call: 
#plm(formula = formulaVar, data = regDF, effect = "twoways", model = "within") 
# 
#Balanced Panel: n=50, T=8, N=400 
# 
#Residuals : 
# Min. 1st Qu. Median 3rd Qu. Max. 
# -5.730 -0.396 0.116 0.599 1.520 
# 
#Coefficients : 
#    Estimate Std. Error t-value Pr(>|t|) 
#log(uvar1) 0.0393871 0.0490891 0.8024 0.4229 
#log(var1) -0.0369356 0.0541029 -0.6827 0.4953 
#log(var2) -0.0455269 0.0543664 -0.8374 0.4030 
#log(var3) 0.0150516 0.0520347 0.2893 0.7726 
#log(var4) -0.0034534 0.0441506 -0.0782 0.9377 
#log(var5) -0.0109038 0.0527446 -0.2067 0.8363 
# 
#Total Sum of Squares: 302.23 
#Residual Sum of Squares: 300.6 
#R-Squared:  0.0053896 
#Adj. R-Squared: 0.0045407 
#F-statistic: 0.304357 on 6 and 337 DF, p-value: 0.93448 
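One thing worth checking in this output: it reports n=50, T=8, which means the years are being treated as the individuals and the counties as the time periods. With no index argument, plm takes the first two columns of the data as the individual and time index, and in the question's data those are year and then county. The base-R check below reproduces those counts without plm; if counties are meant to be the panel units, passing index = c("county", "year") to plm() should set the indexes explicitly (worth confirming against the plm documentation for your installed version).

```r
# Rebuild only the index columns of the question's data
counties <- c("prva", "druga", "treća", "četvrta", "peta", "šesta", "sedma", "osma")
idx <- data.frame(year = rep(seq(1966, 2015, 1), 8),
                  county = rep(counties, each = 50))

# plm's default: first column = individual index, second column = time index
nIndividual <- length(unique(idx[[1]]))  # 50 (the years)
nTime       <- length(unique(idx[[2]]))  # 8 (the counties)

# Each county-year pair appears exactly once, so the panel is balanced either way
stopifnot(anyDuplicated(idx[c("county", "year")]) == 0)
```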

Explanation:

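There are two types of independent variables: the first is uvar1 (which changes from model to model), and the others are var1...varN (shared by every model).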

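1) colnames(regDF)[grepl("^v",colnames(regDF))] gives us the list of all variables in regDF whose names start with the letter "v". The caret ^ anchors the pattern to the start of the string (a $ would anchor it to the end). The output at this stage is c("var1","var2",...,"var5").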

2) We need the log variants of this vector of variables, so we pass them through lapply to the function fn_appendLog, which gives list("log(var1)","log(var2)",...,"log(var5)").

3) From this list output, we then need to turn these variables into log(var1)+log(var2)+...+log(var5).

4) To do this, we use the function Reduce with the function paste(x,y,sep="+"), which joins each element of the list above to its neighbour, using "+" as the separator:

step1 = log(var1)+log(var2)
step2 = (log(var1)+log(var2))+log(var3)
step3 = (log(var1)+log(var2)+log(var3))+log(var4), and so on

5) Reduce applies this function cumulatively across the list and aggregates the output into log(var1)+log(var2)+log(var3)+log(var4)+log(var5).
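The steps above can be traced on a small hypothetical vector of column names (a stand-in for colnames(regDF), so the snippet runs without the data). Reduce(..., accumulate = TRUE) returns every intermediate result, which makes step1, step2, ... visible; the last element is the final string, and paste(..., collapse = "+") gives the same result in a single call.

```r
# Hypothetical column names standing in for colnames(regDF)
cols <- c("year", "county", "crime1", "uvar1", "var1", "var2", "var3")

fn_appendLog <- function(x) paste0("log(", x, ")")

# 1) keep only the names that start with "v"
vars <- cols[grepl("^v", cols)]  # "var1" "var2" "var3"

# 2) wrap each name in log(...); lapply returns a list
logVars <- lapply(vars, fn_appendLog)

# 4) and 5) join neighbouring elements with "+", keeping every intermediate step;
# element 1 is log(var1) itself, element 2 is step1, element 3 is step2
steps <- Reduce(function(x, y) paste(x, y, sep = "+"), logVars, accumulate = TRUE)
print(steps)

# The last step is the final single string; paste(collapse = "+") matches it
final <- steps[[length(steps)]]
identical(final, paste(unlist(logVars), collapse = "+"))  # TRUE
```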

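The final output is a single character string. These functions may seem intimidating at first, but if you use them often and explore their examples, they will become part of your repertoire in no time. The best way to understand a function such as lapply is to read its documentation (?lapply) end to end, run the listed examples, modify the arguments and get familiar with it. Hope this sheds some light on your query.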

Exactly what I was looking for. Thank you very much! – Astronaut

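Although it works perfectly, I'd really like to understand what you did here; I'm struggling a lot with this part: indepVars2 = Reduce(function(x,y) paste(x,y,sep="+"), lapply(colnames(data)[grepl("^v",colnames(data))],fn_appendLog)). Could you elaborate on what exactly this part does? – Astronaut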

I've added an explanation of the steps involving 'Reduce' and 'lapply'; let me know if that is sufficient. – OdeToMyFiddle