User-defined function to run t-tests between two datasets

I'm a new user trying to figure out lapply.

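I have two datasets, each with the same 30 variables, and I'm trying to run t-tests to compare each variable across the two samples. My ideal output would list each variable along with the t statistic and the p-value for the difference in that variable between the two datasets.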

I tried to write a function that does the t-test so that I could use it with lapply. Here is my code with a reproducible example.

height<-c(2,3,4,2,3,4,5,6) 
weight<-c(3,4,5,7,8,9,5,6) 
location<-c(0,1,1,1,0,0,0,1) 
data_test<-cbind(height,weight,location) 
data_north<-subset(data_test,location==0) 
data_south<-subset(data_test,location==1) 
variables<-colnames(data_test) 
compare_t_tests<-function(x){ 
    model<-t.test(data_south[[x]], data_north[[x]], na.rm=TRUE) 
    return(summary(model[["t"]]), summary(model[["p-value"]])) 
} 
compare_t_tests(height)

which gives the error:

Error in data_south[[x]] : attempt to select more than one element

My plan is to use the function with lapply like this, once I figure it out:

lapply(variables, compare_t_tests)

I'd appreciate any advice. It's possible I'm not even looking at this the right way, so redirection is welcome too!

Source

2016-01-03 user5457414

You're very close. Just a few tweaks:

Data:

height <- c(2,3,4,2,3,4,5,6) 
weight <- c(3,4,5,7,8,9,5,6) 
location <- c(0,1,1,1,0,0,0,1)

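Use data.frame rather than cbind to get a data frame with real column names (cbind returns a matrix, which [[ cannot index by column name)...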

data_test <- data.frame(height,weight,location) 
data_north <- subset(data_test,location==0) 
data_south <- subset(data_test,location==1)
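
To see why this switch matters, here is a small sketch. The objects `m` and `df` are made up for illustration and are not from the original post. `cbind` on plain vectors returns a matrix, and looking up a column by name with `[[` fails on a matrix but works on a data frame.

```r
# Hypothetical toy objects for illustration only
height <- c(2, 3, 4, 2)
weight <- c(3, 4, 5, 7)

m  <- cbind(height, weight)       # numeric matrix with column names
df <- data.frame(height, weight)  # data frame with named columns

is.matrix(m)       # TRUE
is.data.frame(df)  # TRUE

# `[[` with a column name raises an error on the matrix; capture the message
matrix_lookup <- tryCatch(m[["height"]], error = function(e) conditionMessage(e))

# The same lookup on the data frame returns the column
df_lookup <- df[["height"]]

# On a matrix, a column is selected with m[, "height"] instead
matrix_column <- m[, "height"]
```

Either structure can be made to work, but the data frame is the one that matches the `data_south[[x]]` indexing used in the function.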

Don't include location in the set of variables to test...

variables <- colnames(data_test)[1:2] ## skip location

Use the model object itself rather than a summary of it, and return a vector:

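compare_t_tests <- function(x) {
    # x is a column name (a character string); t.test() drops NA values itself,
    # so no na.rm argument is needed
    model <- t.test(data_south[[x]], data_north[[x]])
    # return the t statistic and the p-value as a named numeric vector
    unlist(model[c("statistic", "p.value")])
}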
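
As a rough illustration of why the original `return(...)` line could not work, the sketch below inspects the object that `t.test` returns. The vectors `south` and `north` are toy values chosen for the example. The result is a list whose components are named `statistic` and `p.value`, so `"t"` and `"p-value"` look up nothing, and `return()` accepts only a single value.

```r
# Toy inputs for illustration; `model` is an ordinary t.test result
south <- c(3, 4, 2, 6)
north <- c(2, 3, 4, 5)
model <- t.test(south, north)

# The component names include "statistic" and "p.value"
component_names <- names(model)

# Names that are not components come back as NULL
missing_t <- is.null(model[["t"]])        # TRUE
missing_p <- is.null(model[["p-value"]])  # TRUE

# Keep the two components of interest and flatten them into a named vector.
# The statistic carries its own name "t", so unlist() labels it "statistic.t".
result <- unlist(model[c("statistic", "p.value")])
names(result)  # "statistic.t" "p.value"
```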

Call it with the variable name in quotes, not the bare symbol:

compare_t_tests("height") 
## statistic.t     p.value 
##   0.2335497   0.8236578
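
The difference between the quoted name and the bare symbol can be sketched as follows, reusing the data from the question. With the bare symbol, R evaluates `height` to its eight values and hands all of them to `[[` as the index, which is an error. With the string, `[[` selects the single column of that name. The names `by_name` and `by_symbol` are introduced here only for the example.

```r
height   <- c(2, 3, 4, 2, 3, 4, 5, 6)
weight   <- c(3, 4, 5, 7, 8, 9, 5, 6)
location <- c(0, 1, 1, 1, 0, 0, 0, 1)
data_test  <- data.frame(height, weight, location)
data_south <- subset(data_test, location == 1)

# Quoted name: one column, selected by its name
x <- "height"
by_name <- data_south[[x]]   # 3 4 2 6

# Bare symbol: the index is the whole numeric vector, so `[[` fails;
# tryCatch() records that an error occurred instead of stopping
by_symbol <- tryCatch(data_south[[height]], error = function(e) "error")
```

This is also why `lapply(variables, compare_t_tests)` fits naturally: `colnames()` already returns the names as character strings.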

Using sapply automatically collapses the results into a table:

sapply(variables,compare_t_tests) 
##                height     weight
## statistic.t 0.2335497 -0.4931970
## p.value     0.8236578  0.6462352

You can transpose this (with t()) if you prefer...
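
Putting the pieces together, a minimal sketch of the transposed table, plus an `lapply` variant for anyone who prefers the approach the question started from. The names `res_wide`, `res_long`, `res_list` and `res_bound` are introduced here for illustration.

```r
height   <- c(2, 3, 4, 2, 3, 4, 5, 6)
weight   <- c(3, 4, 5, 7, 8, 9, 5, 6)
location <- c(0, 1, 1, 1, 0, 0, 0, 1)
data_test  <- data.frame(height, weight, location)
data_north <- subset(data_test, location == 0)
data_south <- subset(data_test, location == 1)
variables  <- colnames(data_test)[1:2]  # skip location

compare_t_tests <- function(x) {
    model <- t.test(data_south[[x]], data_north[[x]])
    unlist(model[c("statistic", "p.value")])
}

# sapply: one column per variable; t() turns that into one row per variable
res_wide <- sapply(variables, compare_t_tests)
res_long <- t(res_wide)

# lapply returns a list, which can be row-bound into the same layout
res_list <- lapply(variables, compare_t_tests)
names(res_list) <- variables
res_bound <- do.call(rbind, res_list)
```

With 30 variables, the one-row-per-variable layout (`res_long` or `res_bound`) is probably the easier one to read.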

Source

2016-01-03 19:25:13

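Thanks very much, works perfectly! – user5457414

You're welcome. Technically [you shouldn't use comments to say "thank you"](http://meta.stackoverflow.com/questions/267490/thanks-in-a-comment) (upvoting and accepting should be thanks enough), but I –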
