2014-10-09

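mapply() t.test on the column vectors of two data frames does not work (R)

I have two data frames and I want to run a t.test on each pair of matching columns. Both data frames are subsets of one large data frame, so all the column names are identical and match (ncol ≈ 20000), while nrow(df1) = 25 and nrow(df2) = 23.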

Example:

treatment<-matrix(rnorm(50), ncol=10) 
control<-matrix(rnorm(50), ncol=10) 

treatment 
      [,1]  [,2]  [,3]  [,4]  [,5]  [,6] 
[1,] 0.23442246 1.02256703 1.0499998 0.2913643 -1.2083822 0.3778403 
[2,] -0.68888047 -0.03961717 -0.9978793 -0.9792061 -0.1831634 0.6140542 
[3,] -1.88273887 -0.49701513 0.1845197 0.4385338 1.2249121 0.5444027 
[4,] 1.21359446 0.87333933 0.5615304 0.3803339 1.1294489 -0.8777454 
[5,] -0.02908159 -1.50296138 0.4624656 0.1335046 1.1665818 -0.4475185 
      [,7]  [,8]  [,9]  [,10] 
[1,] 0.5987723 0.5910937 0.4334874 -1.4198250 
[2,] 0.2027346 0.8078187 -1.0573069 1.0727554 
[3,] 0.5490159 0.5109912 1.7247428 1.7745333 
[4,] 0.3044544 0.6476548 1.1959365 -0.1220841 
[5,] 1.8681375 0.8451147 0.4283893 0.1044125 

control 
      [,1]  [,2]  [,3]  [,4]  [,5]  [,6] 
[1,] 0.6712834 -0.3775649 0.7741285 0.51224345 0.24128336 1.02580198 
[2,] 0.3894112 -0.1835289 0.4982122 1.73512459 0.08991013 -0.04406897 
[3,] 1.7068503 0.7909355 -0.3341426 0.08780239 -1.11563321 2.09984105 
[4,] -0.7634818 -1.3672888 0.2161816 -0.65170516 0.81247509 1.68008404 
[5,] 0.5787616 0.1704100 -0.3166737 0.90167409 -2.34854292 0.31571255 
      [,7]  [,8]  [,9]  [,10] 
[1,] -1.6111883 0.1019497 -0.1975491 -0.3776000 
[2,] 0.7533329 1.1540590 1.0050663 2.0137347 
[3,] 1.2224161 1.4411853 -0.4801494 -0.3891034 
[4,] 0.1905461 0.9767801 -0.1442578 -0.9946735 
[5,] -1.9581454 -0.2874181 -1.0421440 -0.6177782 

I did some searching on SO and came across mapply():

mapply(t.test,treatment,control) 
Error in t.test.default(dots[[1L]][[1L]], dots[[2L]][[1L]]) : 
    not enough 'x' observations 

But when I run t.test on a single column, it works:

t.test(treatment[,1],control[,1]) 

    Welch Two Sample t-test 
data: treatment[, 1] and control[, 1] 
t = -1.1541, df = 7.492, p-value = 0.284 
alternative hypothesis: true difference in means is not equal to 0 
95 percent confidence interval: 
-2.2577187 0.7635152 
sample estimates: 
mean of x mean of y 
-0.2305368 0.5165649 

What is the problem here?

Answer


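treatment and control, as matrix objects, are basically just vectors (like c(1,2,3)) with a dim attribute, so mapply tries to run t.test comparing each individual number. For example: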

treatment[1] 
#[1] 0.7545039 
control[1] 
#[1] -0.3926361 

t.test(treatment[1],control[1]) 
#Error in t.test.default(dots[[1L]][[1L]], dots[[2L]][[1L]]) : 
# not enough 'x' observations 
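To see why, here is a small sketch that counts how many values mapply() passes to the function on each call, once for the raw matrices and once for their data.frame versions. The seed and the helper variable names are illustrative assumptions, not part of the original thread:

```r
set.seed(1)  # arbitrary seed, only so the run is reproducible
treatment <- matrix(rnorm(50), ncol = 10)
control   <- matrix(rnorm(50), ncol = 10)

# A matrix is a vector with a dim attribute, so length() counts cells
n_cells <- length(treatment)                  # 50
# A data.frame is a list of columns, so length() counts columns
n_cols  <- length(as.data.frame(treatment))   # 10

# Over matrices, mapply() makes 50 calls, each receiving a single number
sizes <- mapply(function(x, y) length(x), treatment, control)

# Over data frames, mapply() makes 10 calls, each receiving a whole column
col_sizes <- mapply(function(x, y) length(x),
                    as.data.frame(treatment), as.data.frame(control))
```

A one-value "sample" is exactly what triggers the "not enough 'x' observations" error from t.test.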

If you convert your matrices to data.frame objects, each column is treated as a single object and mapply works fine:

mapply(t.test,as.data.frame(treatment),as.data.frame(control)) 

#   V1          
#statistic -0.7829546        
#parameter 7.698139        
#p.value  0.4570611        
#etc etc 

In this case I would almost certainly use Map instead, for readability:

Map(t.test,as.data.frame(treatment),as.data.frame(control)) 

#$V1 
# 
#  Welch Two Sample t-test 
# 
#data: dots[[1L]][[1L]] and dots[[2L]][[1L]] 
#t = -0.783, df = 7.698, p-value = 0.4571 
#alternative hypothesis: true difference in means is not equal to 0 
#95 percent confidence interval: 
# -1.525349 0.756036 
#sample estimates: 
# mean of x mean of y 
#-0.31246928 0.07218723 
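Map() is equivalent to mapply(..., SIMPLIFY = FALSE), so it returns a list with one complete "htest" object per column pair. A sketch of pulling fields back out of that list into a small table; the variable names and random data are assumptions for illustration:

```r
set.seed(7)  # arbitrary seed for the illustrative data
tr <- as.data.frame(matrix(rnorm(50), ncol = 10))
ct <- as.data.frame(matrix(rnorm(50), ncol = 10))

# One full t.test result per column pair, named V1..V10
results <- Map(t.test, tr, ct)

# vapply() enforces that each extracted field is a single number
p_values <- vapply(results, function(r) r$p.value, numeric(1))
t_stats  <- vapply(results, function(r) unname(r$statistic), numeric(1))

# One row per column
summary_df <- data.frame(column = names(results), t = t_stats, p.value = p_values)
```

Keeping the full objects means confidence intervals and estimates stay available (for example results$V1$conf.int) without rerunning the tests.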

Thanks, it works! – Menglan 2014-10-09 03:10:24