2017-08-08 26 views

Chi-square test on a data frame with wide data

I have data that looks like this:

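ID gamesAlone gamesWithOthers gamesRemotely tvAlone tvWithOthers tvRemotely
 1          1              NA            NA      NA            1         NA
 2         NA              NA             1      NA            1         NA
 3         NA              NA             1       1           NA         NA
 4         NA              NA             1       1           NA         NA
 5         NA              NA             1      NA            1         NA
 6         NA              NA             1       1           NA         NA
 7         NA              NA             1       1           NA         NA
 8         NA               1            NA      NA            1         NA
 9          1              NA            NA      NA           NA          1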

I would like code that does the following two things:

First, transform this into a tidy contingency table like this:

      Alone WithOthers Remotely
games     2          1        6
tv        4          4        1

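Second, use a chi-square test to see whether these activities (games vs. TV) differ in their social context.

Here is the code to generate the data frame: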

data<-data.frame(ID=c(1,2,3,4,5,6,7,8,9), 
      gamesAlone=c(1,NA,NA,NA,NA,NA,NA,NA,1), 
      gamesWithOthers=c(NA,NA,NA,NA,NA,NA,NA,1,NA), 
      gamesRemotely=c(NA,1,1,1,1,1,1,NA,NA), 
      tvAlone=c(NA,NA,1,1,NA,1,1,NA,NA), 
      tvWithOthers=c(1,1,NA,NA,1,NA,NA,1,NA), 
      tvRemotely=c(NA,NA,NA,NA,NA,NA,NA,NA,1)) 

Answers


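Drop the first column, ID (data[-1]), then take the sum of each column (colSums) while removing the NA values (na.rm=TRUE), and put the resulting length-6 vector into a matrix with 2 rows, filled row by row (byrow=TRUE). If you like, you can also label the matrix dimensions accordingly (the dimnames argument):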

m <- matrix(
    colSums(data[-1], na.rm = TRUE),
    nrow = 2, byrow = TRUE,
    dimnames = list(c("games", "tv"), c("alone", "withOthers", "remotely"))
)
m
#       alone withOthers remotely
# games     2          1        6
# tv        4          4        1
chisq.test(m)
#
#  Pearson's Chi-squared test
#
# data:  m
# X-squared = 6.0381, df = 2, p-value = 0.04885
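
With counts this small, every expected cell count is below 5, so chisq.test() also warns that the chi-squared approximation may be incorrect. A minimal sketch of how one might inspect the expected counts and, as a swapped-in alternative that is not part of the answer above, run Fisher's exact test on the same table. The matrix is rebuilt by hand from the counts in the question so that the block runs on its own:

```r
# Rebuild the 2 x 3 table of counts from the question (entered by hand here).
m <- matrix(
    c(2, 1, 6,
      4, 4, 1),
    nrow = 2, byrow = TRUE,
    dimnames = list(c("games", "tv"), c("alone", "withOthers", "remotely"))
)

# Expected counts under independence. suppressWarnings() hides the
# "Chi-squared approximation may be incorrect" warning for small counts.
res <- suppressWarnings(chisq.test(m))
res$expected

# Fisher's exact test does not rely on the large-sample approximation.
fisher.test(m)
```

Whether the exact test is the better choice depends on the real data; with larger counts the chi-square approximation is usually adequate.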

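This will get you the contingency table in the form you gave. A suggestion: call the data frame data1 rather than data, to avoid confusion with R's built-in data() function.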

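library(dplyr)
library(tidyr)

data1 <- data  # the question's data frame under the suggested name

data1_table <- data1 %>%
    # wide to long: one row per ID and original column name
    gather(key, value, -ID) %>%
    # split names such as "gamesAlone" into activity ("games") and context ("Alone")
    mutate(activity = ifelse(grepl("^tv", key), substring(key, 1, 2), substring(key, 1, 5)),
           context  = ifelse(grepl("^tv", key), substring(key, 3), substring(key, 6))) %>%
    group_by(activity, context) %>%
    summarise(n = sum(value, na.rm = TRUE)) %>%
    ungroup() %>%
    # long to wide: one column per context
    spread(context, n)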

# A tibble: 2 x 4
  activity Alone Remotely WithOthers
* <chr>    <dbl>    <dbl>      <dbl>
1 games        2        6          1
2 tv           4        1          4
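
In tidyr 1.0.0 and later, gather() and spread() are superseded by pivot_longer() and pivot_wider(). A sketch of the same reshaping with those functions, assuming tidyr >= 1.0.0 and dplyr >= 1.0.0 are installed. The regular expression passed to names_pattern is an illustrative choice for splitting names such as gamesAlone, not something taken from the answer:

```r
library(dplyr)
library(tidyr)

# The question's data frame, stored as data1 as suggested above.
data1 <- data.frame(
    ID              = 1:9,
    gamesAlone      = c(1, NA, NA, NA, NA, NA, NA, NA, 1),
    gamesWithOthers = c(NA, NA, NA, NA, NA, NA, NA, 1, NA),
    gamesRemotely   = c(NA, 1, 1, 1, 1, 1, 1, NA, NA),
    tvAlone         = c(NA, NA, 1, 1, NA, 1, 1, NA, NA),
    tvWithOthers    = c(1, 1, NA, NA, 1, NA, NA, 1, NA),
    tvRemotely      = c(NA, NA, NA, NA, NA, NA, NA, NA, 1)
)

data1_table <- data1 %>%
    # Split each column name into an activity prefix and a context suffix.
    pivot_longer(-ID,
                 names_to = c("activity", "context"),
                 names_pattern = "^(games|tv)(.*)$",
                 values_to = "value") %>%
    group_by(activity, context) %>%
    # Count the 1s per activity/context pair; NA cells contribute nothing.
    summarise(n = sum(value, na.rm = TRUE), .groups = "drop") %>%
    # One column per context, one row per activity.
    pivot_wider(names_from = context, values_from = n)

data1_table
```

Letting names_pattern do the splitting avoids the hard-coded substring() positions, at the cost of listing the activity prefixes in the regular expression.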

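As for the chi-square test: it depends on what you want to compare, and I assume your real data has higher counts. You can pipe the whole table into chisq.test like this, but I don't think the result is very informative: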

data1_table %>% 
    select(2:4) %>% 
    chisq.test()
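
If the bare test result is not informative enough, the object returned by chisq.test() also carries residuals that indicate which cells depart most from independence. A sketch, assuming the counts from the question are entered by hand as a matrix (the name counts is illustrative and does not appear in the code above):

```r
# Counts from the question, entered by hand (hypothetical name: counts).
counts <- matrix(
    c(2, 1, 6,
      4, 4, 1),
    nrow = 2, byrow = TRUE,
    dimnames = list(c("games", "tv"), c("Alone", "WithOthers", "Remotely"))
)

# The small counts trigger an approximation warning, silenced here.
res <- suppressWarnings(chisq.test(counts))

# Pearson residuals: (observed - expected) / sqrt(expected).
round(res$residuals, 2)

# Standardized residuals; cells with a large absolute value
# contribute most to the test statistic.
round(res$stdres, 2)
```

A positive residual means the cell holds more observations than independence would predict, a negative one fewer.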