2012-02-03 25 views

I have a large data set with a number of variables, and I want to apply a function to several groups of columns:

set.seed (14) 
pool = sample (c("AA","AB", "BB"), 100, replace = T) 
mydf <- data.frame (M1= pool[1:10], M2= pool[11:20], 
M3= pool[21:30], M4= pool[31:40], M5= pool[41:50], 
    M6= pool[51:60], M7= pool[61:70], M8 = pool[71:80], 
    M9 = pool[81:90], M10 = pool[91:100]) 

You need to install the "hapassoc" package if it is not already installed.

install.packages("hapassoc")

> library(hapassoc) 
> example1.haplos <- pre.hapassoc(mydf, numSNPs = 3, allelic= F) 

Haplotypes will be based on the following SNPs (genotypic format): 
M8, M9, M10 
Remaining variables are: 
M1, M2, M3, M4, M5, M6, M7 

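It takes the last 3 variables as one group. But I want to break the data into smaller chunks by group and apply this function to each chunk: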

M1, M2, M3 group 1 
M4, M5  group 2 
M6, M7, M8 group 3 
M9, M10  group 4 

So numSNPs would be given by the following vector:

nsp <- c(3, 2, 3, 2) 
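
A vector of group sizes like this can be turned into the column indices of each group. The following is a minimal base R sketch, assuming the groups are consecutive columns in the order given; grp_id and grps are illustrative names only.

```r
# Group sizes, as in the question
nsp <- c(3, 2, 3, 2)

# Label each column position with the number of the group it belongs to:
# 1 1 1 2 2 3 3 3 4 4
grp_id <- rep(seq_along(nsp), times = nsp)

# Split the column positions 1..10 by group label
grps <- split(seq_len(sum(nsp)), grp_id)

# grps[[1]] is 1:3, grps[[2]] is 4:5, grps[[3]] is 6:8, grps[[4]] is 9:10
```

Each element of grps can then be used to subset the data frame, for example mydf[, grps[[2]]] for group 2.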

I want to keep $haploMat for each group:

example1.haplos$haploMat 
haplo1 haplo2 
1 hBBA hBAB 
3 hAAB hABB 
4 hABA hABA 
6 hAAA hBBA 
7 hAAA hAAA 
8 hBBA hBBB 
9 hABB hBBB 
10 hABA hBAB 
12 hAAA hBBB 
13 hAAB hBBA 
14 hABA hABA 
15 hAAB hBAB 

The final output should have eight columns: group1.haplo1, group1.haplo2, group2.haplo1, group2.haplo2, group3.haplo1, group3.haplo2, group4.haplo1, group4.haplo2.

How can I do this?

Answer


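Is this what you're after? (Specify the column numbers of each group as elements of the list assigned to grps.) You will need the reshape2 package installed. You could do something similar with rbind.fill() from the plyr package.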

set.seed (14) 
pool = sample (c("AA","AB", "BB"), 100, replace = T) 
mydf <- data.frame (M1= pool[1:10], M2= pool[11:20], 
M3= pool[21:30], M4= pool[31:40], M5= pool[41:50], 
    M6= pool[51:60], M7= pool[61:70], M8 = pool[71:80], 
    M9 = pool[81:90], M10 = pool[91:100]) 

library(hapassoc) 

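# Column indices of each group, as elements of a list
grps <- list(1:3, 4:5, 6:8, 9:10) 
# Run pre.hapassoc() on each group's columns and keep only $haploMat
haplos <- lapply(grps, function(x) { 
    out <- pre.hapassoc(mydf[, x], numSNPs = length(x), allelic = FALSE, 
                        verbose = FALSE)$haploMat 
    row.names(out) <- as.numeric(row.names(out)) 
    out 
}) 
# Transpose so that haplo1/haplo2 become rows and the row IDs become columns
haplos <- lapply(haplos, t) 
library(reshape2) 
# Long format: Var1 = haplo1/haplo2, Var2 = row ID, L1 = group number
haplos <- melt(haplos, value.name = 'haplotype') 
# Wide format: one row per ID, one column per group/haplotype pair
haplos <- dcast(haplos, Var2 ~ L1 + Var1, value.var = 'haplotype') 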

Result

haplos 

    Var2 1_haplo1 1_haplo2 2_haplo1 2_haplo2 3_haplo1 3_haplo2 4_haplo1 4_haplo2 
1  1  hABA  hABB  hBA  hBA  hAAA  hAAB  hAA  hAA 
2  2  <NA>  <NA>  hAB  hAB  hAAB  hABB  hAA  hAA 
3  3  hBAA  hAAB  hBA  hBB  hBBB  hBAA  hAA  hBA 
4  4  hBBB  hBAA  hBA  hAB  <NA>  <NA>  hAB  hBB 
5  5  <NA>  <NA>  hBB  hAA  hABB  hAAA  hAB  hBB 
6  6  hABB  hBBB  hBA  hBB  hABA  hAAB  hBB  hBB 
7  7  hBBB  hBBB  hAA  hAA  hBBB  hBAA  hAB  hBB 
8  8  hBBB  hABA  hBA  hAB  <NA>  <NA>  hAA  hAA 
9  9  <NA>  <NA>  hBB  hAA  hAAB  hAAB  hAA  hAB 
10 10  hBBB  hBAA  hAA  hBA  hABB  hBBB  hAB  hAB 
11 11  <NA>  <NA>  hBB  hBB  hBBA  hBBB  <NA>  <NA> 
12 12  hBBB  hABA  hAB  hBB  hABA  hABB  <NA>  <NA> 
13 13  <NA>  <NA>  <NA>  <NA>  hABB  hBAA  <NA>  <NA> 
14 14  hABB  hBBB  <NA>  <NA>  <NA>  <NA>  <NA>  <NA> 
15 15  <NA>  <NA>  <NA>  <NA>  hAAB  hBBA  <NA>  <NA> 
16 16  hBAA  hABA  <NA>  <NA>  hAAA  hBBB  <NA>  <NA> 
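
The columns in this result are named 1_haplo1, 1_haplo2 and so on, while the question asked for names like group1.haplo1. The following is a rough base R sketch of one way to build such a table without reshape2, assuming each group's result is a data frame whose row names are numeric IDs. The toy data frames h1 and h2 are hypothetical stand-ins for the $haploMat results, not real hapassoc output.

```r
# Hypothetical stand-ins for two per-group $haploMat results
h1 <- data.frame(haplo1 = c("hAB", "hBA"), haplo2 = c("hAA", "hBB"),
                 row.names = c("1", "3"), stringsAsFactors = FALSE)
h2 <- data.frame(haplo1 = c("hA", "hB", "hA"), haplo2 = c("hB", "hB", "hA"),
                 row.names = c("1", "2", "3"), stringsAsFactors = FALSE)
haplos <- list(h1, h2)

# All row IDs that occur in any group, in increasing order
ids <- sort(unique(unlist(lapply(haplos, function(h) as.numeric(row.names(h))))))

# Align each group to the full set of IDs (missing IDs become NA rows),
# prefix the column names with the group number, then bind side by side
wide <- do.call(cbind, lapply(seq_along(haplos), function(i) {
    h <- haplos[[i]]
    out <- h[match(ids, as.numeric(row.names(h))), , drop = FALSE]
    row.names(out) <- NULL
    names(out) <- paste0("group", i, ".", names(h))
    out
}))
row.names(wide) <- ids
```

Here wide has one row per ID and columns group1.haplo1, group1.haplo2, group2.haplo1, group2.haplo2, with NA where an ID does not appear in a group.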

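Thank you very much for the answer. I would like to accept it, but I am not getting what I need at the last line: haplos <- dcast(haplos, Var2 ~ L1 + Var1, value.var='haplotype'). I also tried value_var="haplotype", but that produced an error. – jon 2012-02-07 15:32:00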


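@John I have edited the answer to include the full code, which works for me. This is with hapassoc_1.2.4 and reshape2_1.2.1. If you still get an error, could you add it as a comment? – jbaums 2012-02-07 22:15:52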


Thanks. I was using reshape2_1.2.1 and hapassoc_1.2-4 with an older version of R; with a newer version of R it works. Thanks... reason unknown. – jon 2012-02-09 01:31:02
