2012-02-24

R data transformation

I have an R data frame like the one shown below:

z = as.data.frame(list(Col1=c("a","c","e","g"),Col2=c("b","d","f","h"),Col3=c("1,2,5","3,5,7","9,8","1"))) 
> z
  Col1 Col2  Col3
1    a    b 1,2,5
2    c    d 3,5,7
3    e    f   9,8
4    g    h     1

(The third column is a text column of comma-separated values.) I want to convert it into a data frame like this:

a b 1 
a b 2 
a b 5 
c d 3 
c d 5 
c d 7 
e f 9 
e f 8 
g h 1 

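Can anyone suggest a way to do this using apply? I came close with the command below, but it isn't quite right. Any suggestions for a more efficient way to do this would also be much appreciated...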

> apply(z, 1, function(a) {
+   ids <- strsplit(as.character(a[3]), ",")[[1]]
+   out <- c()
+   for (id in ids) {
+     out <- rbind(out, c(a[1:2], id))
+   }
+   return(out)
+ })
[[1]]
     Col1 Col2
[1,] "a"  "b"  "1"
[2,] "a"  "b"  "2"
[3,] "a"  "b"  "5"

[[2]]
     Col1 Col2
[1,] "c"  "d"  "3"
[2,] "c"  "d"  "5"
[3,] "c"  "d"  "7"

[[3]]
     Col1 Col2
[1,] "e"  "f"  "9"
[2,] "e"  "f"  "8"

[[4]]
     Col1 Col2
[1,] "g"  "h"  "1"
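The apply() attempt returns a list here only because the rows split into different numbers of pieces: apply() simplifies its result to a matrix whenever every call returns something of the same length, so stacking its output with do.call(rbind, ...) would be fragile. A minimal base-R sketch of a vectorized alternative (not part of the original thread; the names pieces, idx and out are illustrative) splits Col3 once and repeats each row by its number of pieces:

```r
# Sample data from the question; stringsAsFactors = FALSE keeps the columns
# as character on R versions older than 4.0 as well
z <- data.frame(Col1 = c("a", "c", "e", "g"),
                Col2 = c("b", "d", "f", "h"),
                Col3 = c("1,2,5", "3,5,7", "9,8", "1"),
                stringsAsFactors = FALSE)

# Split every Col3 entry in one call: a list with one character vector per row
pieces <- strsplit(as.character(z$Col3), ",")

# Repeat each row index once per piece (row 1 three times, row 4 once, ...)
idx <- rep(seq_len(nrow(z)), lengths(pieces))

# Pair the repeated Col1/Col2 values with the flattened pieces;
# row.names = NULL resets the "1", "1.1", "1.2", ... names produced by z[idx, ]
out <- data.frame(z[idx, c("Col1", "Col2")],
                  Col3 = unlist(pieces),
                  row.names = NULL,
                  stringsAsFactors = FALSE)
print(out)
```

strsplit() and rep() each run once over the whole column and nothing is grown inside a loop, which avoids the repeated rbind() calls of the apply() version.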

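I'll also note that I tested both proposed solutions on my larger, real dataset and, perhaps unsurprisingly, the execution times were almost identical. In case that is useful to anyone... – Andrew 2012-02-24 19:28:37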

Answers


With reshape/reshape2:

require(reshape2) 
merge(cbind(z[,-3], L1=rownames(z)), melt(strsplit(as.character(z$Col3),","))) 

  L1 Col1 Col2 value
1  1    a    b     1
2  1    a    b     2
3  1    a    b     5
4  2    c    d     3
5  2    c    d     5
6  2    c    d     7
7  3    e    f     9
8  3    e    f     8
9  4    g    h     1
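This answer works in two steps: melt() turns the list returned by strsplit() into a long data frame with a value column and an L1 column recording which list element (that is, which row of z) each value came from, and merge() joins it back onto Col1/Col2 through the shared L1 column. One caveat: rownames(z) is character and merge() sorts on the key by default, so with ten or more rows "10" would sort before "2". As a hedged base-R sketch of the same two steps, assuming no packages are available, stack() can stand in for melt(), and an integer key keeps the numeric row order (the helper names pieces, long, left, joined and key are ours, not from the answer):

```r
# Sample data from the question
z <- data.frame(Col1 = c("a", "c", "e", "g"),
                Col2 = c("b", "d", "f", "h"),
                Col3 = c("1,2,5", "3,5,7", "9,8", "1"),
                stringsAsFactors = FALSE)

# Step 1 (the melt() step): name each list element by its row number and
# stack the list into a long data frame with columns `values` and `ind`
pieces <- strsplit(as.character(z$Col3), ",")
names(pieces) <- seq_len(nrow(z))
long <- stack(pieces)

# stack() stores the names as a factor; convert them back to integer row numbers
long$key <- as.integer(as.character(long$ind))

# Step 2 (the merge() step): give Col1/Col2 the same integer key and join on it
left <- data.frame(key = seq_len(nrow(z)), z[, c("Col1", "Col2")],
                   stringsAsFactors = FALSE)
joined <- merge(left, long[, c("key", "values")], by = "key")

# Drop the helper key and rename `values` back to Col3
out <- data.frame(Col1 = joined$Col1, Col2 = joined$Col2, Col3 = joined$values,
                  stringsAsFactors = FALSE)
print(out)
```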

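Nice! Both answers work perfectly, so choosing a "better" one is impossible. I ended up choosing this answer because it pointed me to the merge function, which seems like a good general-purpose tool that I also need to learn... – Andrew 2012-02-24 18:33:49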


You can use ddply:

library(plyr) 
ddply(z, c("Col1", "Col2"), summarize, 
    Col3=strsplit(as.character(Col3),",")[[1]] 
)
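This ddply() call groups on the Col1/Col2 pair, and [[1]] keeps only the first element of the list that strsplit() returns for each group. If the same pair appeared on more than one row, that group's Col3 would hold several strings and only the first row's values would survive. A base-R sketch of the same split-apply-combine idea that groups by row number instead (not from the original thread; rows, expanded and out are illustrative names):

```r
# Sample data from the question
z <- data.frame(Col1 = c("a", "c", "e", "g"),
                Col2 = c("b", "d", "f", "h"),
                Col3 = c("1,2,5", "3,5,7", "9,8", "1"),
                stringsAsFactors = FALSE)

# Split: one single-row data frame per row of z (grouping by row number,
# so repeated Col1/Col2 pairs are never merged into one group)
rows <- split(z, seq_len(nrow(z)))

# Apply: expand each row; the length-1 Col1/Col2 values are recycled
# to match the number of comma-separated pieces
expanded <- lapply(rows, function(r) {
  data.frame(Col1 = r$Col1,
             Col2 = r$Col2,
             Col3 = strsplit(as.character(r$Col3), ",")[[1]],
             stringsAsFactors = FALSE)
})

# Combine: stack the pieces and reset the row names that rbind() builds
out <- do.call(rbind, expanded)
rownames(out) <- NULL
print(out)
```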