2012-12-31 116 views
10

我想从dat2添加变量:合并dataframes,不同长度

  concreteness familiarity typicality 
amoeba   3.60  1.30  1.71 
bacterium   3.82  3.48  2.13 
leech    5.71  1.83  4.50 

dat1

ID variable value 
1 1 amoeba  0 
2 2 amoeba  0 
3 3 amoeba NA 
251 1 bacterium  0 
252 2 bacterium  0 
253 3 bacterium  0 
501 1  leech  1 
502 2  leech  1 
503 3  leech  0 

给予以下的输出:

X ID variable value concreteness familiarity typicality 
1 1 1 amoeba  0   3.60  1.30  1.71 
2 2 2 amoeba  0   3.60  1.30  1.71 
3 3 3 amoeba NA   3.60  1.30  1.71 
4 251 1 bacterium  0   3.82  3.48  2.13 
5 252 2 bacterium  0   3.82  3.48  2.13 
6 253 3 bacterium  0   3.82  3.48  2.13 
7 501 1  leech  1   5.71  1.83  4.50 
8 502 2  leech  1   5.71  1.83  4.50 
9 503 3  leech  0   5.71  1.83  4.50 

正如你所看到的来自dat1的信息必须复制到的多行中。

这是我失败的尝试:

dat3 <- merge(dat1, dat2, by=intersect(dat1$variable(dat1), dat2$row.names(dat2))) 

Givng以下错误:

Error in as.vector(y) : attempt to apply non-function 

请在这里找到复制的例子:

DAT1:

structure(list(ID = c(1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L), variable = structure(c(1L, 
1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L), .Label = c("amoeba", "bacterium", 
"leech", "centipede", "lizard", "tapeworm", "head lice", "maggot", 
"ant", "moth", "mosquito", "earthworm", "caterpillar", "scorpion", 
"snail", "spider", "grasshopper", "dust mite", "tarantula", "termite", 
"bat", "wasp", "silkworm"), class = "factor"), value = c(0L, 
0L, NA, 0L, 0L, 0L, 1L, 1L, 0L)), .Names = c("ID", "variable", 
"value"), row.names = c(1L, 2L, 3L, 251L, 252L, 253L, 501L, 502L, 
503L), class = "data.frame") 

DAT2:

structure(list(concreteness = c(3.6, 3.82, 5.71), familiarity = c(1.3, 
3.48, 1.83), typicality = c(1.71, 2.13, 4.5)), .Names = c("concreteness", 
"familiarity", "typicality"), row.names = c("amoeba", "bacterium", 
"leech"), class = "data.frame") 

回答

12

你可以加入一个变量添加到DAT2然后使用合并:

dat2$variable <- rownames(dat2) 
merge(dat1, dat2) 
    variable ID value concreteness familiarity typicality 
1 amoeba 1  0   3.60  1.30  1.71 
2 amoeba 2  0   3.60  1.30  1.71 
3 amoeba 3 NA   3.60  1.30  1.71 
4 bacterium 1  0   3.82  3.48  2.13 
5 bacterium 2  0   3.82  3.48  2.13 
6 bacterium 3  0   3.82  3.48  2.13 
7  leech 1  1   5.71  1.83  4.50 
8  leech 2  1   5.71  1.83  4.50 
9  leech 3  0   5.71  1.83  4.50 
+1

此答案可与所示的采样数据但如果有'dat1',则会丢弃所有不匹配的行。 –

+1

@ G.Grothendieck好赶上!需要添加all.x = T。 – agstudy

7

没有错@ agstudy的答案,但是你可以不用通过创建一个匿名临时实际修改DAT2。添加X是相似的:

> merge(cbind(dat1, X=rownames(dat1)), cbind(dat2, variable=rownames(dat2))) 
    variable ID value X concreteness familiarity typicality 
1 amoeba 1  0 1   3.60  1.30  1.71 
2 amoeba 2  0 2   3.60  1.30  1.71 
3 amoeba 3 NA 3   3.60  1.30  1.71 
4 bacterium 1  0 251   3.82  3.48  2.13 
5 bacterium 2  0 252   3.82  3.48  2.13 
6 bacterium 3  0 253   3.82  3.48  2.13 
7  leech 1  1 501   5.71  1.83  4.50 
8  leech 2  1 502   5.71  1.83  4.50 
9  leech 3  0 503   5.71  1.83  4.50 
8

试试这个:

merge(dat1, dat2, by.x = 2, by.y = 0, all.x = TRUE) 

这是假设,如果有在dat1是无与伦比的,然后在结果dat2列应该充满NA,如果任何行存在在dat2中是无与伦比的值,那么它们将被忽略。例如:

dat2a <- dat2 
rownames(2a)[3] <- "elephant" 
# the above still works: 
merge(dat1, dat2a, by.x = 2, by.y = 0, all.x = TRUE) 

上述已知为留在SQL加入并且可以像这样在sqldf进行(忽略警告):

library(sqldf) 
sqldf("select * 
     from dat1 left join dat2 
     on dat1.variable = dat2.row_names", 
     row.names = TRUE)