2017-08-27
0

I want to do a row-wise merge. I have a dataset df and a corresponding lookup table ID that assigns multiple IDs to each name:

ID <- data.frame(Alphabet = c("A", "A","A","B", "B", "C"), 
      Value = c(101,102, 103,201,202,301)) 

df <- data.frame(Name = c("A", "A","B", "C")) 

I want to merge/assign the IDs to df and get a data frame that looks like this:

Name ID1 ID2 ID3 
A  101 102 103 
A  101 102 103 
B  201 202 
C  301 

Answers

1

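I would solve this by building a list that contains the rows of the final data frame and then binding them together with rbind. The only trick is to compute the maximum row length and pad the shorter rows with NAs accordingly. This should work: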

ID <- data.frame(Alphabet = c("A", "A","A","B", "B", "C"), 
       Value = c(101,102, 103,201,202,301)) 

df <- data.frame(Name = c("A", "A","B", "C")) 


tmp <- lapply(df$Name, (function(id){ 
    ID[ID$Alphabet == id, ]$Value 
})) 
max.el <- max(sapply(tmp, length)) 
out.df <- do.call(rbind, lapply(tmp, (function(el){ 
    len.na <- max.el - length(el) 
    c(el, rep(NA, len.na)) 
}))) 

print(out.df, na.print = "") 

This is the result:

     [,1] [,2] [,3]
[1,]  101  102  103
[2,]  101  102  103
[3,]  201  202
[4,]  301

If showing NA in the output is not a problem, then:

colnames(out.df) <- paste("ID", c(1:max.el), sep = "") 
out.df <- cbind(df, out.df) 
out.df 

  Name ID1 ID2 ID3
1    A 101 102 103
2    A 101 102 103
3    B 201 202  NA
4    C 301  NA  NA
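
The padding step described in this answer can also be looked at in isolation. The following is only a minimal sketch: it swaps in the base R idiom of assigning to length(), which fills new slots with NA, instead of the answer's c(el, rep(NA, len.na)). The helper name pad_to and the hand-typed list rows are assumptions made for illustration.

```r
# Pad each vector in a list with NA up to the longest length.
# 'pad_to' is a hypothetical helper name, not part of the answer above.
pad_to <- function(x, n) {
  length(x) <- n  # extending a vector's length fills the new slots with NA
  x
}

# Hand-typed stand-in for the per-name ID vectors.
rows <- list(c(101, 102, 103), c(201, 202), 301)
max_len <- max(lengths(rows))

# Bind the padded vectors row by row into a matrix.
padded <- do.call(rbind, lapply(rows, pad_to, n = max_len))
print(padded, na.print = "")
```

Because every padded vector has the same length, rbind can stack them without recycling values.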
2

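Try this? Using NA is better than an empty string for marking missing values.

If you really want '' instead of NA, just use outdf[is.na(outdf)]='' (assuming outdf holds the merged result).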

library(dplyr) 
ID=ID%>%group_by(Alphabet)%>%mutate(ID=row_number()) 
library(reshape2) 
DF=as.data.frame(acast(ID, Alphabet~ID, value.var="Value")) 
DF$Name=row.names(DF) 
merge(df,DF,by='Name') 


    Name 1 2 3 
1 A 101 102 103 
2 A 101 102 103 
3 B 201 202 NA 
4 C 301 NA NA 

Or use tidyr (recommended, since you are working with a data.frame):

library(dplyr) 
library(tidyr) 
ID=ID%>%group_by(Alphabet)%>%mutate(id=row_number()) 
DF=spread(ID, id,Value) 
merge(df,DF,by.x='Name',by.y='Alphabet') 

    Name 1 2 3 
1 A 101 102 103 
2 A 101 102 103 
3 B 201 202 NA 
4 C 301 NA NA 
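
The NA-to-empty-string replacement mentioned at the start of this answer could look like the sketch below. It assumes the merged result is stored in a variable called outdf (the name used in the answer's text); the data frame here is typed in by hand rather than produced by the merge, and the column names ID1 to ID3 are chosen for illustration.

```r
# Hand-typed copy of a merged result; 'outdf' and the ID1..ID3 names are assumptions.
outdf <- data.frame(Name = c("A", "A", "B", "C"),
                    ID1 = c(101, 101, 201, 301),
                    ID2 = c(102, 102, 202, NA),
                    ID3 = c(103, 103, NA, NA),
                    stringsAsFactors = FALSE)

# Replace every NA with an empty string.
# Columns that contained an NA are converted to character by this assignment.
outdf[is.na(outdf)] <- ""
print(outdf)
```

Note the trade-off: once a column holds '', it is no longer numeric, which is one reason to keep NA unless the empty cells are needed purely for display.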
0

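For the sake of completeness, here is another solution that uses dcast() from the data.table package to reshape from long to wide format, followed by a right join: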

library(data.table) 
# coerce to data.table 
setDT(ID)[ 
    # reshape from long to wide, thereby creating column names 
    , dcast(.SD, Alphabet ~ rowid(Alphabet, prefix = "ID"))][ 
    # rename column 
    , setnames(.SD, "Alphabet", "Name")][ 
     # right join with df to repeat rows 
     setDT(df), on = "Name"] 
   Name ID1 ID2 ID3
1:    A 101 102 103
2:    A 101 102 103
3:    B 201 202  NA
4:    C 301  NA  NA

In case NA must not be shown, the output needs to be converted to type character:

setDT(ID)[, dcast(.SD, Alphabet ~ rowid(Alphabet, prefix = "ID"), as.character, fill = "")][ 
    , setnames(.SD, "Alphabet", "Name")][ 
     setDT(df), on = "Name"] 
   Name ID1 ID2 ID3
1:    A 101 102 103
2:    A 101 102 103
3:    B 201 202
4:    C 301