2016-05-26

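R: Split and order a column in a data frame according to another column

The title is probably a bit confusing, so let me try to clarify it with a small example. I have a data frame with 3 columns that looks like this: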

   col1     col2     col3
1 A,D,C sd,dg,ds   5,26,1
2   D,F    fh,we    85,41
3     H       hr       27
4 C,A,D ds,sd,dg 235,65,3
5 Q,G,J rt,gh,we 34,98,65

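I want to sort the elements inside each entry of col1 alphabetically, and then reorder the elements of col2 and col3 in the same row to follow the new order of col1, to get this: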

   col1     col2     col3
1 A,C,D sd,ds,dg   5,1,26
2   D,F    fh,we    85,41
3     H       hr       27
4 A,C,D sd,ds,dg 65,235,3
5 G,J,Q gh,we,rt 98,65,34

This matters because I later want to aggregate by col1, and I need rows 1 and 4 in the example to be equal (A,C,D).

So far, I am stuck here:

MWE

my.df <- data.frame(col1=c('A,D,C','D,F','H','C,A,D','Q,G,J'), col2=c('sd,dg,ds','fh,we','hr','ds,sd,dg','rt,gh,we'), col3=c('5,26,1','85,41','27','235,65,3','34,98,65')) 
my.df 
my.df$col1 <- sapply(sapply(strsplit(as.character(my.df$col1), ','), sort), paste, collapse=',') 
my.df 

Any help appreciated! Thanks!!

Answers

1

You can turn each row into a data frame, reorder each of those data frames by column 1, and then paste everything back together:

# split the entries by commas and 
# turn each row of my.df into a data frame 
# storing each data frame in a list element 
dfList <- lapply(
    apply(my.df, 1, strsplit, ","), 
    function(x) data.frame(x)) 

# sort each data frame by col1 
dfSortedList <- lapply(dfList, function(x) x[with(x, order(col1)), ]) 

# paste columns back together and arrange as desired 
t(sapply(dfSortedList, function(x) apply(x, 2, paste, collapse = ","))) 

#  col1 col2  col3  
#[1,] "A,C,D" "sd,ds,dg" "5,1,26" 
#[2,] "D,F" "fh,we" "85,41" 
#[3,] "H"  "hr"  "27"  
#[4,] "A,C,D" "sd,ds,dg" "65,235,3" 
#[5,] "G,J,Q" "gh,we,rt" "98,65,34" 

The result is a character matrix; you can convert it back to a data frame if needed.
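As a minimal sketch of that last step, assuming the same example data: the names `rows`, `sorted_rows`, `res_mat` and `res_df` are illustrative and not part of the answer above, and `stringsAsFactors = FALSE` is added so that every column stays character.

```r
my.df <- data.frame(
  col1 = c("A,D,C", "D,F", "H", "C,A,D", "Q,G,J"),
  col2 = c("sd,dg,ds", "fh,we", "hr", "ds,sd,dg", "rt,gh,we"),
  col3 = c("5,26,1", "85,41", "27", "235,65,3", "34,98,65"),
  stringsAsFactors = FALSE
)

# one small data frame per original row, built from the split strings
rows <- lapply(apply(my.df, 1, strsplit, ","), data.frame,
               stringsAsFactors = FALSE)

# reorder each small data frame by its col1 values
sorted_rows <- lapply(rows, function(x) x[order(x$col1), ])

# collapse back to one string per cell; this gives a character matrix
res_mat <- t(sapply(sorted_rows, function(x) apply(x, 2, paste, collapse = ",")))

# convert the matrix back to a data frame with character columns
res_df <- as.data.frame(res_mat, stringsAsFactors = FALSE)
```

`as.data.frame()` keeps the column names of the matrix, so `res_df` has the same three columns as `my.df`.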

+0

Really elegant, and it's best to avoid loops! – DaniCee

1

Here you go:

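my.df <- data.frame(col1=c('A,D,C','D,F','H','C,A,D','Q,G,J'), col2=c('sd,dg,ds','fh,we','hr','ds,sd,dg','rt,gh,we'), col3=c('5,26,1','85,41','27','235,65,3','34,98,65'), stringsAsFactors = FALSE)

for (k in seq_len(nrow(my.df))) {
    # split row k into a temporary data frame, one column per original column
    tempdf <- data.frame(strsplit(my.df[k, 1], ","), strsplit(my.df[k, 2], ","), strsplit(my.df[k, 3], ","), stringsAsFactors = FALSE)
    # reorder the temporary rows by the split col1 values
    tempdf <- tempdf[order(tempdf[, 1]), ]
    # collapse each column back into one string and overwrite row k
    my.df[k, ] <- sapply(tempdf, paste, collapse = ",")
}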

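As you can see, I go through each row and convert its comma-separated strings into a temporary data frame. Then you only need to sort the temporary data frame by its first column. From there, you collapse each column of tempdf back into a single string and replace the original row of my.df.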

Result:

> my.df 
   col1     col2     col3
1 A,C,D sd,ds,dg   5,1,26
2   D,F    fh,we    85,41
3     H       hr       27
4 A,C,D sd,ds,dg 65,235,3
5 G,J,Q gh,we,rt 98,65,34
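The same per-row logic can also be written without an explicit loop or a temporary data frame. This is only a sketch of an alternative, not the code from the answer above: `reorder_by_first` is a hypothetical helper name, and it assumes that every row has the same number of comma-separated elements in each column.

```r
my.df <- data.frame(
  col1 = c("A,D,C", "D,F", "H", "C,A,D", "Q,G,J"),
  col2 = c("sd,dg,ds", "fh,we", "hr", "ds,sd,dg", "rt,gh,we"),
  col3 = c("5,26,1", "85,41", "27", "235,65,3", "34,98,65"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: reorder the elements of every column, row by row,
# following the sorted order of the elements in the first column.
reorder_by_first <- function(df, sep = ",") {
  # split every cell of every column into its elements
  parts <- lapply(df, function(col) strsplit(as.character(col), sep, fixed = TRUE))
  # per row: the permutation that sorts the first column's elements
  ord <- lapply(parts[[1]], order)
  # apply that permutation to each column and collapse back to strings
  df[] <- lapply(parts, function(p) {
    mapply(function(x, o) paste(x[o], collapse = sep), p, ord)
  })
  df
}

sorted.df <- reorder_by_first(my.df)
```

Computing the permutation once per row with `order()` and reusing it for every column is what keeps col2 and col3 aligned with col1.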
1

We can do this using cSplit from splitstackshape together with data.table.

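library(data.table)       # provides setDT()
library(splitstackshape)  # provides cSplit()
# note: setDT() converts my.df to a data.table by reference and adds an 'rn' column
na.omit(cSplit(setDT(my.df, keep.rownames=TRUE), 2:4, ",","long"))[ 
     , {i1 <- order(col1) 
     lapply(.SD, function(x) paste(x[i1], collapse=",")) 
    }, rn][, rn:= NULL][] 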
# col1  col2  col3 
#1: A,C,D sd,ds,dg 5,1,26 
#2: D,F fh,we 85,41 
#3:  H  hr  27 
#4: A,C,D sd,ds,dg 65,235,3 
#5: G,J,Q gh,we,rt 98,65,34 

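Or, as a slightly longer option, we can split 'col1' and convert the dataset to 'long' format with cSplit. Then, grouped by 'col2' and 'col3', we create a list column holding the order ('i1') together with the sorted 'col1'. Next, we specify .SDcols as 'col2' and 'col3', loop over those columns with lapply, split each column by ",", rearrange the pieces according to the 'i1' column with Map, paste them back together, and assign (:=) the output back to the original columns. If needed, assign 'i1' to NULL.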

d1 <- cSplit(my.df, "col1", ",", "long")[, 
.(i1 = list(order(col1)), col1 = toString(sort(col1))) ,.(col2, col3)] 
# reorder the split pieces by 'i1', then paste them back into one string per row
d1[, c('col2', 'col3') := lapply(.SD, function(x) 
    unlist(Map(function(x, y) paste(x[y], collapse = ","), strsplit(as.character(x), ","), d1$i1))), .SDcols = col2:col3] 
d1[, i1:= NULL] 
d1[, names(my.df), with = FALSE] 
#  col1  col2  col3 
#1: A, C, D sd,ds,dg 5,1,26 
#2: D, F fh,we 85,41 
#3:  H  hr  27 
#4: A, C, D sd,ds,dg 65,235,3 
#5: G, J, Q gh,we,rt 98,65,34 
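
Once col1 is normalized, the grouping mentioned in the question becomes straightforward. The sketch below only illustrates that point in base R: `sorted.df` is an assumed name for the reordered result (using the comma-without-space form of col1), and `row_groups` and `col3_by_key` are illustrative names.

```r
# assumed: the reordered data from the example, with col1 already sorted
sorted.df <- data.frame(
  col1 = c("A,C,D", "D,F", "H", "A,C,D", "G,J,Q"),
  col2 = c("sd,ds,dg", "fh,we", "hr", "sd,ds,dg", "gh,we,rt"),
  col3 = c("5,1,26", "85,41", "27", "65,235,3", "98,65,34"),
  stringsAsFactors = FALSE
)

# row indices that share the same col1 key
row_groups <- split(seq_len(nrow(sorted.df)), sorted.df$col1)
row_groups[["A,C,D"]]   # rows 1 and 4 now fall into the same group

# col3 strings collected per col1 key, joined with " | "
col3_by_key <- tapply(sorted.df$col3, sorted.df$col1, paste, collapse = " | ")
```

Without the sorting step, "A,D,C" and "C,A,D" would be treated as two different keys.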