2015-12-08
1

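Reordering variables

After merging two datasets, I get a data frame with 300 variables (some end in .x, some end in .y, and some have neither suffix). How can I move all the variables that do not end in .x or .y into the first 100 columns of the dataset? In addition, I want the columns from 101 onward to be arranged like (day.x, day.y, city.x, city.y, number.x, number.y, and so on). That is, variables with the same base name, such as city, but different suffixes should sit next to each other. For example: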

city.y<- c(1,2,3,5,5,7,7,NA,NA,3,4,5) 
B<-c(3,4,5,6,1,2,7,6,7,NA,NA,6) 
number.x<-c(1,2,3,4,5,6,7,NA,NA,5,5,6) 
day.x<-c(1,3,4,5,6,7,8,1,NA,3,5,3) 
Z<-c(1,2,3,4,5,6,7,NA,NA,5,5,6) 
day.y<-c(4,5,6,7,8,7,8,1,2,3,5,NA) 
number.y<-c(3,4,5,6,1,2,7,6,7,NA,NA,6) 
school.x<-c("a","b","b","c","n","f","h","NA","F","G","z","h") 
S<-c(5,2,3,4,5,6,5,NA,NA,5,6,6) 
school.y<-c("a","b","b","c","m","g","h","NA","NA","G","H","T") 
city.x<- c(1,2,3,7,5,8,7,5,6,7,5,1) 
df<- data.frame(city.y,B,number.x,day.x,Z,day.y,number.y,school.x,S,school.y,city.x) 

I want to reorder the variables in this format: B, S, Z, city.x, city.y, number.x, number.y, day.x, day.y, and so on.

Answers

3

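Add one more column without a suffix to make the example more general (the six values are recycled across the 12 rows):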

df$ZZZZZ = 1:6 

Then load the dplyr package (for the chaining operator %>% and the select function):

library(dplyr) 

Sorting the columns by name puts each sub-group of columns in the right relative order:

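# Reorder the columns by name; assigning to names(df) would only relabel them
df = df[, sort(names(df))] 

Now the regular expression in -matches("\\.[xy]$") picks out every column that does not end in ".x" or ".y" and puts those columns first. everything() then places all the remaining columns after them.

df = df %>% select(-matches("\\.[xy]$"), everything()) 

df 

    B  S  Z ZZZZZ city.x city.y day.x day.y number.x number.y school.x school.y 
1   3  5  1     1      1      1     1     4        1        3        a        a 
2   4  2  2     2      2      2     3     5        2        4        b        b 
... 
11 NA  6  5     5      5      4     5     5        5       NA        z        H 
12  6  6  6     6      1      5     3    NA        6        6        h        T 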

If you like, you can also set your own suffixes in the merge function (instead of the default ".x" and ".y"):

merge(df1, df2, by="col", suffixes=c("_df1", "_df2")) 

If you do this, you will of course also need to adjust the regular expression used to reorder the columns.
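As a rough illustration of that adjustment, here is a minimal base-R sketch. The frames df1 and df2 and their columns (id, value, only1, only2) are made up for the example; only the suffixes come from the merge call above. With dplyr, the same pattern could be passed to matches() in place of "\\.[xy]$".

```r
# Hypothetical input frames; all column names here are illustrative only.
df1 <- data.frame(id = 1:3, value = c(10, 20, 30), only1 = c("a", "b", "c"))
df2 <- data.frame(id = 1:3, value = c(11, 21, 31), only2 = c(TRUE, FALSE, TRUE))

# Use custom suffixes instead of the default ".x" / ".y".
merged <- merge(df1, df2, by = "id", suffixes = c("_df1", "_df2"))

# The pattern must now describe the custom suffixes.
suffix_pattern <- "_df[12]$"
has_suffix <- grepl(suffix_pattern, names(merged))

# Unsuffixed columns first, then the suffixed ones sorted so pairs are adjacent.
reordered <- merged[, c(names(merged)[!has_suffix],
                        sort(names(merged)[has_suffix]))]
names(reordered)
```

Indexing by column name moves each column together with its data, so only the column order changes.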

2

This should do it:

extCols <- grepl("\\.", colnames(df)) 
df[, c(colnames(df)[(!extCols)], 
    sort(colnames(df)[extCols]))]
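A possible variant, offered only as a sketch: the question asks for city, number, day order rather than alphabetical order, and a pattern of "\\." would also catch column names that contain a dot elsewhere. The version below assumes that only a trailing .x or .y counts as a suffix and that suffixed pairs should follow the first appearance of their base name. The small data frame is a stand-in for the merged data, and the helper names (has_suffix, plain, base, ord) are illustrative.

```r
# Small stand-in for the merged data; column names follow the question.
df <- data.frame(city.y = 1:3, B = 4:6, number.x = 7:9, day.x = 1:3,
                 Z = 4:6, day.y = 7:9, number.y = 1:3, S = 4:6, city.x = 7:9)

nm <- names(df)
has_suffix <- grepl("\\.[xy]$", nm)     # TRUE only for names ending in .x or .y

plain    <- sort(nm[!has_suffix])       # unsuffixed columns, sorted by name
suffixed <- nm[has_suffix]
base     <- sub("\\.[xy]$", "", suffixed)  # strip the suffix to get the base name

# Order by first appearance of the base name, then by full name (.x before .y).
ord <- order(match(base, unique(base)), suffixed)

result <- df[, c(plain, suffixed[ord])]
names(result)
```

If alphabetical order of the base names is preferred, replacing unique(base) with sort(unique(base)) should give that instead.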