2017-09-22
3

Create a new data frame with aggregated rows in R

I have a data frame with two columns: ID and type (character). See below:

set.seed(123) 
ID <- seq(1,25) 
type <- sample(letters[1:26], 25, replace=TRUE) 

df <- data.frame(ID, type) 

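I need to create a new data frame with only one column. The first observation will be the first three letters of the type column, the second observation will be the next three letters, and so on.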

The new data would look like this:

ndf <- data.frame(ntype=c("huk", "wyb", "nxo", "lyl", "roc", "xgb", "iyx", "sqz", "r")) 
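
A note for readers on newer R versions: R 3.6.0 changed the default algorithm used by `sample()`, so `set.seed(123)` no longer produces the letters shown above. The following is a minimal sketch, assuming R >= 3.6.0 (where the `sample.kind` argument exists), that switches back to the older sampler so the question's data can be reproduced. R may warn that the old sampler is non-uniform; the warning is suppressed here.

```r
# Assumption: R >= 3.6.0. "Rounding" restores the pre-3.6.0 behaviour of sample().
suppressWarnings(RNGkind(sample.kind = "Rounding"))
suppressWarnings(set.seed(123))

ID <- seq(1, 25)
type <- sample(letters[1:26], 25, replace = TRUE)
df <- data.frame(ID, type, stringsAsFactors = FALSE)

# All 25 letters as one string; should match the letters in the expected output
all_letters <- paste(df$type, collapse = "")
print(all_letters)

# Switch back to the current default sampler afterwards
RNGkind(sample.kind = "Rejection")
```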

Answers

3

We create a grouping variable with gl, then use tapply to paste the elements of each group together

n <- 3 
ndf <- data.frame(
  ntype = with(df, unname(tapply(type, as.integer(gl(nrow(df), n, nrow(df))),
                                 FUN = paste, collapse = ""))),
  stringsAsFactors = FALSE)
ndf$ntype 
#[1] "huk" "wyb" "nxo" "lyl" "roc" "xgb" "iyx" "sqz" "r" 
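
To make the grouping step easier to follow, here is a small sketch. The vector `type` is hard-coded from the expected output so the result does not depend on the random number generator, and the arithmetic grouping `ceiling(seq_along(type) / n)` is an alternative added for illustration, not part of the answer.

```r
# Letters taken from the expected output in the question (hard-coded for illustration)
type <- strsplit("hukwybnxolylrocxgbiyxsqzr", "")[[1]]
n <- 3

# gl(levels, reps, length): each level repeated n times, truncated to length(type)
g1 <- as.integer(gl(length(type), n, length(type)))

# Equivalent grouping by arithmetic: positions 1-3 -> 1, 4-6 -> 2, ...
g2 <- ceiling(seq_along(type) / n)

print(g1)
print(all(g1 == g2))

# Paste the letters within each group; as.vector() drops the array attributes
res <- as.vector(tapply(type, g1, paste, collapse = ""))
print(res)
```

Both grouping vectors run 1, 1, 1, 2, 2, 2, and so on, with the last group holding only the leftover element.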

Another option is to paste the whole column together and then split it

strsplit(paste(df$type, collapse=""), "(?<=.{3})", perl = TRUE)[[1]] 
#[1] "huk" "wyb" "nxo" "lyl" "roc" "xgb" "iyx" "sqz" "r" 
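
If the lookbehind pattern feels opaque, a similar result can be obtained by matching chunks rather than splitting between them. This sketch uses `regmatches()` with `gregexpr()` and the pattern `.{1,3}`; it is a variant added for illustration, not taken from the answer, and the input string is hard-coded from the expected output.

```r
# The 25 letters from the question, already pasted into one string
s <- "hukwybnxolylrocxgbiyxsqzr"

# Greedily match runs of up to three characters; the last match picks up the remainder
chunks <- regmatches(s, gregexpr(".{1,3}", s))[[1]]
print(chunks)
```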

Or another option is substring with paste

substring(paste(df$type, collapse=""), seq(1, nrow(df), by = 3), 
     c(seq(3, nrow(df), by = 3), nrow(df))) 
#[1] "huk" "wyb" "nxo" "lyl" "roc" "xgb" "iyx" "sqz" "r" 
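
One caveat with the substring call above: when the number of rows is an exact multiple of 3, the vector of end positions ends up one element longer than the vector of start positions, and recycling adds an extra element containing the whole string. A sketch of a variant that avoids this by deriving the end positions from the start positions (`chunk_string` is a hypothetical helper name, not from the answer):

```r
# Hypothetical helper: split string s into consecutive pieces of n characters
chunk_string <- function(s, n = 3) {
  len <- nchar(s)
  if (len == 0) return(character(0))
  starts <- seq(1, len, by = n)
  # substring() stops at the end of the string, so the last piece may be shorter
  substring(s, starts, starts + n - 1)
}

r25 <- chunk_string("hukwybnxolylrocxgbiyxsqzr", 3)  # 25 characters, one left over
r24 <- chunk_string("hukwybnxolylrocxgbiyxsqz", 3)   # 24 characters, exact multiple
print(r25)
print(r24)
```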

Note: all of the above are base R solutions

+1

Thanks. It works! – user9292

+2

'(?<=.{3})' +1! – PoGibas

4

1) Use rollapply along the input vector:

library(zoo) 

rollapply(df$type, 3, by = 3, paste, collapse = "", partial = TRUE, align = "left") 

giving:

[1] "huk" "wyb" "nxo" "lyl" "roc" "xgb" "iyx" "sqz" "r" 

2) This alternative uses aggregate and no packages.

n <- nrow(df) 
aggregate(type ~ gl(n, 3, n), df, paste, collapse = "")[2] 

giving:

type 
1 huk 
2 wyb 
3 nxo 
4 lyl 
5 roc 
6 xgb 
7 iyx 
8 sqz 
9 r 
0

Using dplyr

library(dplyr)

df$group <- (df$ID - 1) %/% 3
df %>% group_by(group) %>% dplyr::summarise(ntype = paste0(type, collapse = ''))
# A tibble: 9 x 2 
    group ntype 
    <dbl> <chr> 
1  0 huk 
2  1 wyb 
3  2 nxo 
4  3 lyl 
5  4 roc 
6  5 xgb 
7  6 iyx 
8  7 sqz 
9  8  r 