2017-02-02
# Generate some data
set.seed(1234)
rows <- 100
created_data <- data.frame(index = 1:rows,
                           catsA = sample(letters[1:5], rows, replace = TRUE),
                           valueA = round(rnorm(rows), 3))

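Collapsing rows of a dplyr tibble based on cumulative frequency: first, use dplyr to count the rows in each category and sort the categories by count.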

library(dplyr)

# Count rows per category, sort by descending count, then add
# relative and cumulative frequencies
count_of_cat <- created_data %>%
    group_by(catsA) %>%
    summarise(rowcount = n()) %>%
    ungroup() %>%
    arrange(desc(rowcount)) %>%
    mutate(rel.freq = round(rowcount / sum(rowcount), 3)) %>%
    mutate(cum.freq = cumsum(rel.freq))

Output

  catsA rowcount rel.freq cum.freq
1     b       26     0.26     0.26
2     a       25     0.25     0.51
3     c       17     0.17     0.68
4     d       17     0.17     0.85
5     e       15     0.15     1.00

Is there a good way to collapse all the rows that come after the point where cum.freq exceeds, say, 0.50 into a single row?

Desired output

  catsA rowcount rel.freq cum.freq
1     b       26     0.26     0.26
2     a       25     0.25     0.51
3   new       49     0.49     1.00

Answer
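One possible approach, offered as a minimal sketch and not as the original answer: relabel every row whose preceding cumulative frequency has already reached the cutoff as "new", then summarise again. The names `threshold` and `collapsed` are made up for illustration, and the counts are typed in from the output shown in the question so the sketch does not depend on the random number generator. It also assumes `catsA` is a character column; if it is a factor, convert it with `as.character()` first, because `if_else()` is strict about types.

```r
library(dplyr)

# Counts copied from the question's output (assumption: catsA is character)
count_of_cat <- tibble(
  catsA    = c("b", "a", "c", "d", "e"),
  rowcount = c(26L, 25L, 17L, 17L, 15L)
) %>%
  arrange(desc(rowcount)) %>%
  mutate(rel.freq = round(rowcount / sum(rowcount), 3),
         cum.freq = cumsum(rel.freq))

threshold <- 0.50  # hypothetical name for the cutoff used in the question

collapsed <- count_of_cat %>%
  # Keep a row while the cumulative frequency *before* it is below the cutoff,
  # so the row that crosses the cutoff (here "a" at 0.51) is still kept
  mutate(catsA = if_else(lag(cum.freq, default = 0) < threshold, catsA, "new")) %>%
  group_by(catsA) %>%
  summarise(rowcount = sum(rowcount), rel.freq = sum(rel.freq)) %>%
  # Put the collapsed "new" row last, the kept rows in descending count order
  arrange(catsA == "new", desc(rowcount)) %>%
  mutate(cum.freq = cumsum(rel.freq))

print(collapsed)
```

Using `lag(cum.freq)` instead of `cum.freq` itself is what keeps the row that first crosses 0.50, which matches the desired output above (rows b, a, and a "new" row holding the remaining 49 observations).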