Aggregate data frame by occurrence, counting the remaining column

I have a data frame:

  station person_id       date
1    0037    103103 2015-02-02
2    0037    306558 2015-02-02
3    0037    306558 2015-02-04
4    0037    306558 2015-02-05

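I need to aggregate the frame by station and date so that each unique station/date combination (one row) in the result shows how many people fall on that row.

For example, the first 2 rows would collapse into a single row showing 2 people for station 0037 on 2015-02-02.

I tried: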

result <- data_frame %>% group_by(station, week = week(date)) %>% summarise_each(funs(length), -date)
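For anyone reproducing this, the following is a minimal base-R sketch that rebuilds the four sample rows and produces the target shape (one row per station/date pair) with `table()`. It assumes `station` and `person_id` are stored as character so that the leading zeros in `0037` survive; the object name `counts` is hypothetical.

```r
# Reproducible version of the question's sample data (assumption: character
# columns for station and person_id so "0037" keeps its leading zeros)
df <- data.frame(
  station   = c("0037", "0037", "0037", "0037"),
  person_id = c("103103", "306558", "306558", "306558"),
  date      = as.Date(c("2015-02-02", "2015-02-02", "2015-02-04", "2015-02-05")),
  stringsAsFactors = FALSE
)

# Cross-tabulate station by date, then flatten to one row per pair;
# the Freq column holds the number of rows (people) in each pair
counts <- as.data.frame(table(station = df$station, date = df$date),
                        stringsAsFactors = FALSE)

# table() also lists station/date pairs that never occur, so drop zero counts
counts <- counts[counts$Freq > 0, ]
print(counts)
```

With a single station every pair occurs, so the filter changes nothing here; it matters once several stations share a date range.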


2016-05-23 Cybernetic

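`summarise_each` is only necessary when you have several columns you want to summarize, for example if you wanted the means of four different columns within each station/date group. – Gregor

You could try: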

library(dplyr)
group_by(df, station, date) %>% summarise(num_people = length(person_id))
Source: local data frame [3 x 3] 
Groups: station [?] 

  station       date num_people
    (int)     (fctr)      (int)
1      37 2015-02-02          2
2      37 2015-02-04          1
3      37 2015-02-05          1
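Note that `length(person_id)` counts rows, not distinct people. If the same person can appear more than once for a station on one day, `n_distinct()` gives the number of distinct people instead. The sketch below assumes a hypothetical duplicate visit by person 306558 on 2015-02-02 to show the difference; the column names `num_rows` and `num_people` are illustrative.

```r
library(dplyr)

# Question's data plus one hypothetical repeat visit on 2015-02-02
df <- data.frame(
  station   = c(37L, 37L, 37L, 37L, 37L),
  person_id = c(103103L, 306558L, 306558L, 306558L, 306558L),
  date      = as.Date(c("2015-02-02", "2015-02-02", "2015-02-02",
                        "2015-02-04", "2015-02-05"))
)

result <- df %>%
  group_by(station, date) %>%
  summarise(
    num_rows   = n(),                    # every row in the group
    num_people = n_distinct(person_id),  # each person counted once
    .groups    = "drop"                  # return an ungrouped result
  )
print(result)
```

On the question's original four rows both columns agree, so the choice only matters if repeat visits are possible.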


2016-05-23 18:52:54 DatamineR

Isn't this just `count(df, station, date)`? Or at least `group_by(df, station, date) %>% summarize(n())`? –

Excellent, thanks. – Cybernetic
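A minimal sketch of the `count()` suggestion on the question's rows, assuming a recent dplyr where `count()` accepts a `name` argument for the count column. `count()` is roughly equivalent to `group_by()` followed by `summarise(n = n())`, and it returns an ungrouped result.

```r
library(dplyr)

df <- data.frame(
  station   = c(37L, 37L, 37L, 37L),
  person_id = c(103103L, 306558L, 306558L, 306558L),
  date      = as.Date(c("2015-02-02", "2015-02-02", "2015-02-04", "2015-02-05"))
)

# One row per station/date pair; the count column is named num_people
counted <- count(df, station, date, name = "num_people")
print(counted)
```

Without `name`, the count column is called `n`.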

In base R, you can use aggregate:

# sample dataset 
set.seed(1234) 
df <- data.frame(station=sample(1:3, 50, replace=TRUE), 
       person_id=sample(30000:35000, 50, replace=TRUE), 
       date=sample(seq(as.Date("2015-02-05"), as.Date("2015-02-12"), 
           by="day"), 50, replace=TRUE)) 

# calculate number of people per station on a particular date 
aggregate(cbind("passengerCount"=person_id) ~ station + date, data=df, FUN=length)

The cbind function is not necessary, but it lets you give the count column a name.
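To see what `cbind()` changes, here is a small sketch on the question's four rows: without it, the count column keeps the name of the counted variable (`person_id`); with it, the same counts appear under the chosen name. The object names `plain` and `named` are hypothetical.

```r
df <- data.frame(
  station   = c(37L, 37L, 37L, 37L),
  person_id = c(103103L, 306558L, 306558L, 306558L),
  date      = as.Date(c("2015-02-02", "2015-02-02", "2015-02-04", "2015-02-05"))
)

# Without cbind(): the result column is called person_id
plain <- aggregate(person_id ~ station + date, data = df, FUN = length)

# With cbind(): the result column is called passengerCount
named <- aggregate(cbind(passengerCount = person_id) ~ station + date,
                   data = df, FUN = length)

print(plain)
print(named)
```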


2016-05-23 18:55:28 lmo

With data.table, we convert the 'data.frame' to a 'data.table', group by 'station' and 'date', and get the number of rows in each group (.N).

library(data.table) 
setDT(df1)[, .(num_people = .N), .(station, date)] 
# station  date num_people 
#1:  37 2015-02-02   2 
#2:  37 2015-02-04   1 
#3:  37 2015-02-05   1
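The answer above assumes an existing `df1`. The sketch below builds one from the question's rows, deliberately shuffled (an assumption for illustration), to show that `by` keeps groups in order of first appearance while `keyby` sorts them by station and date. The names `by_first_seen` and `by_sorted` are hypothetical.

```r
library(data.table)

# Question's rows in shuffled order
df1 <- data.frame(
  station   = c(37L, 37L, 37L, 37L),
  person_id = c(306558L, 103103L, 306558L, 306558L),
  date      = as.Date(c("2015-02-05", "2015-02-02", "2015-02-04", "2015-02-02"))
)

setDT(df1)  # converts to a data.table in place, without copying

# by: groups appear in the order they are first encountered
by_first_seen <- df1[, .(num_people = .N), by = .(station, date)]

# keyby: groups are sorted and the result is keyed on station, date
by_sorted <- df1[, .(num_people = .N), keyby = .(station, date)]

print(by_first_seen)
print(by_sorted)
```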


2016-05-24 03:02:27 akrun
