Summarize a data frame by date and group

I want to summarize a dataset by several different factors. Here is a sample of my data:
household <- rep(c("household1", "household2", "household3"), each = 3)
# Dates are sampled at random, so they will differ from the output shown below
date <- sample(seq(as.Date("1999/01/01"), as.Date("2000/01/01"), by = "day"), 9)
value <- seq(100, 900, by = 100)  # matches the values in the printed output
type <- rep(c("income", "water", "energy"), times = 3)
df <- data.frame(household, date, value, type)
   household       date value   type
1 household1 1999-05-10   100 income
2 household1 1999-05-25   200  water
3 household1 1999-10-12   300 energy
4 household2 1999-02-02   400 income
5 household2 1999-08-20   500  water
6 household2 1999-02-19   600 energy
7 household3 1999-07-01   700 income
8 household3 1999-10-13   800  water
9 household3 1999-01-01   900 energy
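I want to summarize the data by month. Ideally, the final dataset would have 12 rows per household (one per month) and one column per category (water, energy, income), each holding that category's total for the month.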
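To make the target shape concrete, here is a minimal base R sketch of one way to get there. It assumes the sample values from the printed data frame above, assumes every date falls in 1999, and uses names of my own choosing (`months_1999`, `totals`, `monthly`). Months with no transactions come out as 0 rather than NA, which may or may not be what is wanted.

```r
# Sample data copied from the printed data frame in the question
df <- data.frame(
  household = rep(c("household1", "household2", "household3"), each = 3),
  date = as.Date(c("1999-05-10", "1999-05-25", "1999-10-12",
                   "1999-02-02", "1999-08-20", "1999-02-19",
                   "1999-07-01", "1999-10-13", "1999-01-01")),
  value = seq(100, 900, by = 100),
  type = rep(c("income", "water", "energy"), times = 3)
)

# Month as a factor with all 12 levels, so months without data are kept
# (assumption: all dates fall in 1999)
months_1999 <- format(seq(as.Date("1999-01-01"), by = "month", length.out = 12),
                      "%Y-%m")
df$month <- factor(format(df$date, "%Y-%m"), levels = months_1999)

# Sum value for every household x month x type cell; empty cells become 0
totals <- xtabs(value ~ household + month + type, data = df)

# Long table of totals, then one column per type
long <- as.data.frame(totals, responseName = "total")
monthly <- reshape(long, idvar = c("household", "month"),
                   timevar = "type", direction = "wide")
monthly <- monthly[order(monthly$household, monthly$month), ]

head(monthly)
```

The result has one row per household and month (36 rows for three households) with columns `total.energy`, `total.income` and `total.water`.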
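I started by adding a column with a short date (year/month). My plan was then to filter by each type, build a separate data frame of summed values for each transaction type, and merge those data frames to get the summarized df. I tried to summarize with ddply, but it aggregates too much and I lose the household-level information.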
ddply(df,.(shortdate),summarize,mean_value=mean(value))
  shortdate mean_value
1     14/07   15.88235
2     14/09    5.00000
3     14/10    5.00000
4     14/11   21.81818
5     14/12   20.00000
6     15/01   10.00000
7     15/02   12.50000
8     15/04    5.00000
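The ddply call above groups only by shortdate, so rows from different households that fall in the same month are pooled into a single mean. A minimal base R sketch of the difference follows. It uses the sample values from the printed data frame; the `%y/%m` format for `shortdate` is an assumption inferred from the output above, and `aggregate` stands in for ddply here.

```r
# Sample data copied from the printed data frame in the question
df <- data.frame(
  household = rep(c("household1", "household2", "household3"), each = 3),
  date = as.Date(c("1999-05-10", "1999-05-25", "1999-10-12",
                   "1999-02-02", "1999-08-20", "1999-02-19",
                   "1999-07-01", "1999-10-13", "1999-01-01")),
  value = seq(100, 900, by = 100),
  type = rep(c("income", "water", "energy"), times = 3)
)

# Short year/month label (format assumed from the output in the question)
df$shortdate <- format(df$date, "%y/%m")

# Grouping by shortdate alone mixes households: 99/10 averages
# household1's 300 with household3's 800
collapsed <- aggregate(value ~ shortdate, data = df, FUN = mean)

# Adding household and type to the grouping keeps the household level,
# and sum gives monthly totals instead of means
by_group <- aggregate(value ~ household + shortdate + type, data = df, FUN = sum)

collapsed
by_group
```

With plyr, the analogous change would be to group by `.(household, shortdate, type)` and summarize with `sum(value)` instead of `mean(value)`.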
Any help would be greatly appreciated!
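Yes, I was just being lazy and didn't print the full example df.
Yes, ideally I would have 12 rows per household (unless you can recommend a better way). That matches another df I have from a different source.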