2014-07-08
-1

Aggregate daily-level data to the weekly level in R

I have a large dataset similar to the reproducible sample data below.

Interval value 
1 2012-06-10 552 
2 2012-06-11 4850 
3 2012-06-12 4642 
4 2012-06-13 4132 
5 2012-06-14 4190 
6 2012-06-15 4186 
7 2012-06-16 1139 
8 2012-06-17 490 
9 2012-06-18 5156 
10 2012-06-19 4430 
11 2012-06-20 4447 
12 2012-06-21 4256 
13 2012-06-22 3856 
14 2012-06-23 1163 
15 2012-06-24 564 
16 2012-06-25 4866 
17 2012-06-26 4421 
18 2012-06-27 4206 
19 2012-06-28 4272 
20 2012-06-29 3993 
21 2012-06-30 1211 
22 2012-07-01 698 
23 2012-07-02 5770 
24 2012-07-03 5103 
25 2012-07-04 775 
26 2012-07-05 5140 
27 2012-07-06 4868 
28 2012-07-07 1225 
29 2012-07-08 671 
30 2012-07-09 5726 
31 2012-07-10 5176 

I want to aggregate this data to the weekly level and get output similar to the following:

Interval   value 
1 Week 2, June 2012 *aggregate value for day 10 to day 14 of June 2012* 
2 Week 3, June 2012 *aggregate value for day 15 to day 21 of June 2012* 
3 Week 4, June 2012 *aggregate value for day 22 to day 28 of June 2012* 
4 Week 5, June 2012 *aggregate value for day 29 to day 30 of June 2012* 
5 Week 1, July 2012 *aggregate value for day 1 to day 7 of July 2012* 
6 Week 2, July 2012 *aggregate value for day 8 to day 10 of July 2012* 

How can I do this easily, without writing long code?

+0

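You used the [xts] tag, but it doesn't look like you have an xts object. You're right, though, that xts is probably the easiest way. Have you searched? Look at `to.weekly`, `apply.weekly`, and `period.apply`, and search SO. – GSee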

Answers

2

If you use `week` from lubridate, you have only five weeks to pass to `by`. Assuming `dat` is your data:

> library(lubridate) 
> do.call(rbind, by(dat$value, week(dat$Interval), summary)) 
# Min. 1st Qu. Median Mean 3rd Qu. Max. 
# 24 552 4146 4188 3759 4529 4850 
# 25 490 2498 4256 3396 4438 5156 
# 26 564 2578 4206 3355 4346 4866 
# 27 698  993 4868 3366 5122 5770 
# 28 671 1086 3200 3200 5314 5726 

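This shows a summary for each of weeks 24 through 28 of the year. Similarly, since you said you want to "aggregate" the values, we can get the weekly means with `aggregate`: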

> aggregate(value~week(Interval), data = dat, mean) 
# week(Interval) value 
# 1    24 3758.667 
# 2    25 3396.286 
# 3    26 3355.000 
# 4    27 3366.429 
# 5    28 3199.500 
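
Since the question asks for weekly totals rather than summaries or means, the same grouping idea can be sketched in base R with `sum`. This is only an illustration under stated assumptions: the data frame name `dat` follows the answer above, the data is rebuilt from the question's sample, and weeks are assumed to start on Sunday (the `%U` format code), which need not match the week boundaries that `week()` from lubridate uses.

```r
# Rebuild the sample data from the question (31 daily observations).
dat <- data.frame(
  Interval = seq(as.Date("2012-06-10"), by = "day", length.out = 31),
  value = c(552, 4850, 4642, 4132, 4190, 4186, 1139,
            490, 5156, 4430, 4447, 4256, 3856, 1163,
            564, 4866, 4421, 4206, 4272, 3993, 1211,
            698, 5770, 5103, 775, 5140, 4868, 1225,
            671, 5726, 5176)
)

# Assumption: weeks start on Sunday; "%U" is the week of the year (00-53).
# Prefixing the year keeps weeks from different years apart.
dat$week <- format(dat$Interval, "%Y-%U")

# Sum the daily values within each week.
weekly_sum <- aggregate(value ~ week, data = dat, FUN = sum)
weekly_sum
```

Replacing `sum` with `mean` gives weekly averages under the same Sunday-based grouping.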
0

By "aggregate", do you mean taking their sum? Say your data frame is `d`; assuming `d$Interval` is of class `Date`, you can try:

# if d$Interval is not of class Date
d$Interval <- as.Date(d$Interval)
formatdate <- function(date) 
    paste0("Week ", as.numeric(format(date, "%d")) %/% 7 + 1, 
     ", ", format(date, "%b %Y")) 
# change "sum" to your required function
# (the output shown below was produced with mean, not sum)
aggregate(d$value, by = list(formatdate(d$Interval)), sum) 
#   Group.1  x 
# 1 Week 1, Jul 2012 3725.667 
# 2 Week 2, Jul 2012 3199.500 
# 3 Week 2, Jun 2012 3544.000 
# 4 Week 3, Jun 2012 3434.000 
# 5 Week 4, Jun 2012 3333.143 
# 6 Week 5, Jun 2012 3158.667 
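
Note that `%/% 7 + 1` applied to the day of the month puts day 7 into week 2, whereas the output requested in the question puts days 1 to 7 into week 1. A minimal sketch of that alternative grouping follows, with sums and with rows kept in chronological order. The helper name `week_of_month` is hypothetical, the data frame `d` is rebuilt from the question's sample, and the data is assumed to be sorted by date.

```r
# Sample data from the question; d is the data frame name used in this answer.
d <- data.frame(
  Interval = seq(as.Date("2012-06-10"), by = "day", length.out = 31),
  value = c(552, 4850, 4642, 4132, 4190, 4186, 1139,
            490, 5156, 4430, 4447, 4256, 3856, 1163,
            564, 4866, 4421, 4206, 4272, 3993, 1211,
            698, 5770, 5103, 775, 5140, 4868, 1225,
            671, 5726, 5176)
)

# Hypothetical helper: days 1-7 are week 1, days 8-14 are week 2, and so on.
week_of_month <- function(date) {
  (as.integer(format(date, "%d")) - 1) %/% 7 + 1
}

# Build labels such as "Week 2, June 2012"; month.name avoids locale issues.
month_label <- month.name[as.integer(format(d$Interval, "%m"))]
label <- paste0("Week ", week_of_month(d$Interval), ", ",
                month_label, " ", format(d$Interval, "%Y"))

# Assumption: d is sorted by date, so unique() yields chronological levels.
d$label <- factor(label, levels = unique(label))

# Sum the daily values within each week-of-month label.
weekly <- aggregate(value ~ label, data = d, FUN = sum)
weekly
```

Using a factor with explicit levels keeps "Week 2, June 2012" ahead of "Week 1, July 2012" instead of sorting the labels alphabetically.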
10

If by "weekly" you mean the sum of `value` per week, I think the easiest way is to convert the data into an xts object, as GSee suggested:

library(xts)
# the date column in the question is named Interval (capital I)
data <- as.xts(data$value, order.by = as.Date(data$Interval))
weekly <- apply.weekly(data, sum)

      [,1] 
2012-06-10 552 
2012-06-17 23629 
2012-06-24 23872 
2012-07-01 23667 
2012-07-08 23552 
2012-07-10 10902 

I leave formatting the output as an exercise for you :-)

+0

How do I convert this to a ts() object so that I can use forecast and decompose? – gmeroni

+0

Use the "as" method: `as.ts(data)` – hvollmeier

1

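If you are working with data frames, you can do this easily with the tidyquant package. Use the `tq_transmute` function, which applies a mutation and returns a new data frame. Select the `value` column and apply the xts function `apply.weekly`. The additional argument `FUN = sum` aggregates the values by week.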


library(tidyquant) 

df 
#> # A tibble: 31 x 2 
#>  Interval value 
#>  <date> <int> 
#> 1 2012-06-10 552 
#> 2 2012-06-11 4850 
#> 3 2012-06-12 4642 
#> 4 2012-06-13 4132 
#> 5 2012-06-14 4190 
#> 6 2012-06-15 4186 
#> 7 2012-06-16 1139 
#> 8 2012-06-17 490 
#> 9 2012-06-18 5156 
#> 10 2012-06-19 4430 
#> # ... with 21 more rows 

df %>%
    tq_transmute(select     = value,
                 mutate_fun = apply.weekly,
                 FUN        = sum)
#> # A tibble: 6 x 2 
#>  Interval value 
#>  <date> <int> 
#> 1 2012-06-10 552 
#> 2 2012-06-17 23629 
#> 3 2012-06-24 23872 
#> 4 2012-07-01 23667 
#> 5 2012-07-08 23552 
#> 6 2012-07-10 10902