2017-05-15

How to group by academic year?

I have a data frame like this:

data.frame(
     date= structure(c(15461, 15553, 15675, 15826, 15918, 16040, 16191, 
     16405, 16556, 16648, 16770, 16922, 17014, 17136), class = "Date"), 
     n= 1:14 
    ) 

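How can I sum n by academic year? Each academic year should run from December to August, and I want the sum of n within each one. Recoding by hand is not an option, because there are too many values and some are occasionally missing.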

In the end, the recoding should look like this:

date   a.y. 

"2012-05-01" 2011/2012 
"2012-08-01" 2011/2012 

"2012-12-01" 2012/2013 
"2013-05-01" 2012/2013 
"2013-08-01" 2012/2013 

"2013-12-01" 2013/2014 
"2014-05-01" 2013/2014 

"2014-12-01" 2014/2015 
"2015-05-01" 2014/2015 
"2015-08-01" 2014/2015 

"2015-12-01" 2015/2016 
"2016-05-01" 2015/2016 
"2016-08-01" 2015/2016 

"2016-12-01" 2016/2017 

As you can see, the dates follow a similar pattern, but each academic year may contain a different number of dates.

+0

I don't understand your recoded output. Where is n? Don't you want just one row per academic year as the final output? – Kristofersen

+1

Also, shouldn't the maximum be 2016/2017? That would be consistent with the rest of a.y. – Kristofersen

+0

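@Kristofersen Thanks. It did not seem clear enough which dates correspond to which academic year, and the pseudo-output was only meant to show that. Also, getting repeated rows with the same value of `n` for each academic year, rather than one unique row per year, is fine with me. – Dambo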

Answers

1

If I'm reading this right, the academic year changes as soon as we see a December entry. If that is true, the code below will work.

library(data.table) 
library(lubridate) 
df = data.frame(
    date= structure(c(15461, 15553, 15675, 15826, 15918, 16040, 16191, 
        16405, 16556, 16648, 16770, 16922, 17014, 17136), class = "Date"), 
    n= 1:14 
) 

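# Each December row starts a new academic year, so a running count of
# December rows gives every row its academic-year id.
df$AcademicYear = cumsum(month(df$date) == 12) 
setDT(df) 
# Sum n within each academic year.
df[ , .(Sum = sum(n)), by = .(AcademicYear)] 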

   AcademicYear Sum
1:            0   3
2:            1  12
3:            2  13
4:            3  27
5:            4  36
6:            5  14
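
The grouping step relies on `cumsum()` over a logical vector: every December row adds one to a running counter, so all rows up to the next December share the same id. Below is a minimal base-R sketch of that idea. It assumes the rows are in chronological order; the explicit `order()` step and the names `is_december` and `year_id` are illustrative and not part of the answer above.

```r
# Same data as in the question.
df <- data.frame(
  date = structure(c(15461, 15553, 15675, 15826, 15918, 16040, 16191,
                     16405, 16556, 16648, 16770, 16922, 17014, 17136),
                   class = "Date"),
  n = 1:14
)

# The running count is only meaningful on chronologically ordered rows.
df <- df[order(df$date), ]

# TRUE for December rows; format() avoids needing lubridate here.
is_december <- format(df$date, "%m") == "12"

# Each TRUE bumps the counter, which starts a new academic-year id.
df$year_id <- cumsum(is_december)

# Sum n per academic-year id with base R.
sums <- tapply(df$n, df$year_id, sum)
print(sums)
```

Note that this approach needs a December row in every academic year; if one is missing, two consecutive years end up with the same id.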

Edit

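For the recoding, you can do something like this. Within each AcademicYear group it looks for a known month (May, then August, then December), and depending on which month it finds, it knows whether to subtract or add a year before pasting the two years together. After that, the column only needs to be renamed and summed as above.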

df[ , "AcademicYear2" := ifelse(any(month(date) == 5), paste(year(date[month(date) == 5]) - 1,year(date[month(date) == 5]), sep = "/"), 
           ifelse(any(month(date) == 8), paste(year(date[month(date) == 8]) - 1,year(date[month(date) == 8]), sep = "/"), 
             paste(year(date[month(date) == 12]),year(date[month(date) == 12]) + 1, sep = "/"))), by = .(AcademicYear)] 

> df 
          date  n AcademicYear AcademicYear2
 1: 2012-05-01  1            0     2011/2012
 2: 2012-08-01  2            0     2011/2012
 3: 2012-12-01  3            1     2012/2013
 4: 2013-05-01  4            1     2012/2013
 5: 2013-08-01  5            1     2012/2013
 6: 2013-12-01  6            2     2013/2014
 7: 2014-05-01  7            2     2013/2014
 8: 2014-12-01  8            3     2014/2015
 9: 2015-05-01  9            3     2014/2015
10: 2015-08-01 10            3     2014/2015
11: 2015-12-01 11            4     2015/2016
12: 2016-05-01 12            4     2015/2016
13: 2016-08-01 13            4     2015/2016
14: 2016-12-01 14            5     2016/2017
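
As an alternative sketch, the label can also be derived from each date on its own, without grouping first: a December date in year Y opens Y/Y+1, and any earlier month belongs to Y-1/Y. This assumes, as the question states, that an academic year starts in December. The question does not say how September to November should be treated; the hypothetical helper `academic_year()` below simply assigns them to the year that began the previous December.

```r
# Hypothetical helper (not from the answers above): label a Date vector
# with its academic year, assuming the year starts on 1 December.
academic_year <- function(date) {
  y <- as.integer(format(date, "%Y"))
  m <- as.integer(format(date, "%m"))
  start <- ifelse(m == 12, y, y - 1L)   # December opens a new academic year
  paste(start, start + 1L, sep = "/")
}

dates <- as.Date(c("2012-05-01", "2012-08-01", "2012-12-01", "2013-05-01"))
labels <- academic_year(dates)
print(labels)
# "2011/2012" "2011/2012" "2012/2013" "2012/2013"
```

Because each label depends only on its own date, this version does not rely on row order or on every academic year containing a December entry.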

Edit 2

I decided to put all the code together. This should get you to the final result you want.

library(data.table) 
library(lubridate) 
df = data.frame(
    date= structure(c(15461, 15553, 15675, 15826, 15918, 16040, 16191, 
        16405, 16556, 16648, 16770, 16922, 17014, 17136), class = "Date"), 
    n= 1:14 
) 

setDT(df) 
df$AcademicYear = cumsum(month(df$date) == 12) 

df[ , "AcademicYear2" := ifelse(any(month(date) == 5), paste(year(date[month(date) == 5]) - 1,year(date[month(date) == 5]), sep = "/"), 
           ifelse(any(month(date) == 8), paste(year(date[month(date) == 8]) - 1,year(date[month(date) == 8]), sep = "/"), 
             paste(year(date[month(date) == 12]),year(date[month(date) == 12]) + 1, sep = "/"))), by = .(AcademicYear)] 


df = df[ , .(Sum = sum(n)), by = .(AcademicYear = AcademicYear2)] 

> df 
   AcademicYear Sum
1:    2011/2012   3
2:    2012/2013  12
3:    2013/2014  13
4:    2014/2015  27
5:    2015/2016  36
6:    2016/2017  14
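
The question mentions that values are sometimes missing. If that means `NA` entries in `n` (an assumption; it could also mean missing dates), a plain `sum()` returns `NA` for the whole year. A small base-R sketch on made-up toy data shows the effect of `na.rm = TRUE`:

```r
# Toy data (made up for illustration): one n is NA.
toy <- data.frame(
  AcademicYear = c("2011/2012", "2011/2012", "2012/2013", "2012/2013"),
  n = c(1, NA, 3, 4)
)

# Without na.rm, any NA in a group makes that group's sum NA.
with_na <- tapply(toy$n, toy$AcademicYear, sum)

# With na.rm = TRUE, NA entries are dropped before summing.
sums <- tapply(toy$n, toy$AcademicYear, sum, na.rm = TRUE)

print(with_na)
print(sums)
```

In the data.table code above, the equivalent change would be `sum(n, na.rm = TRUE)` inside `j`.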

0

I'm not sure which conditions you want for which dates, but you can use dplyr's mutate with a series of nested ifelse statements. It's slow, but it works.

library(dplyr)

df <- data.frame(
    date= structure(c(15461, 15553, 15675, 15826, 15918, 16040, 16191, 
        16405, 16556, 16648, 16770, 16922, 17014, 17136), class = "Date"), 
    n= 1:14 
) 

# Label each row by hand-picked date ranges; anything outside them becomes "other".
df <- mutate(df, term = ifelse(date >= as.Date("2012-05-01") & date <= as.Date("2012-08-01"), "1",
                        ifelse(date >= as.Date("2012-12-01") & date <= as.Date("2013-05-01"), "2",
                        ifelse(date >= as.Date("2013-12-01") & date <= as.Date("2014-12-01"), "3",
                        ifelse(date >= as.Date("2015-08-01") & date <= as.Date("2016-08-01"), "4",
                               "other")))))
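
Instead of writing out one `ifelse` per date range, the same kind of interval lookup can be sketched in base R with `findInterval()`. The boundaries below are an assumption: 1 December of each year, starting early enough to cover the first date in the data. The names `breaks`, `idx` and `start` are illustrative.

```r
df <- data.frame(
  date = structure(c(15461, 15553, 15675, 15826, 15918, 16040, 16191,
                     16405, 16556, 16648, 16770, 16922, 17014, 17136),
                   class = "Date"),
  n = 1:14
)

# Assumed boundaries: 1 December of 2011 through 2017.
breaks <- seq(as.Date("2011-12-01"), by = "year", length.out = 7)

# For each date, the index of the last boundary that is <= that date.
idx <- findInterval(as.numeric(df$date), as.numeric(breaks))

# The boundary's calendar year is the first half of the label.
start <- as.integer(format(breaks[idx], "%Y"))
df$term <- paste(start, start + 1L, sep = "/")
print(df)
```

The first boundary has to be on or before the earliest date; otherwise `findInterval()` returns 0 for that row and the indexing into `breaks` no longer lines up.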