2016-05-14

For hourly data, get the maximum value for each day

Suppose I have the following data:

time <- seq(ISOdate(2007,7,1,0), ISOdate(2008,4,5,23), by = "1 hour") 
y <- rnorm(n = length(time)) 

# format() extracts each field directly, so the result does not depend on
# how as.character() happens to print midnight timestamps
year <- as.numeric(format(time, "%Y")) # year number as numeric

month <- as.numeric(format(time, "%m")) # month number as numeric

day <- as.numeric(format(time, "%d")) # day number as numeric

hour <- as.numeric(format(time, "%H")) # hour number as numeric

dat <- data.frame(year=year, month=month, day=day, hour=hour, y = y) 

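Each day has 24 y values, one for each hour (0 to 23). I need to find the maximum y for each day. For example, the date "2007-10-05" has 24 y values, one per hour, and I need the largest of them. The range from "2007-07-01" to "2008-04-05" covers 280 days when both end dates are included, so I should end up with 280 daily maximum y values.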

How can I do this?

Answers

Using dplyr

library(dplyr) 
dyp1 <- dat %>% 
     group_by(year, month, day) %>% 
     summarise(y=max(y)) 

Using data.table

library(data.table) 
setDT(dat)[, .(y=max(y)), by = .(year, month, day)] 

Using base R

aggregate(y ~ year+month+day, dat, max) 
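
A possible variation on the base R approach, assuming the `time` vector from the question is still available: collapse each timestamp to a calendar date with as.Date() and aggregate on that single key instead of three columns. The names `dat2`, `date` and `daily` are illustrative and not part of the original answer; this is a sketch, not the only way to do it.

```r
# Sketch: group by calendar date rather than separate year/month/day columns.
# ISOdate() creates GMT timestamps, so tz = "GMT" is passed explicitly to as.Date().
time <- seq(ISOdate(2007, 7, 1, 0), ISOdate(2008, 4, 5, 23), by = "1 hour")
y <- rnorm(n = length(time))

# `dat2` and `date` are illustrative names
dat2 <- data.frame(date = as.Date(time, tz = "GMT"), y = y)

# One row per day, holding that day's largest y
daily <- aggregate(y ~ date, data = dat2, FUN = max)

head(daily)
```

Keeping a real Date column can make later steps such as plotting or joining simpler, since the result is ordered chronologically by a single key.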

Using sqldf

library(sqldf) 
sqldf("select year, month, day, 
     max(y) as y 
     from dat 
     group by year, month, day") 

Another option is to order by y in descending order and select the first value in each group

library(data.table) 
setDT(dat)[order(-y), .(y= y[1L]), by = .(year, month, day)] 

Or with dplyr

library(dplyr) 
dat %>% 
    group_by(year, month, day) %>% 
    arrange(desc(y)) %>% 
    summarise(y = first(y)) 
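
To see why sorting and taking the first value gives the same answer as max(), here is a minimal base R sketch on made-up toy data. The data frame `toy` and the other object names are hypothetical and only serve the illustration.

```r
# Toy data: two days with three readings each (values invented for illustration)
toy <- data.frame(day = c(1, 1, 1, 2, 2, 2),
                  y   = c(0.3, 1.2, -0.5, 2.0, 0.1, 1.7))

# Sort all rows so the largest y comes first
sorted <- toy[order(-toy$y), ]

# Keep the first row encountered for each day, then restore day order
first_per_day <- sorted[!duplicated(sorted$day), ]
first_per_day <- first_per_day[order(first_per_day$day), ]

# Direct per-day maximum for comparison
by_max <- aggregate(y ~ day, data = toy, FUN = max)

all.equal(first_per_day$y, by_max$y)
```

After a descending sort, the first row seen for each group is by construction that group's largest value, which is what the ordered variants rely on.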

Apply the cut command directly to the time and y vectors:

tapply(y, INDEX =cut(time, breaks="day"), max) 

Or using the dplyr library:

library(dplyr) 
df<-data.frame(time, y) 
summarize(group_by(df, cut(df$time, breaks="day")), max(y))
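
As a rough sanity check of the cut() approach, the sketch below compares it with grouping on as.Date(). It assumes the GMT timestamps produced by ISOdate() in the question; `by_cut` and `by_date` are illustrative names.

```r
time <- seq(ISOdate(2007, 7, 1, 0), ISOdate(2008, 4, 5, 23), by = "1 hour")
y <- rnorm(n = length(time))

# Daily maxima using day-wide bins from cut()
by_cut <- tapply(y, INDEX = cut(time, breaks = "day"), max)

# Daily maxima using the calendar date as the grouping key
by_date <- tapply(y, INDEX = as.Date(time, tz = "GMT"), max)

length(by_cut)                                    # number of daily bins
all.equal(as.vector(by_cut), as.vector(by_date))  # do the two groupings agree?
```

If the timestamps were in a different time zone, the two groupings could split days at different instants, so the time zone is worth checking before relying on either result.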