2016-04-17

Filling data frame rows with date-time values

I have a CSV file with the following contents:

ts1 <- read.table(header = TRUE, sep = ",", strip.white = TRUE, text = "
start,end,value
1,26/11/2014 13:00,26/11/2014 20:00,decreasing
2,26/11/2014 20:00,27/11/2014 09:00,increasing")

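I want to transform the data frame above into one where each row's time interval is expanded into one row per hour, each filled with that row's value. The hours are filled from the start time up to the end time minus one hour, as follows: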

   date       hour  value
1  26/11/2014 13:00 decreasing
2  26/11/2014 14:00 decreasing
3  26/11/2014 15:00 decreasing
4  26/11/2014 16:00 decreasing
5  26/11/2014 17:00 decreasing
6  26/11/2014 18:00 decreasing
7  26/11/2014 19:00 decreasing
8  26/11/2014 20:00 increasing
9  26/11/2014 21:00 increasing
10 26/11/2014 22:00 increasing
11 26/11/2014 23:00 increasing
12 27/11/2014 00:00 increasing
13 27/11/2014 01:00 increasing
14 27/11/2014 02:00 increasing
15 27/11/2014 03:00 increasing
16 27/11/2014 04:00 increasing
17 27/11/2014 05:00 increasing
18 27/11/2014 06:00 increasing
19 27/11/2014 07:00 increasing
20 27/11/2014 08:00 increasing

I tried to start by separating the hour from the date:

> t <- strftime(ts1$end, format="%H:%M:%S") 
> t 
[1] "00:00:00" "00:00:00" 
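The `00:00:00` result most likely comes from the fact that `ts1$end` is plain text in day/month/year order. Without an explicit `format`, `strftime()` first has to convert it with R's default year-first formats, which misread `26/11/2014 20:00` and lose the time part. A minimal base-R sketch, assuming the strings always look like `dd/mm/yyyy HH:MM` and that UTC is an acceptable time zone (the vector `end_txt` is a hypothetical stand-in for `ts1$end`): parse with an explicit format first, then extract the parts.

```r
# Hypothetical stand-in for ts1$end, in the question's "dd/mm/yyyy HH:MM" layout
end_txt <- c("26/11/2014 20:00", "27/11/2014 09:00")

# Parse with an explicit format so the day/month/year order is not guessed
end_dt <- as.POSIXct(end_txt, format = "%d/%m/%Y %H:%M", tz = "UTC")

# Extract the date and the hour as separate character vectors
end_date <- format(end_dt, "%d/%m/%Y")  # "26/11/2014" "27/11/2014"
end_hour <- format(end_dt, "%H:%M")     # "20:00" "09:00"
```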

Answers


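We can use data.table. Convert the 'data.frame' to a 'data.table' (setDT(ts1)) and group by the row sequence (1:nrow(ts1)). Within each group, convert the 'start' and 'end' columns to a datetime class (with dmy_hm from lubridate), build the sequence by '1 hour', drop its last element with head(..., -1) so the end time itself is excluded, format the result into the expected layout, split it on whitespace (tstrsplit) and combine it with the 'value' column. Then remove the 'rn' column by assigning NULL. Finally, rename the columns if needed.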

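library(lubridate)
library(data.table)

res <- setDT(ts1)[, {
    st <- dmy_hm(start)   # parse "dd/mm/yyyy HH:MM" start time
    et <- dmy_hm(end)     # parse end time
    # hourly sequence without the end point, split into date and hour
    c(tstrsplit(format(head(seq(st, et, by = "1 hour"), -1),
                       "%d/%m/%Y %H:%M"), "\\s+"),
      as.character(value))
  }, by = .(rn = 1:nrow(ts1))   # one group per original row
][, rn := NULL][]               # drop the helper grouping column
setnames(res, c("date", "hour", "value"))[]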
#   date hour  value 
# 1: 26/11/2014 13:00 decreasing 
# 2: 26/11/2014 14:00 decreasing 
# 3: 26/11/2014 15:00 decreasing 
# 4: 26/11/2014 16:00 decreasing 
# 5: 26/11/2014 17:00 decreasing 
# 6: 26/11/2014 18:00 decreasing 
# 7: 26/11/2014 19:00 decreasing 
# 8: 26/11/2014 20:00 increasing 
# 9: 26/11/2014 21:00 increasing 
#10: 26/11/2014 22:00 increasing 
#11: 26/11/2014 23:00 increasing 
#12: 27/11/2014 00:00 increasing 
#13: 27/11/2014 01:00 increasing 
#14: 27/11/2014 02:00 increasing 
#15: 27/11/2014 03:00 increasing 
#16: 27/11/2014 04:00 increasing 
#17: 27/11/2014 05:00 increasing 
#18: 27/11/2014 06:00 increasing 
#19: 27/11/2014 07:00 increasing 
#20: 27/11/2014 08:00 increasing 
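For comparison, the same per-row expansion can be sketched in base R without data.table or lubridate. This is a minimal sketch under a few assumptions: the strings are always `dd/mm/yyyy HH:MM`, UTC is acceptable (it also avoids daylight-saving gaps), the input is rebuilt by hand with character columns, and `expand_row` is a hypothetical helper name, not part of any package.

```r
# Input rebuilt by hand from the question (character columns, no row names)
ts1 <- data.frame(
  start = c("26/11/2014 13:00", "26/11/2014 20:00"),
  end   = c("26/11/2014 20:00", "27/11/2014 09:00"),
  value = c("decreasing", "increasing"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: expand one row into hourly rows from start to end minus one hour
expand_row <- function(start, end, value) {
  st <- as.POSIXct(start, format = "%d/%m/%Y %H:%M", tz = "UTC")
  et <- as.POSIXct(end,   format = "%d/%m/%Y %H:%M", tz = "UTC")
  hours <- head(seq(st, et, by = "hour"), -1)  # drop the end point itself
  data.frame(date  = format(hours, "%d/%m/%Y"),
             hour  = format(hours, "%H:%M"),
             value = value,
             stringsAsFactors = FALSE)
}

# Apply the helper to every row and stack the pieces into one data frame
res <- do.call(rbind, Map(expand_row, ts1$start, ts1$end, ts1$value))
rownames(res) <- NULL
res
```

`Map()` walks the three columns in parallel, so each call sees one row's values; `do.call(rbind, ...)` then plays the role of the `by = rn` grouping in the data.table version.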

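Here is a solution using lubridate and plyr. It processes each row of the data to build a sequence from start to end and returns it together with that row's value. The results for all rows are combined into a single data frame. If you need to process the result further, it is probably better not to split the datetime into separate date and time columns.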

library(plyr) 
library(lubridate) 
ts1$start <- dmy_hm(ts1$start) 
ts1$end <- dmy_hm(ts1$end) 

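adply(.data = ts1, .margin = 1, .fun = function(x){
    # hourly sequence from start to end, dropping the end point itself
    datetime <- head(seq(x$start, x$end, by = "hour"), -1)
    # data.frame(datetime, value = x$value)  # alternative: keep one datetime column
    data.frame(date = as.Date(datetime), time = format(datetime, "%H:%M"), value = x$value)
})[, -(1:2)]  # drop the start/end columns that adply prepends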
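To illustrate why keeping a single datetime column can be more convenient for later processing, here is a small sketch. The `hourly` and `overnight` objects are hypothetical and rebuilt by hand to match the expected output, not produced by the plyr call above. Range filtering works directly on the POSIXct column, and date/hour text can still be derived at the end for display.

```r
# Hypothetical hourly result kept as one POSIXct column instead of date + hour text
datetime <- seq(as.POSIXct("2014-11-26 13:00", tz = "UTC"),
                as.POSIXct("2014-11-27 08:00", tz = "UTC"), by = "hour")
value <- rep(c("decreasing", "increasing"), times = c(7, 13))
hourly <- data.frame(datetime, value, stringsAsFactors = FALSE)

# Time-range filtering is a plain comparison on the datetime column
overnight <- subset(hourly, datetime >= as.POSIXct("2014-11-27 00:00", tz = "UTC"))

# Split into date and hour text only at the end, for display
overnight$date <- format(overnight$datetime, "%d/%m/%Y")
overnight$hour <- format(overnight$datetime, "%H:%M")
```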