Removing the loop from R code that calculates overlapping time intervals
I have data in this format (excerpt):
SW_Release deviceType configStartDate configEndDate
1: 04.05.00 21 2005-11-03 19:12:36 2006-02-28 10:19:27
2: 04.05.00 16 2005-11-04 03:59:05 2006-02-28 10:19:27
3: 04.05.00 20 2005-11-04 03:59:06 2006-02-28 10:19:27
4: 04.05.00 15 2005-11-04 03:59:06 2006-02-28 10:19:27
5: 04.05.00 19 2005-11-04 03:59:06 2006-02-28 10:19:27
6: 04.05.00 17 2005-11-04 03:59:06 2006-02-28 10:19:27
7: 04.07.03 16 2006-02-28 10:19:27 2006-03-29 01:00:39
8: 04.07.03 20 2006-02-28 10:19:27 2006-03-29 01:00:41
9: 04.07.01 15 2006-02-28 10:19:27 2006-03-29 01:00:41
10: 04.07.01 19 2006-02-28 10:19:27 2006-03-29 01:00:41
11: 04.07.01 17 2006-02-28 10:19:27 2006-03-29 01:00:42
12: 04.07.01 21 2006-02-28 10:19:27 2006-03-29 01:00:42
13: 04.07.01 18 2006-02-28 10:19:27 2006-03-29 01:00:42
14: 04.07.04 16 2006-03-29 01:00:40 2006-05-01 16:07:49
15: 04.07.04 20 2006-03-29 01:00:41 2006-05-01 16:07:50
16: 04.07.02 15 2006-03-29 01:00:41 2006-05-01 16:07:50
17: 04.07.02 19 2006-03-29 01:00:41 2006-05-01 16:07:51
18: 04.07.02 17 2006-03-29 01:00:42 2006-05-01 16:07:51
19: 04.07.02 21 2006-03-29 01:00:42 2006-05-01 16:07:51
20: 04.07.02 18 2006-03-29 01:00:42 2006-06-01 09:45:36
21: 04.07.04 16 2006-05-02 09:47:57 2006-06-01 09:45:25
22: 04.07.04 20 2006-05-02 09:47:57 2006-06-01 09:45:28
23: 04.07.02 15 2006-05-02 09:47:58 2006-06-01 09:45:31
24: 04.07.02 19 2006-05-02 09:47:58 2006-06-01 09:45:32
25: 04.07.02 17 2006-05-02 09:47:58 2006-06-01 09:45:34
26: 04.07.02 21 2006-05-02 09:47:58 2006-06-01 09:45:35
27: 04.07.05 16 2006-06-01 09:45:27 2006-08-14 17:54:15
28: 04.07.05 20 2006-06-01 09:45:29 2006-08-14 17:54:15
29: 04.07.06 15 2006-06-01 09:45:31 2007-12-12 11:03:00
30: 04.07.06 19 2006-06-01 09:45:33 2007-12-12 11:03:00
31: 04.07.03 17 2006-06-01 09:45:35 2006-08-14 17:54:16
32: 04.07.03 21 2006-06-01 09:45:35 2006-08-14 17:54:16
33: 04.07.04 18 2006-06-01 09:45:37 2007-12-12 11:03:00
34: 04.07.06 16 2006-08-14 17:54:15 2007-12-12 11:02:59
35: 04.07.06 20 2006-08-14 17:54:15 2007-12-12 11:02:59
36: 04.07.04 17 2006-08-14 17:54:16 2007-12-12 11:03:00
37: 04.07.04 21 2006-08-14 17:54:16 2007-12-12 11:03:00
38: 04.05.12 14 2011-06-17 15:40:13 2012-05-24 11:43:24
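I need to add up all the intervals (between the second-to-last and the last column), but as you can see, some rows have overlapping or partially overlapping intervals.
Before I add up all the days, I need to convert the full data set (shown above) into something like this: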
accumulated data:
configStartDate configEndDate
1: 2005-11-03 19:12:36 2007-12-12 11:03:00
2: 2011-06-17 15:40:13 2012-05-24 11:43:24
total days: 934.296
Here is my R code for doing this (it has to be R, although I am considering rewriting it in C++ and using Rcpp):
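library(data.table)

# Merge overlapping [configStartDate, configEndDate] intervals into disjoint ones.
merge_intervals <- function(interval_dt) {
  interval_dt <- interval_dt[order(configStartDate), list(configStartDate, configEndDate)]
  new_dt <- interval_dt[1, list(configStartDate, configEndDate)]
  # seq_len(n)[-1] is empty for a single row, whereas 2:n would count down (2, 1)
  for (i in seq_len(nrow(interval_dt))[-1]) {
    buff <- interval_dt[i, list(configStartDate, configEndDate)]
    n_merged <- nrow(new_dt)
    if (new_dt[n_merged, configEndDate] >= buff[, configStartDate]) {
      # Overlap: extend the current merged interval only if buff ends later
      if (new_dt[n_merged, configEndDate] < buff[, configEndDate]) {
        new_dt[n_merged, configEndDate := buff[, configEndDate]]
      }
    } else {
      # Gap: start a new merged interval
      new_dt <- rbind(new_dt, buff)
    }
  }
  return(new_dt)
}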
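Right now the whole thing (together with the other calculations) takes about 0.16 seconds to run; however, for 3000 unique assets that adds up to about 8 minutes of computation time (3000 × 0.16 s = 480 s).
How can I convert the for loop into something faster to reduce the computation time? Thanks!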
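It should be possible to vectorise this. How do you want to handle the overlapping intervals? Ignore the overlap, or merge the intervals into a new interval and consider only the new one? – Thierry
Sorry, but your example doesn't make clear to me what you are trying to do. How do you get from the 10 rows shown in your first block (all in 2006) to the two rows in the second block (spanning 2005-2012)? Can you describe exactly how to get from the sample input to the expected output? – josliber
I edited the sample to include all rows to make it clearer. –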
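As a rough illustration of the vectorisation suggested in the comments, here is a minimal sketch in base R. It assumes the goal is to merge overlapping (or touching) intervals exactly as the loop in the question does. The function name merge_intervals_vec and the toy data are hypothetical and not part of the original post; only the column names configStartDate and configEndDate come from the question. The idea is to sort by start time, take the running maximum of the end times with cummax, and start a new merged interval wherever a start lies strictly after that running maximum.

```r
# Hypothetical helper (not from the original post): merge overlapping
# intervals without an explicit loop. Column names follow the question.
merge_intervals_vec <- function(interval_df) {
  # Sort by start time so an interval can only overlap earlier ones
  ord <- order(interval_df$configStartDate)
  start <- interval_df$configStartDate[ord]
  end <- interval_df$configEndDate[ord]
  n <- length(start)

  # Running maximum of the end times seen so far (numeric seconds)
  run_max_end <- cummax(as.numeric(end))

  # A new merged interval begins where a start is strictly later than
  # every earlier end (touching intervals are merged, as in the loop)
  is_new <- c(TRUE, as.numeric(start)[-1] > run_max_end[-n])

  # First row of each group gives the start; the running maximum at the
  # last row of each group gives the merged end
  first_idx <- which(is_new)
  last_idx <- c(first_idx[-1] - 1L, n)

  data.frame(
    configStartDate = start[first_idx],
    configEndDate = .POSIXct(run_max_end[last_idx], tz = attr(end, "tzone"))
  )
}

# Toy data (made up for illustration, not the data from the question)
toy <- data.frame(
  configStartDate = as.POSIXct(c("2006-01-01 00:00:00", "2006-01-05 00:00:00",
                                 "2006-01-03 00:00:00", "2006-02-01 00:00:00"),
                               tz = "UTC"),
  configEndDate = as.POSIXct(c("2006-01-04 00:00:00", "2006-01-06 00:00:00",
                               "2006-01-10 00:00:00", "2006-02-03 00:00:00"),
                             tz = "UTC")
)

merged <- merge_intervals_vec(toy)
# merged has two rows: 2006-01-01 to 2006-01-10 and 2006-02-01 to 2006-02-03

# Total covered time in days: 9 + 2 = 11
total_days <- sum(as.numeric(difftime(merged$configEndDate,
                                      merged$configStartDate,
                                      units = "days")))
```

Because every step is a vectorised base R call, the same function could presumably be applied once per asset (for example inside a grouped data.table call) instead of looping over rows. It assumes at least one row and no missing timestamps.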