2015-11-27 79 views
4

I have the following data (df) and want to find time differences:

Id  Timestamp     Event 
1 2015-11-06 06:11:43   mail subscribed 
1 2015-11-06 06:15:43   Invoice created 
1 2015-11-06 09:15:43   phone call 
2 2015-11-07 08:15:43   New subscription 
2 2015-11-07 08:20:43   Added to customer list. 

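I am looking for the following (the time difference within each Id).

For example, Id = 1 has three events at different times, and I want to compute the difference between consecutive event times within each Id.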

Id  Timestamp     Event     Time Difference(Mins) 
1 2015-11-06 06:11:43   mail subscribed   0.0 
1 2015-11-06 06:15:43   Invoice created   4.0   
1 2015-11-06 09:15:43   phone call    180.0 
2 2015-11-07 08:15:43   New subscription   0.0 
2 2015-11-07 08:20:43   Added to customer list 5.0 

I tried the following code:

diff = function(x) as.numeric(x - lag(x)) 
# or:
diff = function (x) as.numeric(0L,diff(x)) 
setDT(df)[, diff2 := diff(timestamp), by = Id] 

But this code gives irregular results. Any help?

+0

Unclear. What are you trying to obtain? – nicola

+0

Maybe you could try 'difftime'? – Jaap

+0

@nicola I have updated my question – Maddy

Answers

4

Try ave. No packages are needed.

transform(df, Diff = ave(as.numeric(Timestamp), Id, FUN = function(x) c(0, diff(x))/60)) 

giving:

Id   Timestamp       Event Diff 
1 1 2015-11-06 06:11:43     mail subscribed 0 
2 1 2015-11-06 06:15:43     Invoice created 4 
3 1 2015-11-06 09:15:43      phone call 180 
4 2 2015-11-07 08:15:43     New subscription 0 
5 2 2015-11-07 08:20:43   Added to customer list 5 
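
To see what the function passed to ave does within a single group, here is a minimal sketch on the three timestamps of Id 1. The name ts1 and the fixed UTC time zone are assumptions made only for this illustration; they are not part of the answer above.

```r
# Timestamps for Id 1; tz is fixed so the result does not depend on the local time zone
ts1 <- as.POSIXct(c("2015-11-06 06:11:43",
                    "2015-11-06 06:15:43",
                    "2015-11-06 09:15:43"), tz = "UTC")

# as.numeric() converts to seconds since the epoch,
# diff() gives the gaps between consecutive values in seconds
secs <- as.numeric(ts1)

# Prepend 0 for the first event, then convert seconds to minutes
mins <- c(0, diff(secs)) / 60
mins
# 0 4 180
```

ave then applies this same computation separately to each Id and puts the results back in the original row order.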

Note: This is the input data.frame, df, that was used:

Lines <- "Id,  Timestamp,     Event 
1, 2015-11-06 06:11:43,   mail subscribed 
1, 2015-11-06 06:15:43,   Invoice created 
1, 2015-11-06 09:15:43,   phone call 
2, 2015-11-07 08:15:43,   New subscription 
2, 2015-11-07 08:20:43,   Added to customer list" 

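# strip.white = TRUE drops the padding around the comma-separated fields
df <- read.csv(text = Lines, strip.white = TRUE) 
# convert the character timestamps to date-times
df$Timestamp <- as.POSIXct(df$Timestamp) 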

Revised per the comments.

+0

Looking at the desired result, it seems he wants 'c(0, diff(x))' rather than 'x - x[1]', but I might be wrong. +1, I was about to post an 'ave'-based solution. – nicola

+0

@G. Grothendieck thanks for the solution, and @nicola, thanks for the help :-) – Maddy

4

You can do this with the data.table package:

library(data.table) 
setDT(df)[, Diff := difftime(Timestamp, Timestamp[1], units="mins"), by=Id] 

df 
# Id   Timestamp     Event  Diff 
#1: 1 2015-11-06 06:11:43   mail subscribed 0 mins 
#2: 1 2015-11-06 06:15:43   Invoice created 4 mins 
#3: 1 2015-11-06 09:15:43    phone call 184 mins 
#4: 2 2015-11-07 08:15:43  New subscription 0 mins 
#5: 2 2015-11-07 08:20:43 Added to customer list. 5 mins 
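
Note that Diff here is the elapsed time since the first event of each Id, not the gap between consecutive events. A minimal base R sketch on the Id 1 timestamps shows what difftime returns in that case; ts1 is an illustrative name and the UTC time zone is an assumption made for reproducibility.

```r
# Timestamps for Id 1; tz is fixed so the result does not depend on the local time zone
ts1 <- as.POSIXct(c("2015-11-06 06:11:43",
                    "2015-11-06 06:15:43",
                    "2015-11-06 09:15:43"), tz = "UTC")

# Every timestamp is compared with the first one of the group
elapsed <- difftime(ts1, ts1[1], units = "mins")

# Drop the difftime class to get plain numbers of minutes
elapsed_mins <- as.numeric(elapsed)
elapsed_mins
# 0 4 184
```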

Edit

Per @Jaap's comment, if what you need is the consecutive differences, you can do:

df[, Diff2 := difftime(Timestamp, shift(Timestamp, 1L), units = "mins"), by = Id 
    ][is.na(Diff2), Diff2:=0] 

df 
# Id   Timestamp     Event  Diff Diff2 
#1: 1 2015-11-06 06:11:43   mail subscribed 0 mins 0 mins 
#2: 1 2015-11-06 06:15:43   Invoice created 4 mins 4 mins 
#3: 1 2015-11-06 09:15:43    phone call 184 mins 180 mins 
#4: 2 2015-11-07 08:15:43  New subscription 0 mins 0 mins 
#5: 2 2015-11-07 08:20:43 Added to customer list. 5 mins 5 mins 
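
For readers without data.table at hand, the lag-then-subtract idea can be sketched in base R for a single group. This is only an analogue of what shift(Timestamp, 1L) does within one Id, not the data.table code itself; ts1, lagged and gaps_mins are illustrative names and the UTC time zone is an assumption.

```r
# Timestamps for Id 1; tz is fixed so the result does not depend on the local time zone
ts1 <- as.POSIXct(c("2015-11-06 06:11:43",
                    "2015-11-06 06:15:43",
                    "2015-11-06 09:15:43"), tz = "UTC")

# Lag by one position: the first element has no predecessor, so it becomes NA
lagged <- ts1[c(NA, seq_len(length(ts1) - 1L))]

# Gap between each event and the previous one, in minutes (NA for the first event)
gaps_mins <- as.numeric(difftime(ts1, lagged, units = "mins"))

# Treat the first event of the group as a gap of 0, as in the edit above
gaps_mins[is.na(gaps_mins)] <- 0
gaps_mins
# 0 4 180
```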
+1

Or: 'setDT(mydf)[, dff := difftime(Timestamp, shift(Timestamp, 1L), units = "mins"), by = Id]' – Jaap

+0

@Jaap indeed, if the desired result is more like 'c(0, diff(x))', thanks :-) – Cath