2013-01-21

How can I calculate the number of days since each group's start date? I need to go from this:

id | date 
----------------- 
    A | 2000-01-13 
    A | 2000-01-18 
    A | 2000-01-25 
    B | 2012-10-10 
    B | 2012-10-11 
    C | 2005-07-25 
    C | 2005-07-31 

to this:

id | date  | days from start 
--------------------------- 
    A | 2000-01-13 | 0 
    A | 2000-01-18 | 5 
    A | 2000-01-25 | 12 
    A | 2000-02-08 | 26 
    B | 2012-10-10 | 0 
    B | 2012-10-11 | 1 
    C | 2005-07-25 | 0 
    C | 2005-07-31 | 6 

That is, I want to create a variable holding the number of days elapsed since the first date, computed separately for each id.
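As a quick check of the definition, each row's value is that row's date minus the first date of its group. A minimal base R sketch for group A, using the dates from the desired output above (the variable names are illustrative only):

```r
# Group A's first date, taken from the table above
first_a <- as.Date("2000-01-13")
dates_a <- as.Date(c("2000-01-13", "2000-01-18", "2000-01-25", "2000-02-08"))

# Subtracting Dates gives a difftime in days; as.numeric() drops the class
days_a <- as.numeric(dates_a - first_a)
print(days_a)  # 0 5 12 26
```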

Any ideas?

Answers


Using data.table (I assume here that the date column is character; if it is already of class Date, you can drop the as.Date(.) call):

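df <- structure(list(id = c("A", "A", "A", "B", "B", "C", "C"), 
      date = c("2000-01-13", "2000-01-18", "2000-01-25", "2012-10-10", 
        "2012-10-11", "2005-07-25", "2005-07-31")), 
      .Names = c("id", "date"), row.names = c(NA, -7L), 
      class = "data.frame") 
library(data.table) 
# Keying on id sorts the rows by id
dt <- data.table(df, key = "id") 
# Within each id: 0 for the first row, then the running total of the day
# gaps between consecutive dates, i.e. days since the group's first row
dt[, days_from_start := cumsum(c(0, diff(as.Date(date)))), by = id] 
dt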

# id  date days_from_start 
# 1: A 2000-01-13    0 
# 2: A 2000-01-18    5 
# 3: A 2000-01-25    12 
# 4: B 2012-10-10    0 
# 5: B 2012-10-11    1 
# 6: C 2005-07-25    0 
# 7: C 2005-07-31    6 
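Note that `cumsum(c(0, diff(...)))` measures every date against the first row of its group, so it assumes the rows are already in chronological order within each id (`key = "id"` sorts by id only). A base R sketch with one hypothetical, deliberately unsorted group shows how this differs from subtracting the group's earliest date:

```r
# One group's dates, deliberately out of chronological order (hypothetical data)
d <- as.Date(c("2000-01-18", "2000-01-13", "2000-01-25"))

# Running total of lagged differences: relative to the first row
from_first <- cumsum(c(0, as.numeric(diff(d))))

# Difference to the earliest date: independent of row order
from_min <- as.numeric(d - min(d))

print(from_first)  # 0 -5 7
print(from_min)    # 5 0 12
```

If the real data may be unsorted, keying on both columns (e.g. `key = c("id", "date")`) before the `:=` line should avoid the negative values.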

Was just about to post the same thing! – A5C1D2H2I1M1N2O1R2T1


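When I try your solution, I get an error saying 'Combining := in j with by is not yet implemented. Please let maintainer('data.table') know if you are interested in this.' Is that because my R version (2.14.2) or my version of the 'data.table' package (1.8.0) is too old? – plannapus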


You can also use a combination of the functions difftime and split:

dat 
    id  date 
1 A 2000-01-13 
2 A 2000-01-18 
3 A 2000-01-25 
4 B 2012-10-10 
5 B 2012-10-11 
6 C 2005-07-25 
7 C 2005-07-31 

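# Convert the character dates to date-times (as.Date() would avoid
# fractional days across daylight-saving changes)
dat$date <- as.POSIXct(dat$date) 
# For each id, days elapsed since that group's first row; unlist()
# concatenates the groups in id order, so dat must be sorted by id
dat$"Days spent" <- unlist(lapply(split(dat, f = dat$id), 
         function(x){as.numeric(difftime(x$date, x$date[1], units = "days"))})) 
dat 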
    id  date Days spent 
1 A 2000-01-13   0 
2 A 2000-01-18   5 
3 A 2000-01-25   12 
4 B 2012-10-10   0 
5 B 2012-10-11   1 
6 C 2005-07-25   0 
7 C 2005-07-31   6 
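Because `unlist(lapply(split(...)))` returns the groups one after another in sorted id order, the assignment lines up only when `dat` is already sorted by id (and, with `x$date[1]`, by date within each id). A sketch that writes the results back to the original rows with base R's `unsplit()`, on hypothetical unsorted rows; it also swaps in `min()` for `x$date[1]`, which is a change to the answer rather than part of it:

```r
# Hypothetical rows, deliberately not sorted by id or by date
dat <- data.frame(
  id   = c("B", "A", "A", "B"),
  date = as.Date(c("2012-10-11", "2000-01-13", "2000-01-18", "2012-10-10")),
  stringsAsFactors = FALSE
)

# Days since the earliest date of each group, computed group by group
pieces <- lapply(split(dat, dat$id), function(x)
  as.numeric(difftime(x$date, min(x$date), units = "days")))

# unsplit() puts each group's values back at that group's original rows
dat$days_spent <- unsplit(pieces, dat$id)
print(dat$days_spent)  # 1 0 5 0
```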

Following suggestions from @agstudy and @Arun, this can be simplified as follows:

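# by() applies the function to each id's rows; unlist() flattens the
# per-group results in id order (so dat must again be sorted by id)
dat$"Days spent" <- unlist(by(dat, dat$id, 
          function(x) difftime(x$date, x$date[1], units = "days"))) 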

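I used 'difftime' here because I didn't want a lagged difference, but the difference between each element and the first one. Otherwise 'diff' does indeed work well with dates (as far as I can tell, anyway). – plannapus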
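The two views agree: a running total of lagged differences telescopes to the difference from the first element, which is why the data.table answer's `cumsum(c(0, diff(...)))` and `difftime(x$date, x$date[1], ...)` produce the same numbers. A base R check on group A's dates from the question:

```r
# Group A's dates from the question's desired output
d <- as.Date(c("2000-01-13", "2000-01-18", "2000-01-25", "2000-02-08"))

# Lagged differences between consecutive dates: 5, 7, 14 days
lagged <- as.numeric(diff(d))

# Running total of the lags ...
via_cumsum <- cumsum(c(0, lagged))

# ... equals the difference from the first element
via_first <- as.numeric(difftime(d, d[1], units = "days"))

print(via_cumsum)  # 0 5 12 26
stopifnot(isTRUE(all.equal(via_cumsum, via_first)))
```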


I would suggest replacing lapply + split with ... – agstudy


Two ways: with ave and with the plyr library.

df <- 
structure(list(id = c("A", "A", "A", "B", "B", "C", "C"), date = structure(c(10969, 
10974, 10981, 15623, 15624, 12989, 12995), class = "Date")), .Names = c("id", 
"date"), row.names = c(NA, -7L), class = "data.frame") 

With ave, the dates must be converted to numeric:

df$days_from_start <- ave(as.numeric(df$date), df$id, FUN = function(x) x-min(x)) 

This gives

> df 
    id  date days_from_start 
1 A 2000-01-13    0 
2 A 2000-01-18    5 
3 A 2000-01-25    12 
4 B 2012-10-10    0 
5 B 2012-10-11    1 
6 C 2005-07-25    0 
7 C 2005-07-31    6 
> str(df) 
'data.frame': 7 obs. of 3 variables: 
$ id    : chr "A" "A" "A" "B" ... 
$ date   : Date, format: "2000-01-13" ... 
$ days_from_start: num 0 5 12 0 1 0 6 

Using the plyr library:

library("plyr") 
df <- ddply(df, .(id), mutate, days_from_start = date - min(date)) 

This gives

> df 
    id  date days_from_start 
1 A 2000-01-13   0 days 
2 A 2000-01-18   5 days 
3 A 2000-01-25   12 days 
4 B 2012-10-10   0 days 
5 B 2012-10-11   1 days 
6 C 2005-07-25   0 days 
7 C 2005-07-31   6 days 
> str(df) 
'data.frame': 7 obs. of 3 variables: 
$ id    : chr "A" "A" "A" "B" ... 
$ date   : Date, format: "2000-01-13" ... 
$ days_from_start:Class 'difftime' atomic [1:7] 0 5 12 0 1 0 6 
    .. ..- attr(*, "units")= chr "days" 
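The plyr result is a `difftime` column (hence the `days` suffix in the printout). If a plain numeric column is wanted, as in the other answers, `as.numeric()` with an explicit `units` argument drops the class. A base R sketch:

```r
# Subtracting Dates yields a difftime object measured in days
dates   <- as.Date(c("2000-01-13", "2000-01-18", "2000-01-25"))
elapsed <- dates - min(dates)
print(class(elapsed))  # "difftime"

# Convert to a plain numeric vector, stating the units explicitly
days <- as.numeric(elapsed, units = "days")
print(days)  # 0 5 12
```

Inside the `ddply` call, this would presumably mean writing `days_from_start = as.numeric(date - min(date), units = "days")`.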