2016-11-14

Using rleid to add an ID ordered by date for each group

I would like to add an ordered ID (by date) to each group in a data frame. I can do this using dplyr (R - add column that counts sequentially within groups but repeats for duplicates):

library(dplyr)
library(data.table)

# Example data 
date <- rep(c("2016-10-06 11:56:00","2016-10-05 11:56:00","2016-10-05 11:56:00","2016-10-07 11:56:00"),2) 
date <- as.POSIXct(date) 
group <- c(rep("A",4), rep("B",4))  
df <- data.frame(group, date) 

# dplyr - dense_rank 
df2 <- df %>% group_by(group) %>% 
     mutate(m.test=dense_rank(date)) 

    group    date m.test 
    <fctr>    <dttm> <int> 
1  A 2016-10-06 11:56:00  2 
2  A 2016-10-05 11:56:00  1 
3  A 2016-10-05 11:56:00  1 
4  A 2016-10-07 11:56:00  3 
5  B 2016-10-06 11:56:00  2 
6  B 2016-10-05 11:56:00  1 
7  B 2016-10-05 11:56:00  1 
8  B 2016-10-07 11:56:00  3 

So my new column m.test ranks the rows by date within each group. If I use rleid from data.table, it doesn't seem to work (05/10 is ranked after 06/10):

df3 <- setDT(df)[, m.test := rleid(date), by = group] 

    group    date m.test 
1:  A 2016-10-06 11:56:00  1 
2:  A 2016-10-05 11:56:00  2 
3:  A 2016-10-05 11:56:00  2 
4:  A 2016-10-07 11:56:00  3 
5:  B 2016-10-06 11:56:00  1 
6:  B 2016-10-05 11:56:00  2 
7:  B 2016-10-05 11:56:00  2 
8:  B 2016-10-07 11:56:00  3 

Am I getting the syntax wrong?

The data.table equivalent of dplyr's dense_rank(...) is frank(..., ties.method = "dense"), afaik –

Thanks. I got confused by the answer to a question I asked originally (http://stackoverflow.com/questions/37008864/add-id-by-group-which-resets-to-1-in-r). I guess rleid isn't appropriate for dates in this case. – Pete900

Would you like to post that as an answer? – Pete900

Answer

Thanks to @docendo discimus, the correct way to do this with data.table is frank(..., ties.method = "dense"):

df4 <- setDT(df)[, m.test := frank(date, ties.method = "dense"), by = group]
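
To see why rleid gave a different result, here is a small side-by-side sketch on the same example data. rleid numbers consecutive runs of equal values in row order, so it only matches a rank when the rows are already sorted by date. The column names run_id and dense_id and the explicit tz = "UTC" are my own choices for illustration, not part of the original question.

```r
library(data.table)

# Same example data as in the question (tz fixed to UTC for reproducibility)
date <- as.POSIXct(rep(c("2016-10-06 11:56:00", "2016-10-05 11:56:00",
                         "2016-10-05 11:56:00", "2016-10-07 11:56:00"), 2),
                   tz = "UTC")
dt <- data.table(group = rep(c("A", "B"), each = 4), date = date)

# rleid(): a new ID each time the value changes from the previous row
dt[, run_id := rleid(date), by = group]

# frank(ties.method = "dense"): rank by value, ties share a rank, no gaps
dt[, dense_id := frank(date, ties.method = "dense"), by = group]

print(dt)
```

Within each group, run_id comes out as 1, 2, 2, 3 (order of appearance) while dense_id is 2, 1, 1, 3 (order by date), which matches the two outputs shown in the question.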