2017-08-30

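I have a data frame like the one below. I want to count the types per group and collapse them into a single comma-separated column using data.table.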

ID <- c("ID001","ID001","ID001","ID002","ID002","ID002") 
ToolID <- c("SWP","SWP","SWP","ISP","ISP","ISP") 
Type <- c("A","B","C","D","E","A") 
WHEN <- c("2017-08-15 12:44:11","2017-08-15 12:44:11","2017-08-14 19:07:11", 
      "2017-08-17 11:24:15","2017-08-17 11:24:15","2017-08-17 11:24:15") 

df <- data.frame(ID,ToolID,Type,WHEN) 
df$WHEN <- as.POSIXct(df$WHEN,format="%Y-%m-%d %H:%M:%S") 

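I am trying to put all the Type values into one comma-separated column and also count them, grouping by (ToolID & ID), while keeping only the rows with max(WHEN), i.e. the most recent timestamp for each ID.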

Desired output

   ID ToolID  Type Type_count                WHEN
ID001    SWP   A,B          2 2017-08-15 12:44:11
ID002    ISP D,E,A          3 2017-08-17 11:24:15

I tried using data.table and got this far:

library(data.table) 
setDT(df)[, WHEN := as.POSIXct(WHEN)] 
df1 <- df[, max(WHEN), by = list(ID,ToolID)] 
colnames(df1)[which(names(df1) == "V1")] <- "WHEN" 

How can I add the collapsed types and the type count to df1 to get my desired output? Could someone point me in the right direction?

Answer


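We can build a row index from a logical condition (the rows where WHEN equals the group maximum), pass that index in i, and then compute the summary grouped by ID and ToolID: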

i1 <- setDT(df)[, .I[WHEN == max(WHEN)], .(ID, ToolID)]$V1 
df[i1, .(Type = toString(unique(Type)), Type_count = uniqueN(Type), 
     WHEN = WHEN[1]), .(ID, ToolID)] 
#      ID ToolID    Type Type_count                WHEN
#1: ID001    SWP    A, B          2 2017-08-15 12:44:11
#2: ID002    ISP D, E, A          3 2017-08-17 11:24:15
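
To make the two steps easier to follow, here is a self-contained sketch that rebuilds the sample data, shows what the row index contains, and then does the filtering and the summary in a single grouped call as an alternative. The names `latest` and `res` and the `tz = "UTC"` setting are assumptions added for illustration; they are not part of the original question or answer.

```r
library(data.table)

# Same sample data as in the question (tz = "UTC" is an assumption, used
# only so the timestamps parse the same way on every machine)
df <- data.table(
  ID     = c("ID001", "ID001", "ID001", "ID002", "ID002", "ID002"),
  ToolID = c("SWP", "SWP", "SWP", "ISP", "ISP", "ISP"),
  Type   = c("A", "B", "C", "D", "E", "A"),
  WHEN   = as.POSIXct(c("2017-08-15 12:44:11", "2017-08-15 12:44:11",
                        "2017-08-14 19:07:11", "2017-08-17 11:24:15",
                        "2017-08-17 11:24:15", "2017-08-17 11:24:15"),
                      format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
)

# Step 1: .I holds row numbers, so this collects the rows that carry
# each group's most recent timestamp
i1 <- df[, .I[WHEN == max(WHEN)], by = .(ID, ToolID)]$V1
# Row 3 (Type "C") has an older timestamp, so it is not in i1

# Alternative: filter and summarise inside one grouped call
res <- df[, {
  latest <- WHEN == max(WHEN)  # logical mask within the current group
  .(Type       = toString(unique(Type[latest])),
    Type_count = uniqueN(Type[latest]),
    WHEN       = max(WHEN))
}, by = .(ID, ToolID)]
```

Note that toString() joins with ", " (comma plus space). If the output must match "A,B" exactly as in the question, paste(unique(Type[latest]), collapse = ",") can be used in its place.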