plyr: calculating a relative aggregation

I have a data.frame that looks like this:

> head(activity_data)
    ev_id cust_id active previous_active start_date
1 1141880     201      1               0 2008-08-17
2 4927803     201      1               0 2013-03-17
3 1141880     244      1               0 2008-08-17
4 2391524     244      1               0 2011-02-05
5 1141868     325      1               0 2008-08-16
6 1141872     325      1               0 2008-08-16
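A minimal sketch that rebuilds the six rows printed above as a reproducible data frame. It assumes start_date was read in as character text (for example from a CSV file), which is why it is converted with as.Date(); the values and column names come straight from the print-out.

```r
# The six rows shown by head(activity_data), typed in by hand
activity_data <- data.frame(
  ev_id           = c(1141880L, 4927803L, 1141880L, 2391524L, 1141868L, 1141872L),
  cust_id         = c(201L, 201L, 244L, 244L, 325L, 325L),
  active          = rep(1L, 6),
  previous_active = rep(0, 6),
  start_date      = c("2008-08-17", "2013-03-17", "2008-08-17",
                      "2011-02-05", "2008-08-16", "2008-08-16"),
  stringsAsFactors = FALSE
)

# Assumption: dates arrive as "YYYY-MM-DD" strings, which as.Date() parses by default
activity_data$start_date <- as.Date(activity_data$start_date)
str(activity_data)
```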

For each cust_id
- for each ev_id
  - create a new variable $recent_active (= the sum of $active over all rows with this cust_id where $start_date > [this_row]$start_date - 10)
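Read literally, the rule above can be written as a plain per-row loop in base R. This sketch uses the 10-row dput() sample posted in the comments (previous_active omitted), assumes the 10 means 10 days because start_date has class Date, and keeps the condition exactly as stated: there is no upper bound, so later rows of the same customer also count. It compares every row with every other row, so it is only a reference for checking faster versions, not something to run on 200 million rows. `recent_active_naive` is a hypothetical name.

```r
activity_data <- data.frame(
  ev_id      = c(1144095L, 4930018L, 1144095L, 2393739L, 1144083L,
                 1144087L, 1144099L, 1144101L, 1190816L, 1190818L),
  cust_id    = c(201L, 201L, 244L, 244L, 325L, 325L, 325L, 325L, 325L, 325L),
  active     = rep(1L, 10),
  start_date = as.Date(c(14334, 16007, 14334, 15236, 14333, 14333,
                         14333, 14333, 14340, 14341), origin = "1970-01-01")
)

# For row i: sum `active` over rows with the same cust_id whose
# start_date is later than (row i's start_date - window days)
recent_active_naive <- function(df, window = 10) {
  vapply(seq_len(nrow(df)), function(i) {
    same_cust <- df$cust_id == df$cust_id[i]
    in_window <- df$start_date > df$start_date[i] - window
    sum(df$active[same_cust & in_window])
  }, numeric(1))
}

activity_data$recent_active <- recent_active_naive(activity_data)
activity_data$recent_active
# [1] 2 1 2 1 6 6 6 6 6 6
```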

I am struggling to do this with ddply, because my split/grouping is by (cust_id), but I want to return rows with both cust_id and ev_id.

Here is what I tried:

ddply(activity_data, .(cust_id), function(x) recent_active=sum(x[this_row,]$active))
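The attempt cannot work as written because ddply hands each cust_id group to the function as one data frame, and there is no `this_row` inside it. One way around that, sketched below, is to loop over the group's rows inside the function and return the whole group with a new column; ddply then stacks the groups back together, so every cust_id/ev_id row is kept. The sketch assumes the 10 means 10 days, takes the condition literally (no upper bound on start_date), and uses the dput() sample from the comments; `recent_in_group` is a hypothetical helper name.

```r
library(plyr)

activity_data <- data.frame(
  ev_id      = c(1144095L, 4930018L, 1144095L, 2393739L, 1144083L,
                 1144087L, 1144099L, 1144101L, 1190816L, 1190818L),
  cust_id    = c(201L, 201L, 244L, 244L, 325L, 325L, 325L, 325L, 325L, 325L),
  active     = rep(1L, 10),
  start_date = as.Date(c(14334, 16007, 14334, 15236, 14333, 14333,
                         14333, 14333, 14340, 14341), origin = "1970-01-01")
)

# Receives all rows of one cust_id and adds one recent_active value per row,
# which plays the role of `this_row` in the original attempt
recent_in_group <- function(x, window = 10) {
  x$recent_active <- vapply(seq_len(nrow(x)), function(i) {
    sum(x$active[x$start_date > x$start_date[i] - window])
  }, numeric(1))
  x
}

result <- ddply(activity_data, .(cust_id), recent_in_group)
result[, c("cust_id", "ev_id", "recent_active")]
```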

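If ddply is not an option, what other efficient approach would you recommend? My dataset has about 200 million rows, and I need to do this about 10-15 times.

Sample data is here.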
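For large data, one option (shown here in base R; it is not the only approach) is to avoid comparing every pair of rows. Within one customer, the rows with start_date > d - 10 are all rows minus those with start_date <= d - 10. After sorting that customer's dates once, findInterval() counts the latter, and a cumulative sum of active turns that count into a sum. This is a sketch under the assumptions that the 10 means 10 days and that the condition has no upper bound; `recent_active_fast` is a hypothetical name, and it has not been benchmarked at the 200-million-row scale.

```r
activity_data <- data.frame(
  ev_id      = c(1144095L, 4930018L, 1144095L, 2393739L, 1144083L,
                 1144087L, 1144099L, 1144101L, 1190816L, 1190818L),
  cust_id    = c(201L, 201L, 244L, 244L, 325L, 325L, 325L, 325L, 325L, 325L),
  active     = rep(1L, 10),
  start_date = as.Date(c(14334, 16007, 14334, 15236, 14333, 14333,
                         14333, 14333, 14340, 14341), origin = "1970-01-01")
)

# recent = (customer's total active) - (active over rows with date <= own date - window)
recent_active_fast <- function(cust_id, active, start_date, window = 10) {
  out  <- numeric(length(active))
  days <- as.numeric(start_date)                      # Date -> days since 1970-01-01
  for (idx in split(seq_along(active), cust_id)) {    # row indices of one customer
    ord <- idx[order(days[idx])]                      # those rows sorted by date
    cum <- c(0, cumsum(active[ord]))                  # cum[k + 1] = active over k earliest rows
    k   <- findInterval(days[idx] - window, days[ord])  # count of dates <= own date - window
    out[idx] <- cum[length(ord) + 1] - cum[k + 1]
  }
  out
}

activity_data$recent_active <- with(activity_data,
  recent_active_fast(cust_id, active, start_date))
activity_data$recent_active
# [1] 2 1 2 1 6 6 6 6 6 6
```

The sort plus binary search makes each customer cost roughly n log n instead of n squared comparisons.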


2013-08-22 eamo

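I recommend using 'data.table'. Could you give us a reproducible example so we can write an answer against your actual data? – statquant

In '$start_date > [this_row]$start_date - 10', what is the 10? 10 days, 10 months or 10 years? Please post sample data. – Metrics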
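For reference on the units question: when start_date has class Date, subtracting a plain number subtracts days. If months or years were meant instead, seq() on a Date accepts a negative step such as "-10 months". A small illustration using the first date from the sample output:

```r
d <- as.Date("2008-08-17")

d - 10                                        # plain numbers are days: "2008-08-07"
seq(d, by = "-10 months", length.out = 2)[2]  # ten months earlier: "2007-10-17"
seq(d, by = "-10 years",  length.out = 2)[2]  # ten years earlier: "1998-08-17"
```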

dput for a subset: structure(list(ev_id = c(1144095L, 4930018L, 1144095L, 2393739L, 1144083L, 1144087L, 1144099L, 1144101L, 1190816L, 1190818L), cust_id = c(201L, 201L, 244L, 244L, 325L, 325L, 325L, 325L, 325L, 325L), active = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), previous_active = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0), start_date = structure(c(14334, 16007, 14334, 15236, 14333, 14333, 14333, 14333, 14340, 14341), class = "Date")), .Names = c("ev_id", "cust_id", "active", "previous_active", "start_date"), row.names = c(NA, 10L), class = "data.frame") – eamo
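The start_date column in that dput() is a Date stored as days since 1970-01-01. Decoding the numbers shows what the sample actually covers: customer 325's six events fall between 2009-03-30 and 2009-04-07, so it is the only customer whose rows lie within 10 days of each other.

```r
# start_date values and cust_id copied from the dput() (dates are days since 1970-01-01)
start_days <- c(14334, 16007, 14334, 15236, 14333, 14333, 14333, 14333, 14340, 14341)
cust_id    <- c(201L, 201L, 244L, 244L, 325L, 325L, 325L, 325L, 325L, 325L)

start_date <- as.Date(start_days, origin = "1970-01-01")
split(start_date, cust_id)   # decoded dates, grouped per customer
```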

Actually, you need a two-step approach here (you also need to convert the dates to Date format before using the code below):

ddply(activity_data, .(cust_id), transform, recent_active=your function) #Not clear what you are asking regarding the function

ddply(activity_data, .(cust_id,ev_id), summarize,recent_active=sum(recent_active))
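Filling in the placeholder with the literal rule from the question gives a runnable version of this two-step approach. It is a sketch that assumes the 10 means 10 days and uses the dput() sample from the comments (previous_active omitted). With that sample every (cust_id, ev_id) pair occurs only once, so step 2 returns the step 1 values unchanged; it only matters when an ev_id repeats within a customer.

```r
library(plyr)

activity_data <- data.frame(
  ev_id      = c(1144095L, 4930018L, 1144095L, 2393739L, 1144083L,
                 1144087L, 1144099L, 1144101L, 1190816L, 1190818L),
  cust_id    = c(201L, 201L, 244L, 244L, 325L, 325L, 325L, 325L, 325L, 325L),
  active     = rep(1L, 10),
  start_date = as.Date(c(14334, 16007, 14334, 15236, 14333, 14333,
                         14333, 14333, 14340, 14341), origin = "1970-01-01")
)

# Step 1: within each cust_id, add recent_active to every row
# (sum of active over the customer's rows with start_date > this row's date - 10)
step1 <- ddply(activity_data, .(cust_id), transform,
               recent_active = vapply(seq_along(start_date), function(i)
                 sum(active[start_date > start_date[i] - 10]), numeric(1)))

# Step 2: collapse to one row per (cust_id, ev_id)
step2 <- ddply(step1, .(cust_id, ev_id), summarize,
               recent_active = sum(recent_active))
step2
```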


2013-08-22 14:26:29 Metrics
