2014-07-23

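Grouped count summary in R data.table

I have a table containing dates, buy values, and sell values. I want to count the number of buys and sells per day, as well as the total number of buys and sells. I'm finding this a bit tricky in data.table.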

date buy sell  
2011-01-01 1 0 
2011-01-02 0 0 
2011-01-03 0 2 
2011-01-04 3 0 
2011-01-05 0 0 
2011-01-06 0 0 
2011-01-01 0 0 
2011-01-02 0 1 
2011-01-03 4 0 
2011-01-04 0 0 
2011-01-05 0 0 
2011-01-06 0 0 
2011-01-01 0 0 
2011-01-02 0 8 
2011-01-03 2 0 
2011-01-04 0 0 
2011-01-05 0 0 
2011-01-06 0 5 

The data.table above can be created with the following code:

library(data.table)

DT = data.table(
      date = rep(as.Date('2011-01-01') + 0:5, 3),
      buy  = c(1,0,0,3,0,0,0,0,4,0,0,0,0,0,2,0,0,0),
      sell = c(0,0,2,0,0,0,0,1,0,0,0,0,0,8,0,0,0,5))

The result I want is:

date total_buys total_sells 
2011-01-01 1   0 
2011-01-02 0   2 
       and so on 

I would also like to know the total number of buys and sells:

total_buys total_sells 
    4   4 

I tried:

length(DT[sell > 0 | buy > 0]) 
> 3 

which is a strange answer (I wonder why).
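The `3` comes from `length()` itself: a data.table, like a data.frame, is stored as a list of columns, so `length()` returns the number of columns (`date`, `buy`, `sell`), not the number of rows. A minimal sketch of getting a row count instead, using `nrow()` or data.table's `.N`; the variable names `active`, `n_cols`, `n_rows` and `n_rows_dt` are only illustrative.

```r
library(data.table)

# Same example data as in the question
DT <- data.table(
  date = rep(as.Date("2011-01-01") + 0:5, 3),
  buy  = c(1, 0, 0, 3, 0, 0, 0, 0, 4, 0, 0, 0, 0, 0, 2, 0, 0, 0),
  sell = c(0, 0, 2, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 8, 0, 0, 0, 5)
)

active <- DT[sell > 0 | buy > 0]

# A data.table is a list of columns, so length() counts columns, not rows
n_cols <- length(active)   # 3 (date, buy, sell)

# Row counts: nrow() on the subset, or .N evaluated inside [ ]
n_rows <- nrow(active)                   # 8 rows with a buy or a sell
n_rows_dt <- DT[sell > 0 | buy > 0, .N]  # same count, without building the subset
```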

Answers

## by date 
DT[, list(total_buys = sum(buy > 0), total_sells = sum(sell > 0)), by = date] 
##   date total_buys total_sells 
## 1: 2011-01-01   1   0 
## 2: 2011-01-02   0   2 
## 3: 2011-01-03   2   1 
## 4: 2011-01-04   1   0 
## 5: 2011-01-05   0   0 
## 6: 2011-01-06   0   1 

DT[, list(total_buys = sum(buy > 0), total_sells = sum(sell > 0))] 
## total_buys total_sells 
## 1:   4   4 
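How this works, sketched with the question's data: `buy > 0` produces a logical vector, and `sum()` treats `TRUE` as 1 and `FALSE` as 0, so `sum(buy > 0)` counts the positive entries instead of adding up the amounts. `by = date` makes data.table evaluate the `list(...)` expression once per date; without `by`, it is evaluated once over the whole table. The small vector `x` is made up purely for illustration.

```r
library(data.table)

DT <- data.table(
  date = rep(as.Date("2011-01-01") + 0:5, 3),
  buy  = c(1, 0, 0, 3, 0, 0, 0, 0, 4, 0, 0, 0, 0, 0, 2, 0, 0, 0),
  sell = c(0, 0, 2, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 8, 0, 0, 0, 5)
)

# Comparing with 0 gives a logical vector
x <- c(1, 0, 4, 2)
x > 0                       # TRUE FALSE TRUE TRUE
n_positive <- sum(x > 0)    # TRUE counts as 1, FALSE as 0, so this is 3
total_value <- sum(x)       # 7: adds the values instead of counting them

# by = date: one group per date, j evaluated for each group
per_day <- DT[, list(total_buys = sum(buy > 0), total_sells = sum(sell > 0)), by = date]

# No by: j evaluated once over all rows
overall <- DT[, list(total_buys = sum(buy > 0), total_sells = sum(sell > 0))]
```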

The sum adds up the buy values; I am looking to count them. Total buys and total sells are 4 each. – user1480926

@user1480926 I updated the answer. –

Thanks Jake, would you mind explaining how this works? It's a very concise way to do it, kudos. – user1480926

An alternative to @Jake's answer is the typical melt + dcast routine, something like:

library(data.table)  # current data.table provides its own melt() and dcast()
dtL <- melt(DT, id.vars = "date")
dcast(dtL, date ~ variable, value.var = "value",
       fun.aggregate = function(x) sum(x > 0))
#   date buy sell 
# 1 2011-01-01 1 0 
# 2 2011-01-02 0 2 
# 3 2011-01-03 2 1 
# 4 2011-01-04 1 0 
# 5 2011-01-05 0 0 
# 6 2011-01-06 0 1 
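To see what the melt + dcast route does, it helps to look at the intermediate long table. A minimal sketch, assuming a current data.table version where `melt()` and `dcast()` work on data.tables directly (no reshape2 needed): `melt()` stacks `buy` and `sell` into `variable`/`value` columns, and `dcast()` spreads them back to one column per variable, with `fun.aggregate` counting the positive values in each date/variable cell. The name `wide` is illustrative.

```r
library(data.table)

DT <- data.table(
  date = rep(as.Date("2011-01-01") + 0:5, 3),
  buy  = c(1, 0, 0, 3, 0, 0, 0, 0, 4, 0, 0, 0, 0, 0, 2, 0, 0, 0),
  sell = c(0, 0, 2, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 8, 0, 0, 0, 5)
)

# Long format: one row per (row, measure) pair, 18 rows x 2 measures = 36 rows
dtL <- melt(DT, id.vars = "date")
head(dtL)

# Wide again: each cell aggregates all values for one date/variable pair
wide <- dcast(dtL, date ~ variable, value.var = "value",
              fun.aggregate = function(x) sum(x > 0))
wide
```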

Or, without melting, just:

DT[, lapply(.SD, function(x) sum(x > 0)), by = date] 
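Here `.SD` is the subset of data for the current group, containing every column except the grouping column `date`, so `lapply()` applies the counting function to `buy` and `sell` separately. The result keeps the original column names; a sketch of renaming them to match the output asked for in the question with `setnames()` (the name `counts` is illustrative):

```r
library(data.table)

DT <- data.table(
  date = rep(as.Date("2011-01-01") + 0:5, 3),
  buy  = c(1, 0, 0, 3, 0, 0, 0, 0, 4, 0, 0, 0, 0, 0, 2, 0, 0, 0),
  sell = c(0, 0, 2, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 8, 0, 0, 0, 5)
)

# For each date, .SD holds that group's buy and sell columns
counts <- DT[, lapply(.SD, function(x) sum(x > 0)), by = date]

# Rename in place to the column names from the question
setnames(counts, c("buy", "sell"), c("total_buys", "total_sells"))
counts
```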

To get your other table, try:

dtL[, list(count = sum(value > 0)), by = variable] 
# variable count 
# 1:  buy  4 
# 2:  sell  4 

Or, without melting:

DT[, lapply(.SD, function(x) sum(x > 0)), .SDcols = c("buy", "sell")] 
# buy sell 
# 1: 4 4 

Thanks Ananda, this is cool too! – user1480926

@user1480926, I thought I'd share it because it becomes much more convenient when you have more than two columns. – A5C1D2H2I1M1N2O1R2T1
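To illustrate the point about more than two columns, here is a sketch with a hypothetical third column `hold` (it is not part of the question's data). The `lapply(.SD, ...)` form picks up every non-grouping column automatically, so the per-day code does not change; for the totals, the column list can be computed with `setdiff()` instead of typed out. The helper `count_pos` is a hypothetical name introduced for this example.

```r
library(data.table)

DT <- data.table(
  date = rep(as.Date("2011-01-01") + 0:5, 3),
  buy  = c(1, 0, 0, 3, 0, 0, 0, 0, 4, 0, 0, 0, 0, 0, 2, 0, 0, 0),
  sell = c(0, 0, 2, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 8, 0, 0, 0, 5)
)

# Hypothetical extra column, added only for this illustration
DT[, hold := c(0, 1, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3)]

# Hypothetical helper: number of strictly positive entries
count_pos <- function(x) sum(x > 0)

# Per-day counts: .SD covers buy, sell and hold without naming them
per_day <- DT[, lapply(.SD, count_pos), by = date]

# Overall counts for every column except date
value_cols <- setdiff(names(DT), "date")
totals <- DT[, lapply(.SD, count_pos), .SDcols = value_cols]
```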