R: fast way to count data frame rows with two columns set to specific values

I have a data.frame tr_id_cat composed of two columns: id and category.
- tr_id_cat has 150,000 rows
- id has 300,000 unique values
- category has 20 unique values
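I want to build a count matrix of how many rows exist for each (id, category) pair.
Here is my code so far, but it takes ages: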
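# id and categories are vectors of the possible values (dictionaries)
nb_id = length(id)
nb_categories = length(categories)
COUNT_ID_CATEGORY = array(0, dim = c(nb_id, nb_categories))
for (i in seq_len(nb_categories))
{
  cat_ = categories[i]
  # all rows that belong to the current category
  subs = tr_id_cat[tr_id_cat$category == cat_, ]
  for (j in seq_len(nrow(subs)))
  {
    id_ = subs$id[j]
    # row of the count matrix for this id
    id_idx = which(id == id_)
    # number of rows in this category with this id (column i = current category;
    # recomputed for every matching row, which is what makes the loop slow)
    COUNT_ID_CATEGORY[id_idx, i] = nrow(subs[subs$id == id_, ])
  }
}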
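A minimal vectorised sketch of the same computation, assuming (as the comment in the code says) that id and categories are vectors holding every possible value. The helper name count_id_category is hypothetical. Instead of rescanning the subset with which() and subs$id == id_ for every row, it uses match() once per column to turn each row into a (row, column) position in the matrix, then counts all positions in a single tabulate() call.

```r
# Hypothetical helper: builds the id x category count matrix in one pass.
# Assumes `id` and `categories` list every possible value (the "dictionaries").
count_id_category <- function(df, id, categories) {
  nb_id <- length(id)
  nb_categories <- length(categories)
  # Position of each row's id / category in the dictionaries (NA if absent)
  row_idx <- match(df$id, id)
  col_idx <- match(df$category, categories)
  # Ignore rows whose id or category is not in the dictionaries
  ok <- !is.na(row_idx) & !is.na(col_idx)
  # Column-major linear index into an nb_id x nb_categories matrix
  lin <- (col_idx[ok] - 1L) * nb_id + row_idx[ok]
  # tabulate() counts how often each linear index occurs, including zeros
  counts <- tabulate(lin, nbins = nb_id * nb_categories)
  matrix(counts, nrow = nb_id, ncol = nb_categories,
         dimnames = list(id, categories))
}
```

Called as COUNT_ID_CATEGORY = count_id_category(tr_id_cat, id, categories), each row is visited a constant number of times, so the cost grows with the number of rows rather than with rows times the size of each category subset.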
A small version of what I am trying to do:
id, category
1, 1
1, 1
1, 1
1, 2
1, 2
2, 1
3, 1
would be converted into the count matrix:
COUNT_ID_CATEGORY[1,1] = 3 # first three lines
COUNT_ID_CATEGORY[1,2] = 2 # line 4 and 5
COUNT_ID_CATEGORY[2,1] = 1
COUNT_ID_CATEGORY[2,2] = 0
COUNT_ID_CATEGORY[3,1] = 1
COUNT_ID_CATEGORY[3,2] = 0
etc
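For the small example above, a sketch using base R's table() reproduces the expected matrix. Turning both columns into factors whose levels are the full dictionaries (assumed here to be 1:3 for id and 1:2 for categories) makes table() also report the zero cells such as [2, 2] and [3, 2]; unclass() drops the table class so the result is a plain integer matrix. This is one possible approach, not necessarily the one the asker settled on.

```r
# The small example from the question
tr_id_cat <- data.frame(
  id       = c(1, 1, 1, 1, 1, 2, 3),
  category = c(1, 1, 1, 2, 2, 1, 1)
)
# Assumed dictionaries of all possible values
id <- 1:3
categories <- 1:2

# Cross-tabulate; fixed factor levels keep empty (id, category) pairs as 0
COUNT_ID_CATEGORY <- unclass(table(
  factor(tr_id_cat$id, levels = id),
  factor(tr_id_cat$category, levels = categories)
))
print(COUNT_ID_CATEGORY)
```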
Have you tried a dplyr or data.table approach? Can you provide a minimal example of your dataset and the expected output?