I have the following data.table data1 (the real one has many more rows and much more variation, but this is a sample):
    item cat1 cat2 cat3 amounts
 1:    1   99 9999 9990     100
 2:    2   12 8199 9990     100
 3:    3   12 8199 9990     100
 4:    4   12 8199 9990     100
 5:    5   12 8199 9990     100
 6:    6   12 8199 9990     100
 7:    7   12 8199 9990     100
 8:    8   12 4129 9990     100
 9:    9   12 8199 9990     100
10:   10   12 8199 9990     100
library(data.table)
data1 <- setDT(structure(list(item = 1:10, cat1 = c("99", "12", "12", "12",
"12", "12", "12", "12", "12", "12"), cat2 = c("9999", "8199",
"8199", "8199", "8199", "8199", "8199", "4129", "8199", "8199"
), cat3 = c("9990", "9990", "9990", "9990", "9990", "9990", "9990",
"9990", "9990", "9990"), amounts = c("100", "100", "100", "100",
"100", "100", "100", "100", "100", "100")), .Names = c("item",
"cat1", "cat2", "cat3", "amounts"), class = c("data.table", "data.frame"
), row.names = c(NA, -10L)))
Initially I wanted some information about the rows that match certain criteria on cat1, cat2 and cat3. So I did something like this:
data1[, .( items = .N,
group1 = sum(grepl("^[1-8]{2}$", cat1)),
group2 = sum(grepl("^[1-8]9$", cat1)),
group3 = sum(grepl("^9[1-8]$", cat1)),
group4 = sum(cat1 == "99"))]
which gives:
   items group1 group2 group3 group4
1:    10      9      0      0      1
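There are many other criteria in the analysis, but again this is just a sample. My requirements have changed, and for each of the groups specified I now also need the sum of the amounts. So I have two questions:
1) Is there a data.table way to do this sum in a similar manner to how the counts are computed? The basic idea is something like sum(amounts) where grepl("^[1-8]{2}$", cat1).
2) Is there an efficient way of doing this that I have missed? I cannot think of a good way to get my result other than adding a new column to the original dataset for each criterion I have and then doing filtered sums.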
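To make the workaround from question 2 concrete, here is a rough sketch of it for two of the groups. The table name dt and the flag columns is_group1 and is_group4 are hypothetical names used only for illustration. It also assumes amounts has to be converted with as.numeric() first, since the sample stores it as character and sum() does not accept character input.

```r
library(data.table)

# Compact rebuild of the relevant sample columns (dt is a hypothetical name)
dt <- data.table(
  item    = 1:10,
  cat1    = c("99", rep("12", 9)),
  amounts = rep("100", 10)          # character, as in the sample
)

# Assumption: amounts must be numeric before it can be summed
dt[, amounts := as.numeric(amounts)]

# One helper flag column per criterion (hypothetical column names)
dt[, is_group1 := grepl("^[1-8]{2}$", cat1)]
dt[, is_group4 := cat1 == "99"]

# One filtered sum per flag column
group1_amounts <- dt[is_group1 == TRUE, sum(amounts)]
group4_amounts <- dt[is_group4 == TRUE, sum(amounts)]
```

This works, but it needs one extra column and one extra filtered query per criterion, which is the overhead I would like to avoid.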
My ideal result would be:
   items group1 group2 group3 group4 total_amounts group1_amounts group2_amounts group3_amounts group4_amounts
1:    10      9      0      0      1          1000            900              0              0            100
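A result of this shape could possibly be produced in a single j expression by subsetting amounts with the same logical vectors that are used for the counts. The following is only a sketch under some assumptions: dt and res are hypothetical names, the logical vectors g1 to g4 are illustrative helpers, and amounts is converted with as.numeric() because the sample stores it as character.

```r
library(data.table)

# Compact rebuild of the sample data (dt is a hypothetical name)
dt <- data.table(
  item    = 1:10,
  cat1    = c("99", rep("12", 9)),
  cat2    = c("9999", rep("8199", 6), "4129", rep("8199", 2)),
  cat3    = rep("9990", 10),
  amounts = rep("100", 10)          # character, as in the sample
)

# Assumption: amounts must be numeric before it can be summed
dt[, amounts := as.numeric(amounts)]

res <- dt[, {
  # Evaluate each criterion once and reuse it for the count and the sum
  g1 <- grepl("^[1-8]{2}$", cat1)
  g2 <- grepl("^[1-8]9$", cat1)
  g3 <- grepl("^9[1-8]$", cat1)
  g4 <- cat1 == "99"
  list(                              # list() is what .() is an alias for
    items          = .N,
    group1         = sum(g1),
    group2         = sum(g2),
    group3         = sum(g3),
    group4         = sum(g4),
    total_amounts  = sum(amounts),
    group1_amounts = sum(amounts[g1]),   # sum over matching rows only
    group2_amounts = sum(amounts[g2]),   # empty subset sums to 0
    group3_amounts = sum(amounts[g3]),
    group4_amounts = sum(amounts[g4])
  )
}]

print(res)
```

Whether this is more efficient than the helper-column approach on the full dataset is something I have not measured.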
Thanks a lot! – User2321