我想重新编写下面稍微复杂的plyr命令,以便它更快,并使用聚合,tapply或data.table。用集合重写复杂的ddply
该功能允许您输入多个ID变量并测量变量,然后返回几个计算结果。但是,在更大的数据集上,它可能不是最有效的。
这里是代码...
require(ggplot2) # to get the diamonds data set
require(plyr)
mean_sd_for_several_variables <- function(df, lvls, measures) {
res <- ddply(df, lvls, function(x) {
ret <- vector()
for(measure in measures) {
mean_sd <- c(mean(x[,measure]), sd(x[,measure]))
names(mean_sd) <- c(paste0("mean_", measure), paste0("sd_", measure))
ret <- c(ret, mean_sd)
}
return(ret)
}
)
print(res)
}
...返回:
mean_sd_for_several_variables(diamonds, c("color", "cut"), c("price","depth"))
color cut mean_price sd_price mean_depth sd_depth
1 D Fair 4291.1 3286.1 64.048 3.29220
2 D Good 3405.4 3175.1 62.366 2.22240
3 D Very Good 3470.5 3523.8 61.750 1.46223
4 D Premium 3631.3 3711.6 61.169 1.15806
5 D Ideal 2629.1 3001.1 61.678 0.71201
6 E Fair 3682.3 2976.7 63.320 4.42103
7 E Good 3423.6 3330.7 62.204 2.23059
8 E Very Good 3214.7 3408.0 61.730 1.42377
9 E Premium 3538.9 3795.0 61.176 1.16454
10 E Ideal 2597.6 2956.0 61.687 0.70718
11 F Fair 3827.0 3223.3 63.508 3.70209
12 F Good 3495.8 3202.4 62.202 2.23976
13 F Very Good 3778.8 3786.1 61.722 1.38939
14 F Premium 4324.9 4012.0 61.260 1.16775
15 F Ideal 3374.9 3766.6 61.676 0.69398
16 G Fair 4239.3 3609.6 64.340 3.57340
17 G Good 4123.5 3702.5 62.527 2.03893
18 G Very Good 3872.8 3861.4 61.841 1.33169
19 G Premium 4500.7 4356.6 61.279 1.15341
20 G Ideal 3720.7 4006.3 61.700 0.68714
21 H Fair 5135.7 3886.5 64.585 3.14173
22 H Good 4276.3 4020.7 62.500 2.09212
23 H Very Good 4535.4 4185.8 61.968 1.31895
24 H Premium 5216.7 4466.2 61.322 1.15164
25 H Ideal 3889.3 4013.4 61.733 0.72939
26 I Fair 4685.4 3730.3 64.221 3.68771
27 I Good 5078.5 4631.7 62.475 2.17958
28 I Very Good 5255.9 4687.1 61.935 1.32890
29 I Premium 5946.2 5053.7 61.329 1.15338
30 I Ideal 4452.0 4505.2 61.794 0.72334
31 J Fair 4975.7 4050.5 64.357 3.31595
32 J Good 4574.2 3707.8 62.396 2.12091
33 J Very Good 5103.5 4135.7 61.902 1.33679
34 J Premium 6294.6 4788.9 61.390 1.13989
35 J Ideal 4918.2 4476.2 61.822 0.94669
嗯,没有太多的时间正确的认识,只是觉得我做了我的gateveys包类似的东西。 'calcShares'函数在这里:https://github.com/mbannert/gateveys/blob/master/R/gateveys.R以类似的方式使用data.table。不知道这是否对你更直观,但至少是data.table。 –