2017-06-24 39 views
1

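Getting absolute deviations from the mean for two sets of scores

To get the absolute deviations from the mean for two groups of scores, I usually have to write long code in R, like the code shown below.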

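Question

I was wondering whether it is possible in base R to somehow Vectorize the mad() function, so that the absolute deviations from the mean for each group of scores in the example below could be obtained with that vectorized version of mad(). Any other workable ideas are highly appreciated.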

set.seed(0) 
y = as.vector(unlist(mapply(FUN = rnorm, n = c(10, 10)))) # Produces two sets of scores 
groups = factor(rep(1:2, times = c(10, 10)))    # Grouping ID variable 

G1 = y[groups == 1]    # subset y scores for group 1 
G2 = y[groups == 2]    # subset y scores for group 2 
G1.abs.dev = abs(G1 - mean(G1)) # absolute deviation from mean scores for group 1 
G2.abs.dev = abs(G2 - mean(G2)) # absolute deviation from mean scores for group 2 

Answers

2

How about one of the following two options:

score <- lapply(split(y, groups), FUN = function (u) abs(u - mean(u))) 

score <- ave(y, groups, FUN = function (u) abs(u - mean(u))) 

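The results are organized differently: the lapply() version returns a list with one vector per group, while ave() returns a single vector in the same order as y. Choose whichever you find more comfortable.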
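To make the difference between the two results concrete, here is a minimal sketch that rebuilds the question's data and compares both outputs with the hand-written version. The helper abs_dev and the names by_group and in_place are introduced here for illustration only.

```r
set.seed(0)
y <- as.vector(unlist(mapply(FUN = rnorm, n = c(10, 10))))
groups <- factor(rep(1:2, times = c(10, 10)))

# Hypothetical helper: absolute deviation of each value from the mean
abs_dev <- function(u) abs(u - mean(u))

# split() + lapply(): a named list with one vector per group
by_group <- lapply(split(y, groups), FUN = abs_dev)

# ave(): a single vector with the same length and order as y
in_place <- ave(y, groups, FUN = abs_dev)

# Both agree with the manual computation from the question
G1 <- y[groups == 1]
all.equal(by_group[["1"]], abs(G1 - mean(G1)))
all.equal(in_place[groups == 1], by_group[["1"]])
```

The list form is convenient when each group is processed separately afterwards; the ave() form is convenient when the deviations should sit next to the original scores.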


Your wording is off. mad returns a single statistic (one value) for the data. For example,

sapply(split(y, groups), mad) 

You are not vectorizing mad; you are simply computing the deviation of each data point, as your example code shows.
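For reference, a short sketch of what mad() computes. By default it is the median absolute deviation from the median, scaled by the constant 1.4826, so it collapses each group to one number. Its center and constant arguments can re-center it on the mean, but the result is still a single summary value rather than one deviation per score. The variable names below are illustrative.

```r
set.seed(0)
y <- rnorm(20)
groups <- factor(rep(1:2, each = 10))

# One value per group: 1.4826 * median(|x - median(x)|)
per_group_mad <- sapply(split(y, groups), mad)

# The same statistic written out by hand for group 1
g1 <- y[groups == 1]
manual_mad <- 1.4826 * median(abs(g1 - median(g1)))
all.equal(unname(per_group_mad["1"]), manual_mad)

# Centered on the mean and unscaled: still a single number,
# namely the median of the absolute deviations from the mean
mean_centered <- mad(g1, center = mean(g1), constant = 1)
all.equal(mean_centered, median(abs(g1 - mean(g1))))
```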

+0

Hi, I was wondering whether you might know the answer to [*this interesting question*](https://stackoverflow.com/questions/47857624)? – Reza

1

It is cleaner if you put everything in a data.frame. In base R,

set.seed(0) 

df <- data.frame(y = rnorm(20), 
       group = rep(1:2, each = 10)) 

df$abs_dev <- with(df, ave(y, group, FUN = function(x){abs(mean(x) - x)})) 

df 
#>               y group    abs_dev
#> 1   1.262954285     1 0.90403032
#> 2  -0.326233361     1 0.68515732
#> 3   1.329799263     1 0.97087530
#> 4   1.272429321     1 0.91350536
#> 5   0.414641434     1 0.05571747
#> 6  -1.539950042     1 1.89887401
#> 7  -0.928567035     1 1.28749100
#> 8  -0.294720447     1 0.65364441
#> 9  -0.005767173     1 0.36469114
#> 10  2.404653389     1 2.04572943
#> 11  0.763593461     2 1.12607477
#> 12 -0.799009249     2 0.43652794
#> 13 -1.147657009     2 0.78517570
#> 14 -0.289461574     2 0.07301974
#> 15 -0.299215118     2 0.06326619
#> 16 -0.411510833     2 0.04902952
#> 17  0.252223448     2 0.61470476
#> 18 -0.891921127     2 0.52943981
#> 19  0.435683299     2 0.79816461
#> 20 -1.237538422     2 0.87505711
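A variation on the base R version, offered as a sketch rather than as part of the answer: ave() uses mean as its default FUN, so it can return each row's group mean, and the subtraction is then vectorized over the whole column. The column names group_mean and abs_dev2 are made up for this example.

```r
set.seed(0)
df <- data.frame(y = rnorm(20), group = rep(1:2, each = 10))

# ave() with its default FUN = mean gives each row its group mean
df$group_mean <- ave(df$y, df$group)

# Vectorized absolute deviation from the group mean
df$abs_dev2 <- abs(df$y - df$group_mean)

# Matches the version that passes an explicit function to ave()
df$abs_dev <- with(df, ave(y, group, FUN = function(x) abs(mean(x) - x)))
all.equal(df$abs_dev, df$abs_dev2)
```

Keeping the group mean as its own column can be handy when it is needed again later, at the cost of one extra column.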

or with dplyr,

library(dplyr) 
set.seed(0) 

# data_frame() is deprecated; tibble() is re-exported by dplyr
df <- tibble(y = rnorm(20), 
             group = rep(1:2, each = 10)) 

df <- df %>% group_by(group) %>% mutate(abs_dev = abs(mean(y) - y)) 

df 
#> # A tibble: 20 x 3 
#> # Groups: group [2] 
#>               y group    abs_dev
#>           <dbl> <int>      <dbl>
#>  1  1.262954285     1 0.90403032
#>  2 -0.326233361     1 0.68515732
#>  3  1.329799263     1 0.97087530
#>  4  1.272429321     1 0.91350536
#>  5  0.414641434     1 0.05571747
#>  6 -1.539950042     1 1.89887401
#>  7 -0.928567035     1 1.28749100
#>  8 -0.294720447     1 0.65364441
#>  9 -0.005767173     1 0.36469114
#> 10  2.404653389     1 2.04572943
#> 11  0.763593461     2 1.12607477
#> 12 -0.799009249     2 0.43652794
#> 13 -1.147657009     2 0.78517570
#> 14 -0.289461574     2 0.07301974
#> 15 -0.299215118     2 0.06326619
#> 16 -0.411510833     2 0.04902952
#> 17  0.252223448     2 0.61470476
#> 18 -0.891921127     2 0.52943981
#> 19  0.435683299     2 0.79816461
#> 20 -1.237538422     2 0.87505711

or with data.table:

library(data.table) 
set.seed(0) 

dt <- data.table(y = rnorm(20), 
       group = rep(1:2, each = 10)) 

dt[, abs_dev := abs(mean(y) - y), by = group][] 
#>                y group    abs_dev
#>  1:  1.262954285     1 0.90403032
#>  2: -0.326233361     1 0.68515732
#>  3:  1.329799263     1 0.97087530
#>  4:  1.272429321     1 0.91350536
#>  5:  0.414641434     1 0.05571747
#>  6: -1.539950042     1 1.89887401
#>  7: -0.928567035     1 1.28749100
#>  8: -0.294720447     1 0.65364441
#>  9: -0.005767173     1 0.36469114
#> 10:  2.404653389     1 2.04572943
#> 11:  0.763593461     2 1.12607477
#> 12: -0.799009249     2 0.43652794
#> 13: -1.147657009     2 0.78517570
#> 14: -0.289461574     2 0.07301974
#> 15: -0.299215118     2 0.06326619
#> 16: -0.411510833     2 0.04902952
#> 17:  0.252223448     2 0.61470476
#> 18: -0.891921127     2 0.52943981
#> 19:  0.435683299     2 0.79816461
#> 20: -1.237538422     2 0.87505711