Subset a data frame by identifying the maximum and minimum values in one column (in R)

For an example data frame:

df1 <- structure(list(id = 1:21, region = structure(c(1L, 1L, 1L, 1L, 
                2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 
                4L), .Label = c("a", "b", "c", "d"), class = "factor"), weight = c(0.35, 
                                0.65, 0.99, 1.5, 3.2, 2.1, 1.3, 3.2, 1.3, 2, 0.6, 0.6, 0.6, 0.45, 
                                1, 1.2, 1.4, 2, 1.3, 1, 2), condition = c(0L, 1L, 0L, 1L, 0L, 
                                           0L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 1L, 1L, 1L, 1L, 0L, 0L 
                                )), .Names = c("id", "region", "weight", "condition"), class = "data.frame", row.names = c(NA, 
                                                       -21L))

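I want to exclude the regions that have neither the highest nor the lowest weighted percentage of '1's in the condition variable, calculated by region. For example, I would normally do: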

library(data.table)
summary <- setDT(df1)[, .(.result = weighted.mean(condition == 1, w = weight) * 100),
                      by = region]  # weighted % of condition == 1 per region

This gives me the following summary:

   region  .result
1:      a 61.60458
2:      b 39.69466
3:      c 50.56180
4:      d 61.03896
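To make the numbers above easier to check, here is a small base R sketch that reproduces the figure for region a by hand. The vector names `w` and `cond` are just for illustration; the values are the region a rows from the sample data.

```r
# Region "a" rows from the sample data: weights and condition flags
w    <- c(0.35, 0.65, 0.99, 1.5)
cond <- c(0L, 1L, 0L, 1L)

# weighted.mean() on a logical vector treats TRUE as 1 and FALSE as 0,
# so the result is the share of total weight carried by condition == 1 rows
by_function <- weighted.mean(cond == 1, w = w) * 100

# The same figure by hand: weight of the '1' rows over total weight
by_hand <- sum(w[cond == 1]) / sum(w) * 100

print(c(by_function, by_hand))
```

Both values should agree with the 61.60458 shown for region a.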

So I would subset the data frame df1 to exclude regions c and d.

Is it possible to do this in one step, without manually inspecting the summary data frame?

Source

2016-02-12 KT_1

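As I understand it, you want to exclude every region whose value is neither the highest nor the lowest. It can't be done as a one-liner, but if you add the following you should get what you want: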

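# regions with the lowest and the highest .result
incl <- summary[c(which.min(.result), which.max(.result)), region]
# keep only rows from those regions (df1 is a data.table after setDT())
newdf <- df1[region %in% incl, ]
newdf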

    id region weight condition
 1:  5      b   3.20         0
 2:  6      b   2.10         0
 3:  7      b   1.30         0
 4:  8      b   3.20         1
 5:  9      b   1.30         0
 6: 10      b   2.00         1
 7:  1      a   0.35         0
 8:  2      a   0.65         1
 9:  3      a   0.99         0
10:  4      a   1.50         1
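If you want the whole thing as a single call, one option is to wrap the steps in a small function. The following is only a sketch in base R (no data.table needed); the helper name `keep_extreme_regions` is made up for illustration, and the sample data is rebuilt compactly so the snippet runs on its own.

```r
# Same sample data as df1 in the question, rebuilt compactly
df1 <- data.frame(
  id = 1:21,
  region = factor(rep(c("a", "b", "c", "d"), times = c(4, 6, 6, 5))),
  weight = c(0.35, 0.65, 0.99, 1.5, 3.2, 2.1, 1.3, 3.2, 1.3, 2, 0.6,
             0.6, 0.6, 0.45, 1, 1.2, 1.4, 2, 1.3, 1, 2),
  condition = c(0L, 1L, 0L, 1L, 0L, 0L, 0L, 1L, 0L, 1L, 0L, 1L, 0L,
                1L, 0L, 1L, 1L, 1L, 1L, 0L, 0L)
)

# Hypothetical helper: keep the rows whose region has the lowest or the
# highest weighted percentage of condition == 1
keep_extreme_regions <- function(d) {
  # one weighted percentage per region, as a named numeric vector
  pct <- sapply(split(d, d$region), function(g) {
    weighted.mean(g$condition == 1, w = g$weight) * 100
  })
  # names of the regions holding the minimum and the maximum
  keep <- names(pct)[c(which.min(pct), which.max(pct))]
  d[d$region %in% keep, ]
}

newdf <- keep_extreme_regions(df1)
print(unique(as.character(newdf$region)))
```

With this data the function keeps regions a and b. Rows come back in their original order, and ties are resolved by `which.min()` and `which.max()`, which return the first match only.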

Source

2016-02-12 11:52:48
