Cumulative mean of specific values

I want to compute a cumulative mean only over the values that are greater than 0. Suppose I have a vector:

v <- c(1, 3, 0, 3, 2, 0)

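The mean would be 9/6 = 1.5, but I only want to average the values that are > 0, so in this case it would be 9/4 = 2.25. That average, however, is over the whole set. I want to do this as the dataset builds up and accumulates, so it would start out as: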

(1+3)/2, (1+3+0)/2, (1+3+0+3)/3, (1+3+0+3+2)/4, (1+3+0+3+2+0)/4

My dataset has 9,000 rows and it is growing. I can get cumsum to work and compute the cumulative sum, but not the cumulative mean of the "successes".

Source

2017-10-07 Kerry

The dplyr package has a cummean function. If you only want it for the values > 0, select the values of v where v > 0:

v <- c(1, 3, 0, 3, 2, 0) 

dplyr::cummean(v[v>0]) 
#> [1] 1.000000 2.000000 2.333333 2.250000
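If you would rather not depend on dplyr for this step, the same numbers can be reproduced in base R. This is only a minimal sketch; `cum_mean` is a hypothetical helper name, not part of any package.

```r
# Hypothetical helper: running sum divided by the number of elements seen so far
cum_mean <- function(x) cumsum(x) / seq_along(x)

v <- c(1, 3, 0, 3, 2, 0)

# Cumulative mean over the positive values only
cum_mean(v[v > 0])
```

For the example vector this should agree with the `dplyr::cummean()` output shown above.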

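If you want the result repeated at the positions that were filtered out, you can play with indexing and a helper function from zoo.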

# Create a vector container for the result (here with NA values) 
v_res <- v[NA] 
# Fill in the cumulative mean where you want it calculated (here v > 0) 
v_res[v>0] <- dplyr::cummean(v[v>0]) 
# Fill the gap with previous value 
zoo::na.locf(v_res) 
#> [1] 1.000000 2.000000 2.000000 2.333333 2.250000 2.250000
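The carry-forward step can also be sketched without zoo, by counting how many positive values have been seen at each position and using that count as an index into the cumulative means. The names `pos`, `cm`, `k` and `v_res_base` are made up for this illustration.

```r
v <- c(1, 3, 0, 3, 2, 0)

pos <- v > 0
# Cumulative mean of the positive values only
cm <- cumsum(v[pos]) / seq_len(sum(pos))
# Number of positive values seen up to and including each position
k <- cumsum(pos)

# Look up the latest cumulative mean for every position;
# positions before the first positive value stay NA
v_res_base <- rep(NA_real_, length(v))
v_res_base[k > 0] <- cm[k[k > 0]]
v_res_base
```

In this sketch the result always has the same length as `v`, and a vector that starts with non-positive values keeps `NA` at those leading positions.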

It works with negative values in v too:

v <- c(1, 3, 0, 3, -5, 2, 0, -6) 
v_res <- v[NA] 
v_res[v>0] <- dplyr::cummean(v[v>0]) 
zoo::na.locf(v_res) 
#> [1] 1.000000 2.000000 2.000000 2.333333 2.333333 2.250000 2.250000 2.250000

You can use the tidyverse too. This solution may be very useful if your data is in a data.frame.

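library(dplyr, warn.conflicts = FALSE) 
library(tidyr) 

# tibble() replaces the deprecated data_frame() 
data <- tibble(v = c(1, 3, 0, 3, 2, 0)) %>% 
    tibble::rowid_to_column() 
res <- data %>% 
    filter(v > 0) %>% 
    mutate(cummean = cummean(v)) %>% 
    right_join(data, by = c("rowid", "v")) %>% 
    # restore the original row order before filling, whatever order the join returns 
    arrange(rowid) %>% 
    fill(cummean) 
res 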
#> # A tibble: 6 x 3 
#>   rowid     v  cummean 
#>   <int> <dbl>    <dbl> 
#> 1     1     1 1.000000 
#> 2     2     3 2.000000 
#> 3     3     0 2.000000 
#> 4     4     3 2.333333 
#> 5     5     2 2.250000 
#> 6     6     0 2.250000 
pull(res, cummean)[-1] 
#> [1] 2.000000 2.000000 2.333333 2.250000 2.250000

Source

2017-10-07 06:25:56 cderv

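OK, I see, but that is not the mean per se. '(1 + 3 + 0)/2' is the sum of three values, so it should be divided by three. I will update the answer to match the expected result – cderv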

You can solve this by dividing the cumulative sum of v by the cumulative sum of the logical vector v > 0:

v1 <- cumsum(v)/cumsum(v>0)

which gives:

> v1 
[1] 1.000000 2.000000 2.000000 2.333333 2.250000 2.250000
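Note that this division relies on every non-"success" value being exactly 0, as in the question's data: a negative value would be subtracted from the running sum. If negative values can occur, one possible variant (a sketch, with `pos` and `v1_pos` as made-up names) zeroes them out before summing:

```r
v <- c(1, 3, 0, 3, -5, 2, 0, -6)

pos <- v > 0
# Multiplying by the logical vector turns non-positive entries into 0,
# so only the positive values contribute to the running sum
v1_pos <- cumsum(v * pos) / cumsum(pos)
v1_pos
```

Either form returns NaN (0/0) for positions before the first positive value, so a vector that starts with non-positive values may need extra handling.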

If you want to omit the first value:

v2 <- (cumsum(v)/cumsum(v>0))[-1]

which gives:

> v2 
[1] 2.000000 2.000000 2.333333 2.250000 2.250000

The latter is equal to the desired result as specified in the question:

> ref <- c((1+3)/2, (1+3+0)/2, (1+3+0+3)/3, (1+3+0+3+2)/4, (1+3+0+3+2+0)/4) 
> identical(v2, ref) 
[1] TRUE
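`identical()` requires the two vectors to match exactly, which happens to hold here. For floating-point results in general, a tolerance-based comparison with `all.equal()` is usually the safer check. A small sketch that rebuilds the same `v2` and `ref`:

```r
v <- c(1, 3, 0, 3, 2, 0)
v2 <- (cumsum(v) / cumsum(v > 0))[-1]
ref <- c((1+3)/2, (1+3+0)/2, (1+3+0+3)/3, (1+3+0+3+2)/4, (1+3+0+3+2+0)/4)

# all.equal() returns TRUE or a description of the difference,
# so wrap it in isTRUE() to get a plain logical
same <- isTRUE(all.equal(v2, ref))
same
```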

Implementation in a dataset:

# create an example dataset 
df <- data.frame(rn = letters[seq_along(v)], v) 

# calculate the cumulative mean of the successes 
library(dplyr) 
df %>% 
    mutate(succes_cum_mean = cumsum(v)/cumsum(v>0))

which gives:

  rn v succes_cum_mean 
1  a 1        1.000000 
2  b 3        2.000000 
3  c 0        2.000000 
4  d 3        2.333333 
5  e 2        2.250000 
6  f 0        2.250000
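Since the question mentions that the dataset keeps growing, it may be worth carrying the running sum and the running count forward rather than recomputing from the first row each time. The following is only a sketch under that assumption; `update_success_mean` and the `state` list are hypothetical names, and each batch is assumed to be non-empty.

```r
# Hypothetical helper: extend a running state (sum and count of positive
# values so far) with a batch of newly arrived values
update_success_mean <- function(state, new_v) {
  pos <- new_v > 0
  run_sum <- state$sum + cumsum(new_v * pos)
  run_n <- state$n + cumsum(pos)
  list(
    sum = run_sum[length(run_sum)],  # totals to carry into the next batch
    n = run_n[length(run_n)],
    mean = run_sum / run_n           # cumulative means for this batch only
  )
}

state <- list(sum = 0, n = 0)
batch1 <- update_success_mean(state, c(1, 3, 0))
batch2 <- update_success_mean(batch1, c(3, 2, 0))
c(batch1$mean, batch2$mean)
```

Processing the example vector in two batches this way should give the same values as computing `cumsum(v)/cumsum(v>0)` over all six values at once.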

Source

2017-10-07 07:06:12 Jaap
