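Fill elements of a column based on multiple conditions

I have a data frame, and I want to remove any week that contains an outlier. I'd be happy if I could just flag the entire week as an outlier, because I know how to subset from there. I haven't been able to come up with a suitable solution. I've been thinking that I either need to loop over the weeks, or write a separate function that handles each outlier week and call it with `sapply`. I haven't gotten either of these approaches to work.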
```r
# One year of daily dates and their weekday names
date <- seq(as.Date("2015-01-01"), length.out = 365, by = "1 day")
df <- data.frame(date = date, dow = as.factor(weekdays(date)))

set.seed(1115)
df$var1 <- rnorm(365, 1912, 40795)

# Outlier bounds: mean +/- 2.75 standard deviations
stdev <- sd(df$var1, na.rm = TRUE)
avg <- mean(df$var1, na.rm = TRUE)
df$LB <- avg - (2.75 * stdev)
df$UB <- avg + (2.75 * stdev)
df$outlier <- ifelse(df$var1 < df$LB | df$var1 > df$UB, 1, 0)

# Week of the year (00-53), with Sunday as the first day of the week
df$weeknum <- as.numeric(format(df$date, "%U"))
head(df, 17)
```
```
         date       dow       var1        LB       UB outlier weeknum
1  2015-01-01  Thursday  -7828.412 -114675.6 120479.8       0       0
2  2015-01-02    Friday  25674.456 -114675.6 120479.8       0       0
3  2015-01-03  Saturday -33588.871 -114675.6 120479.8       0       0
4  2015-01-04    Sunday -54418.175 -114675.6 120479.8       0       1
5  2015-01-05    Monday -10002.002 -114675.6 120479.8       0       1
6  2015-01-06   Tuesday  34050.390 -114675.6 120479.8       0       1
7  2015-01-07 Wednesday -37584.648 -114675.6 120479.8       0       1
8  2015-01-08  Thursday  84048.878 -114675.6 120479.8       0       1
9  2015-01-09    Friday -24801.346 -114675.6 120479.8       0       1
10 2015-01-10  Saturday  33974.637 -114675.6 120479.8       0       1
11 2015-01-11    Sunday  77432.088 -114675.6 120479.8       0       2
12 2015-01-12    Monday 128196.236 -114675.6 120479.8       1       2
13 2015-01-13   Tuesday   9740.418 -114675.6 120479.8       0       2
14 2015-01-14 Wednesday  26539.887 -114675.6 120479.8       0       2
15 2015-01-15  Thursday  12172.834 -114675.6 120479.8       0       2
16 2015-01-16    Friday   1032.544 -114675.6 120479.8       0       2
17 2015-01-17  Saturday  76870.095 -114675.6 120479.8       0       2
```
In the example above, the desired output would be a 1 in the outlier column for every row where weeknum = 2.
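One way to get that week-level flag without an explicit loop or `sapply` is base R's `ave()`, which applies a function within each group and returns a vector aligned with the original rows. Taking the per-week `max()` of the 0/1 `outlier` column gives 1 for every row of any week that contains at least one outlier, and 0 otherwise. This is a minimal sketch that rebuilds the question's data; the column name `week_outlier` is a hypothetical choice for illustration.

```r
# Rebuild the example data from the question
set.seed(1115)
date <- seq(as.Date("2015-01-01"), length.out = 365, by = "1 day")
df <- data.frame(date = date, dow = as.factor(weekdays(date)))
df$var1 <- rnorm(365, 1912, 40795)

# Flag single days lying more than 2.75 standard deviations from the mean
stdev <- sd(df$var1, na.rm = TRUE)
avg <- mean(df$var1, na.rm = TRUE)
df$outlier <- ifelse(df$var1 < avg - 2.75 * stdev | df$var1 > avg + 2.75 * stdev, 1, 0)
df$weeknum <- as.numeric(format(df$date, "%U"))

# Spread the flag across each week: max() of a 0/1 vector is 1 if any
# day in that week is an outlier, otherwise 0.
# week_outlier is a hypothetical column name used for illustration.
df$week_outlier <- ave(df$outlier, df$weeknum, FUN = max)

head(df[, c("date", "outlier", "weeknum", "week_outlier")], 17)
```

From there, `df[df$week_outlier == 0, ]` keeps only the weeks without any outlier.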
Something like `df[df$weeknum == 2 & df$outlier == 1, ]`? – Jimbou
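The only reason weeknum = 2 should be subset is that the outlier falls in that week (row 12). The code I'm trying to write would find an outlier in any week and flag the whole week as an outlier. The dataset has 365 rows; the example above shows only the first 17, which happen to contain one outlier. –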
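If the end goal is only to drop those weeks, a flag column isn't strictly required: collect the weeks that contain an outlier and exclude them with `%in%`. One caveat, which is an assumption about data beyond this one-year example: `%U` restarts at 00 every January, so if the data ever spans several years the grouping key should combine year and week. The sketch below does that with a hypothetical `week_key` column; `bad_weeks` and `clean` are likewise illustrative names.

```r
# Rebuild the example data from the question
set.seed(1115)
date <- seq(as.Date("2015-01-01"), length.out = 365, by = "1 day")
df <- data.frame(date = date, var1 = rnorm(365, 1912, 40795))

stdev <- sd(df$var1, na.rm = TRUE)
avg <- mean(df$var1, na.rm = TRUE)
df$outlier <- ifelse(df$var1 < avg - 2.75 * stdev | df$var1 > avg + 2.75 * stdev, 1, 0)

# Year-plus-week key (hypothetical name) so week 00 of one year
# is never merged with week 00 of another
df$week_key <- format(df$date, "%Y-%U")

# Every week key that contains at least one outlier
bad_weeks <- unique(df$week_key[df$outlier == 1])

# Keep only rows whose week is not in that set
clean <- df[!(df$week_key %in% bad_weeks), ]
nrow(clean)
```

This removes whole weeks in one vectorized step, so no loop over weeks is needed.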