I want to compute the mean of a variable between two dates. A reproducible data frame is below. How can I do this?
year <- c(1996,1996,1996,1996,1996,1996,1996,1996,1996,1996,1996,1996,
1996,1996,1996,1996,1996,1996,1996,1996,1996,1996,1996,1996,
1997,1997,1997,1997,1997,1997,1997,1997,1997,1997,1997,1997,
1997,1997,1997,1997,1997,1997,1997,1997,1997,1997,1997,1997)
month <- c("JAN","FEB","MAR","APR","MAY","JUN","JUL","AUG","SEP","OCT","NOV","DEC")
station <- c("A","A","A","A","A","A","A","A","A","A","A","A",
"B","B","B","B","B","B","B","B","B","B","B","B")
concentration <- as.numeric(round(runif(48,20,40),1))
df <- data.frame(year,month,station,concentration)
id <- c(1,2,3,4)
station1996 <- c("A","A","B","B")
station1997 <- c("B","A","A","B")
start <- c("06/01/1996","07/01/1996","07/01/1996","08/01/1996")
end <- c("04/01/1997","04/01/1997","04/01/1997","05/01/1997")
participant <- data.frame(id,station1996,station1997,start,end)
participant$start <- as.Date(participant$start, format = "%m/%d/%Y")
participant$end <- as.Date(participant$end, format = "%m/%d/%Y")
So I have two datasets, shown below:
df
year month station concentration
1 1996 JAN A 24.4
2 1996 FEB A 37.0
3 1996 MAR A 39.5
4 1996 APR A 28.0
...
45 1997 SEP B 37.7
46 1997 OCT B 35.2
47 1997 NOV B 26.8
48 1997 DEC B 40.0
participant
id station1996 station1997 start end
1 1 A B 1996-06-01 1997-04-01
2 2 A A 1996-07-01 1997-04-01
3 3 B A 1996-07-01 1997-04-01
4 4 B B 1996-08-01 1997-05-01
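For each id, I want to compute the mean concentration between the start and end dates (by month and year). Note that the station may change from one year to the next.
For example, for id = 1, I want the mean concentration from June 1996 to April 1997. This should be based on the concentrations at station A from June 1996 to December 1996 and at station B from January 1997 to April 1997.
Can anyone help?
Thank you very much.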
Step 1: convert 'start' and 'end' to 'Date' or 'POSIXct' format, and combine 'year' and 'month' into a new column in the same format. – MichaelChirico
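A minimal sketch of that first step, assuming the month abbreviations are always the upper-case English ones used in the question and that each measurement can stand for the first day of its month. The `set.seed` call, the `month_num` vector, and the `date` column name are additions for illustration and are not in the original post.

```r
# Rebuild the question's measurement table.
# set.seed is an addition so the random values are repeatable.
set.seed(1)
df <- data.frame(
  year = rep(c(1996, 1997), each = 24),
  month = rep(toupper(month.abb), times = 4),
  station = rep(rep(c("A", "B"), each = 12), times = 2),
  concentration = round(runif(48, 20, 40), 1),
  stringsAsFactors = FALSE
)

# Map "JAN".."DEC" to 1..12 via the built-in month.abb constant,
# which avoids depending on the session locale.
month_num <- match(df$month, toupper(month.abb))

# Assumption: represent each month by its first day.
df$date <- as.Date(sprintf("%d-%02d-01", df$year, month_num))

head(df)
```

With `date` stored as a `Date`, it can be compared directly against `participant$start` and `participant$end`, which the question already converts with `as.Date`.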
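You could also convert them to strings of the form "1997-10". Then you can do something like 'mean(concentration[date >= start & date <= end])'.
'library(zoo); as.yearmon(participant$start)' and so on may also be very handy in this case, if you don't want to deal with the slightly unwieldy POSIXct format. – thelatemail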
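Putting the comments together, one possible sketch uses base R "YYYY-MM" strings rather than zoo. It assumes the start and end months are both inclusive (as in the id = 1 example) and that only the years 1996 and 1997 occur. The helper `mean_for`, the `ym` column, the `mean_conc` column, and the `set.seed` call are hypothetical names and additions, not part of the original post.

```r
set.seed(1)  # added so the random concentrations are repeatable
df <- data.frame(
  year = rep(c(1996, 1997), each = 24),
  month = rep(toupper(month.abb), times = 4),
  station = rep(rep(c("A", "B"), each = 12), times = 2),
  concentration = round(runif(48, 20, 40), 1),
  stringsAsFactors = FALSE
)
participant <- data.frame(
  id = 1:4,
  station1996 = c("A", "A", "B", "B"),
  station1997 = c("B", "A", "A", "B"),
  start = as.Date(c("06/01/1996", "07/01/1996", "07/01/1996", "08/01/1996"),
                  format = "%m/%d/%Y"),
  end = as.Date(c("04/01/1997", "04/01/1997", "04/01/1997", "05/01/1997"),
                format = "%m/%d/%Y"),
  stringsAsFactors = FALSE
)

# Zero-padded "YYYY-MM" keys compare as strings in calendar order.
df$ym <- sprintf("%d-%02d", df$year, match(df$month, toupper(month.abb)))

# Hypothetical helper: mean concentration for participant row i,
# with start and end months both inclusive.
mean_for <- function(i) {
  p <- participant[i, ]
  # Station that applies to this participant in each measurement year.
  wanted <- ifelse(df$year == 1996, p$station1996, p$station1997)
  in_window <- df$ym >= format(p$start, "%Y-%m") &
    df$ym <= format(p$end, "%Y-%m")
  mean(df$concentration[df$station == wanted & in_window])
}

participant$mean_conc <- vapply(seq_len(nrow(participant)), mean_for, numeric(1))
participant
```

For id = 1 this averages 11 monthly values: seven from station A (June to December 1996) and four from station B (January to April 1997).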