2016-06-09

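I have a data.frame with a 15-minute time step in the first column and 16 further columns of data. I want the hourly mean of each column. I am using aggregate, which works perfectly on 1-minute data, but here the aggregate function produces daily means instead of hourly means.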

mydata <- list() 
for(j in colnames(data_frame)){ 
    data_mean <- aggregate(data_frame[j], 
         list(hour=cut(as.POSIXct(data_frame$TIME), "hour")), 
         mean, na.rm=TRUE) 
    mydata[[j]] <- data_mean 
} 

When I use this same setup on the 15-minute dataset, it gives me daily means instead of hourly means. Any idea why?

My data looks like this. The 1-minute data:

"TIME","Tair","RH" 
2016-01-01 00:01:00,5.9,82 
2016-01-01 00:02:00,5.9,82 
2016-01-01 00:03:00,5.9,82 
2016-01-01 00:04:00,5.89,82 
2016-01-01 00:05:00,5.8,82 
2016-01-01 00:06:00,5.8,82 
2016-01-01 00:07:00,5.8,82 
2016-01-01 00:08:00,5.8,82 
2016-01-01 00:09:00,5.8,82 
2016-01-01 00:10:00,5.8,82 
2016-01-01 00:11:00,5.8,82 
2016-01-01 00:12:00,5.8,82 
2016-01-01 00:13:00,5.8,82 
2016-01-01 00:14:00,5.8,82 
2016-01-01 00:15:00,5.8,82 
2016-01-01 00:16:00,5.8,82 
2016-01-01 00:17:00,5.8,82 
2016-01-01 00:18:00,5.8,82 
2016-01-01 00:19:00,5.8,82 
2016-01-01 00:20:00,5.8,82 
2016-01-01 00:21:00,5.75,82 
2016-01-01 00:22:00,5.78,82 
2016-01-01 00:23:00,5.78,83 
2016-01-01 00:24:00,5.8,82 
2016-01-01 00:25:00,5.73,82 
2016-01-01 00:26:00,5.7,82 
2016-01-01 00:27:00,5.7,82 
2016-01-01 00:28:00,5.7,82 
2016-01-01 00:29:00,5.7,82 
2016-01-01 00:30:00,5.7,82 
2016-01-01 00:31:00,5.7,83 
2016-01-01 00:32:00,5.76,83 
2016-01-01 00:33:00,5.8,83 
2016-01-01 00:34:00,5.8,82 
2016-01-01 00:35:00,5.8,82 
2016-01-01 00:36:00,5.8,83 
2016-01-01 00:37:00,5.79,83 
2016-01-01 00:38:00,5.7,82 

And the 15-minute data:

"TIME","Tair","RH" 
2016-01-01 00:15:00,6.228442,80.40858 
2016-01-01 00:30:00,6.121088,81.00000 
2016-01-01 00:45:00,6.075000,NA 
2016-01-01 01:00:00,5.951910,NA 
2016-01-01 01:15:00,5.844144,NA 
2016-01-01 01:30:00,5.802242,NA 
2016-01-01 01:45:00,5.747619,NA 
2016-01-01 02:00:00,5.742889,NA 
2016-01-01 02:15:00,5.752584,81.12135 
2016-01-01 02:30:00,5.677753,81.00000 
2016-01-01 02:45:00,5.500224,81.61435 
2016-01-01 03:00:00,5.225282,82.29797 
2016-01-01 03:15:00,5.266441,83.00000 
2016-01-01 03:30:00,5.200448,83.32584 
2016-01-01 03:45:00,5.098876,84.00000 
2016-01-01 04:00:00,5.081061,83.76894 
2016-01-01 04:15:00,5.230769,82.88664 
2016-01-01 04:30:00,5.300000,82.06742 
2016-01-01 04:45:00,5.300000,NA 
2016-01-01 05:00:00,5.399776,NA 

Answer

Your code works for me.

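However, your loop is slightly wasteful: it recomputes the cut() of the TIME column on every iteration, once for each column of the data.frame. You could precompute it, but there is a better solution.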
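For reference, the precomputation mentioned above might look roughly like the sketch below. The `toy` data frame and its values are made up for illustration, the UTC time zone is an assumption, and the loop skips the TIME column, which the original loop does not do.

```r
# Toy 15-minute data; `toy` and its values are invented for illustration.
toy <- data.frame(
  TIME = as.POSIXct(c("2016-01-01 00:15:00", "2016-01-01 00:30:00",
                      "2016-01-01 01:15:00", "2016-01-01 01:30:00"),
                    tz = "UTC"),  # time zone is an assumption
  Tair = c(6.0, 6.2, 5.8, 5.6),
  RH   = c(80, 82, 81, 83)
)

# Compute the hourly grouping factor once, outside the loop.
hour <- cut(toy$TIME, "hour")

# Aggregate each data column against the precomputed factor, skipping TIME.
hourly <- list()
for (j in setdiff(colnames(toy), "TIME")) {
  hourly[[j]] <- aggregate(toy[j], list(hour = hour), mean, na.rm = TRUE)
}
```

Each element of `hourly` is then a two-column data.frame (the hour and the mean of one variable), the same shape the original loop produces.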

You can produce the same result in a single call, using a simpler, more conventional, and more functional form of aggregate():

aggregate(df1[names(df1)!='TIME'],list(hour=cut(df1$TIME,'hour')),mean,na.rm=T); 
##         hour     Tair       RH
## 1 2016-01-01 5.786316 82.15789
aggregate(df15[names(df15)!='TIME'],list(hour=cut(df15$TIME,'hour')),mean,na.rm=T); 
##                  hour     Tair       RH
## 1 2016-01-01 00:00:00 6.141510 80.70429
## 2 2016-01-01 01:00:00 5.836479      NaN
## 3 2016-01-01 02:00:00 5.668362 81.24523
## 4 2016-01-01 03:00:00 5.197762 83.15595
## 5 2016-01-01 04:00:00 5.227957 82.90767
## 6 2016-01-01 05:00:00 5.399776      NaN
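A note on the NaN entries in the RH column: in those hours every RH value is NA, so with na.rm=TRUE the mean is taken over an empty vector, and R returns NaN for that. If NA is preferred in the output, a small wrapper along these lines could be passed to aggregate() in place of mean. The name `mean_or_na` is hypothetical and not part of the thread.

```r
# Every value in the group is NA, so na.rm = TRUE leaves nothing to average.
all_missing <- c(NA_real_, NA_real_, NA_real_)
m <- mean(all_missing, na.rm = TRUE)
is.nan(m)   # TRUE

# Hypothetical helper (not from the thread): return NA for an all-NA group,
# otherwise the mean of the non-missing values.
mean_or_na <- function(x) {
  if (all(is.na(x))) NA_real_ else mean(x, na.rm = TRUE)
}

mean_or_na(all_missing)    # NA
mean_or_na(c(81, NA, 83))  # 82
```

With that helper, the call would pass `mean_or_na` as the function argument and drop `na.rm`, since the helper already removes missing values itself.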

Data

df1 <- data.frame(TIME=as.POSIXct(c('2016-01-01 00:01:00','2016-01-01 00:02:00', 
'2016-01-01 00:03:00','2016-01-01 00:04:00','2016-01-01 00:05:00','2016-01-01 00:06:00', 
'2016-01-01 00:07:00','2016-01-01 00:08:00','2016-01-01 00:09:00','2016-01-01 00:10:00', 
'2016-01-01 00:11:00','2016-01-01 00:12:00','2016-01-01 00:13:00','2016-01-01 00:14:00', 
'2016-01-01 00:15:00','2016-01-01 00:16:00','2016-01-01 00:17:00','2016-01-01 00:18:00', 
'2016-01-01 00:19:00','2016-01-01 00:20:00','2016-01-01 00:21:00','2016-01-01 00:22:00', 
'2016-01-01 00:23:00','2016-01-01 00:24:00','2016-01-01 00:25:00','2016-01-01 00:26:00', 
'2016-01-01 00:27:00','2016-01-01 00:28:00','2016-01-01 00:29:00','2016-01-01 00:30:00', 
'2016-01-01 00:31:00','2016-01-01 00:32:00','2016-01-01 00:33:00','2016-01-01 00:34:00', 
'2016-01-01 00:35:00','2016-01-01 00:36:00','2016-01-01 00:37:00','2016-01-01 00:38:00')), 
Tair=c(5.9,5.9,5.9,5.89,5.8,5.8,5.8,5.8,5.8,5.8,5.8,5.8,5.8,5.8,5.8,5.8,5.8,5.8,5.8,5.8,5.75, 
5.78,5.78,5.8,5.73,5.7,5.7,5.7,5.7,5.7,5.7,5.76,5.8,5.8,5.8,5.8,5.79,5.7),RH=c(82L,82L,82L, 
82L,82L,82L,82L,82L,82L,82L,82L,82L,82L,82L,82L,82L,82L,82L,82L,82L,82L,82L,83L,82L,82L,82L, 
82L,82L,82L,82L,83L,83L,83L,82L,82L,83L,83L,82L)); 

df15 <- data.frame(TIME=as.POSIXct(c('2016-01-01 00:15:00','2016-01-01 00:30:00', 
'2016-01-01 00:45:00','2016-01-01 01:00:00','2016-01-01 01:15:00','2016-01-01 01:30:00', 
'2016-01-01 01:45:00','2016-01-01 02:00:00','2016-01-01 02:15:00','2016-01-01 02:30:00', 
'2016-01-01 02:45:00','2016-01-01 03:00:00','2016-01-01 03:15:00','2016-01-01 03:30:00', 
'2016-01-01 03:45:00','2016-01-01 04:00:00','2016-01-01 04:15:00','2016-01-01 04:30:00', 
'2016-01-01 04:45:00','2016-01-01 05:00:00')),Tair=c(6.228442,6.121088,6.075,5.95191, 
5.844144,5.802242,5.747619,5.742889,5.752584,5.677753,5.500224,5.225282,5.266441,5.200448, 
5.098876,5.081061,5.230769,5.3,5.3,5.399776),RH=c(80.40858,81,NA,NA,NA,NA,NA,NA,81.12135,81, 
81.61435,82.29797,83,83.32584,84,83.76894,82.88664,82.06742,NA,NA)); 

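OK, I found the problem. For some reason, the as.POSIXct() call inside aggregate was causing it. If I create the POSIXct column before calling aggregate, it works. Thanks for the shorter, faster version! – BallerNacken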
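The thread does not establish why the inline conversion misbehaved. As a defensive sketch of the fix described in the comment, the TIME column can be converted once, with an explicit format, and checked before aggregating. The `raw` data frame, its values, and the UTC time zone below are assumptions for illustration only.

```r
# Hypothetical raw input: TIME arrives as text, as it would from read.csv().
raw <- data.frame(
  TIME = c("2016-01-01 00:15:00", "2016-01-01 00:30:00",
           "2016-01-01 01:00:00", "2016-01-01 01:15:00"),
  Tair = c(6.2, 6.1, 5.9, 5.8),
  stringsAsFactors = FALSE
)

# Convert once, up front, with an explicit format and time zone
# (UTC is an assumption; use the zone the data was recorded in).
raw$TIME <- as.POSIXct(raw$TIME, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

# Sanity checks: the column is POSIXct and no timestamp failed to parse.
stopifnot(inherits(raw$TIME, "POSIXct"), !anyNA(raw$TIME))

# Aggregate on the already-converted column.
hourly <- aggregate(raw["Tair"], list(hour = cut(raw$TIME, "hour")),
                    mean, na.rm = TRUE)
```

Comparing `nrow(hourly)` with the number of distinct hours expected in the input is a quick way to spot a result that has collapsed to one row per day.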