2016-11-21
0

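Preparing data for anomaly detection

I have a task to perform anomaly detection on time-series data. I already have the anomaly detection code, but I am stuck on preparing the data for it. The data looks like this: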

timestampUtc 

2016-08-01 14:38:01, 2016-08-01 14:38:06, 2016-08-01 14:38:12, 2016-08-01 14:38:18, 2016-08-01 14:38:22, 2016-08-01 14:38:27, 2016-08-01 14:38:27, 2016-08-01 14:38:30, 2016-08-01 14:38:37, 2016-08-01 14:38:38, 2016-08-01 14:38:38, 2016-08-01 14:38:46, 2016-08-01 14:39:03, 2016-08-01 14:39:03, 2016-08-01 14:39:10, 2016-08-01 14:39:12, 2016-08-01 14:39:15, 2016-08-01 14:39:16, 2016-08-01 14:39:20, 2016-08-01 14:39:28 

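First, I want to set the seconds to zero in the timestampUtc column. Next, I want to create a column named count that holds the number of values that fall within each particular minute. For example, the output should look like this: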

timestampUtc         count
2016-08-01 14:38:00  12
2016-08-01 14:39:00  6
2016-08-01 14:40:00  8

Answers

1

You can use as.POSIXct() to convert the strings to date-times, with a format string that ignores the seconds, and then summarize with table():

timestampUtc <- c('2016-08-01 14:38:01', '2016-08-01 14:38:06', '2016-08-01 14:38:12', '2016-08-01 14:38:18', '2016-08-01 14:38:22', '2016-08-01 14:38:27', '2016-08-01 14:38:27', '2016-08-01 14:38:30', '2016-08-01 14:38:37', '2016-08-01 14:38:38', '2016-08-01 14:38:38', '2016-08-01 14:38:46', '2016-08-01 14:39:03', '2016-08-01 14:39:03', '2016-08-01 14:39:10', '2016-08-01 14:39:12', '2016-08-01 14:39:15', '2016-08-01 14:39:16', '2016-08-01 14:39:20', '2016-08-01 14:39:28') 
timestampUtc <- as.POSIXct(timestampUtc, format="%Y-%m-%d %H:%M", tz="UTC") 
table(timestampUtc) 
2016-08-01 14:38:00 2016-08-01 14:39:00 
                 12                   8 
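
The question asks for a two-column result (timestampUtc and count) rather than a printed table. A minimal sketch of one way to get there, assuming the timestamps are character strings as above; the variable names parsed, minutes and counts are illustrative and not from the original answer, and only a subset of the data is used to keep it short:

```r
# Sample timestamps (a subset of the question's data).
timestampUtc <- c("2016-08-01 14:38:01", "2016-08-01 14:38:06",
                  "2016-08-01 14:38:12", "2016-08-01 14:39:03",
                  "2016-08-01 14:39:10")

# Parse the full timestamp, then write it back out with the seconds set to 00.
parsed  <- as.POSIXct(timestampUtc, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
minutes <- format(parsed, "%Y-%m-%d %H:%M:00")

# Count rows per minute and reshape the table into a data frame.
counts <- as.data.frame(table(timestampUtc = minutes),
                        responseName = "count",
                        stringsAsFactors = FALSE)

# Turn the minute labels back into date-times.
counts$timestampUtc <- as.POSIXct(counts$timestampUtc, tz = "UTC")
print(counts)
```

Note that minutes with no observations do not appear in this result at all, which may matter for anomaly detection.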
1

Assuming your timestamps are already in POSIXt format and your data is stored in a data frame df:

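df$count <- 1                                                    # each row is one observation
df$timestamp <- format(df$timestamp, format = "%Y-%m-%d %H:%M")  # drop the seconds (result is character)
df <- aggregate(count ~ timestamp, data = df, FUN = sum)         # add up the ones per minute
names(df) <- c("timestamp", "count") 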
+0

I think you can avoid summing a bunch of ones if you just use: 'df <- aggregate(count ~ timestamp, data = df, FUN = length)'
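
A small sketch of the length-based variant from this comment, under the assumption that the data frame already has some other column to count; the value column and the sample rows here are hypothetical:

```r
# Hypothetical data frame: POSIXct timestamps plus an arbitrary measured value.
df <- data.frame(
  timestamp = as.POSIXct(c("2016-08-01 14:38:01", "2016-08-01 14:38:06",
                           "2016-08-01 14:39:03"), tz = "UTC"),
  value = c(0.4, 1.2, 0.7)
)

# Truncate to the minute as text, as in the answer above.
df$minute <- format(df$timestamp, "%Y-%m-%d %H:%M")

# Count the rows in each minute; no column of ones is needed.
counts <- aggregate(value ~ minute, data = df, FUN = length)
names(counts) <- c("timestamp", "count")
print(counts)
```

One caveat: the formula interface of aggregate drops rows with NA in the counted column by default, so pick a column without missing values.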

1

Both the cut and seq methods for the POSIXt classes have a breaks (or by) option that accepts an interval:

timestampUtc <-scan(text="2016-08-01 14:38:01, 2016-08-01 14:38:06, 2016-08-01 14:38:12, 2016-08-01 14:38:18, 2016-08-01 14:38:22, 2016-08-01 14:38:27, 2016-08-01 14:38:27, 2016-08-01 14:38:30, 2016-08-01 14:38:37, 2016-08-01 14:38:38, 2016-08-01 14:38:38, 2016-08-01 14:38:46, 2016-08-01 14:39:03, 2016-08-01 14:39:03, 2016-08-01 14:39:10, 2016-08-01 14:39:12, 2016-08-01 14:39:15, 2016-08-01 14:39:16, 2016-08-01 14:39:20, 2016-08-01 14:39:28", 
         what="", sep=",") 
#Read 20 items 

table(cut(as.POSIXct(timestampUtc), breaks="min") ) 
#------------ 
2016-08-01 14:38:00 2016-08-01 14:39:00 
                 12                   8 

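If you wanted 10- or 15-minute intervals, breaks could be "10 min" or "15 min". One of the other answers so far discards information at the input stage, which I think is a questionable practice, whereas code_is_entropy applies format, with a shorter format string, only at the stage of passing the data on to table.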
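
A short sketch of the wider-interval case, using made-up timestamps so that the bins are easy to check by hand; the names per_min, per_10 and counts are illustrative:

```r
# Illustrative timestamps spanning several minutes (not the question's data).
timestampUtc <- as.POSIXct(c("2016-08-01 14:38:01", "2016-08-01 14:38:46",
                             "2016-08-01 14:39:03", "2016-08-01 14:41:20",
                             "2016-08-01 14:52:10"), tz = "UTC")

# One-minute bins: empty minutes between the first and last timestamp
# are kept as levels with a count of zero.
per_min <- table(cut(timestampUtc, breaks = "min"))

# Ten-minute bins.
per_10 <- table(cut(timestampUtc, breaks = "10 min"))

# Reshape into the two columns asked for in the question.
counts <- data.frame(timestampUtc = as.POSIXct(names(per_10), tz = "UTC"),
                     count = as.vector(per_10))
print(counts)
```

As far as I can tell, the bins start at the first timestamp truncated to the minute (14:38 here) rather than on round clock boundaries such as 14:30, so check the labels if alignment matters.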