2012-11-01 128 views

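MySQL group with a lookahead?

I have an event log that records when a device starts or stops with a failure code, and I'm trying to calculate the average and median time between failures and starts. Here is a very simple example table of the data: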

+----+-----------+---------------------+
| id | eventName | eventTime           |
+----+-----------+---------------------+
|  1 | start     | 2012-11-01 14:25:20 |
|  2 | fail A    | 2012-11-01 14:27:45 |
|  3 | start     | 2012-11-01 14:30:49 |
|  4 | fail B    | 2012-11-01 14:32:54 |
|  5 | start     | 2012-11-01 14:35:59 |
|  6 | fail A    | 2012-11-01 14:37:02 |
|  7 | start     | 2012-11-01 14:38:05 |
|  8 | fail A    | 2012-11-01 14:40:09 |
|  9 | start     | 2012-11-01 14:41:11 |
| 10 | fail C    | 2012-11-01 14:43:14 |
+----+-----------+---------------------+

Creation code:

CREATE TABLE `test` (
    `id` int(10) unsigned NOT NULL AUTO_INCREMENT, 
    `eventName` varchar(50) NOT NULL, 
    `eventTime` datetime NOT NULL, 
    PRIMARY KEY (`id`) 
); 
INSERT INTO `test` (`id`, `eventName`, `eventTime`) VALUES (1,'start','2012-11-01 14:25:20'),(2,'fail A','2012-11-01 14:27:45'),(3,'start','2012-11-01 14:30:49'),(4,'fail B','2012-11-01 14:32:54'),(5,'start','2012-11-01 14:35:59'),(6,'fail A','2012-11-01 14:37:02'),(7,'start','2012-11-01 14:38:05'),(8,'fail A','2012-11-01 14:40:09'),(9,'start','2012-11-01 14:41:11'),(10,'fail C','2012-11-01 14:43:14'); 

I can get the time to start and the time to fail with something like this:

SET @time_prev := -1; 
SELECT 
* 
FROM 
(
    SELECT 
    eventName 
    , eventTime 
    , @ts := UNIX_TIMESTAMP(eventTime) AS ts 
    , @started := IF(eventName = 'start', 1, 0) AS started 
    , @failed := IF(eventName <> 'start', 1, 0) AS failed 
    , @time_diff := IF(@time_prev > -1, @ts - @time_prev, 0) AS time_diff 
    , @time_prev := @ts AS time_prev 
    , @time_to_fail := IF(@failed, @time_diff, 0) AS time_to_fail 
    , @time_to_start := IF(@started, @time_diff, 0) AS time_to_start 
    FROM 
    test 
) AS t1; 

+-----------+---------------------+------------+---------+--------+-----------+------------+--------------+---------------+
| eventName | eventTime           | ts         | started | failed | time_diff | time_prev  | time_to_fail | time_to_start |
+-----------+---------------------+------------+---------+--------+-----------+------------+--------------+---------------+
| start     | 2012-11-01 14:25:20 | 1351805120 |       1 |      0 |         0 | 1351805120 | 0            | 0             |
| fail A    | 2012-11-01 14:27:45 | 1351805265 |       0 |      1 |       145 | 1351805265 | 0            | 145           |
| start     | 2012-11-01 14:30:49 | 1351805449 |       1 |      0 |       184 | 1351805449 | 184          | 0             |
| fail B    | 2012-11-01 14:32:54 | 1351805574 |       0 |      1 |       125 | 1351805574 | 0            | 125           |
| start     | 2012-11-01 14:35:59 | 1351805759 |       1 |      0 |       185 | 1351805759 | 185          | 0             |
| fail A    | 2012-11-01 14:37:02 | 1351805822 |       0 |      1 |        63 | 1351805822 | 0            | 63            |
| start     | 2012-11-01 14:38:05 | 1351805885 |       1 |      0 |        63 | 1351805885 | 63           | 0             |
| fail A    | 2012-11-01 14:40:09 | 1351806009 |       0 |      1 |       124 | 1351806009 | 0            | 124           |
| start     | 2012-11-01 14:41:11 | 1351806071 |       1 |      0 |        62 | 1351806071 | 62           | 0             |
| fail C    | 2012-11-01 14:43:14 | 1351806194 |       0 |      1 |       123 | 1351806194 | 0            | 123           |
+-----------+---------------------+------------+---------+--------+-----------+------------+--------------+---------------+

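But to get the time between a failure and the next start, I have to advance to the next record, and then I lose the grouping by failure code. How can I take this to the next level and merge the upcoming start time into the failure record so that it can be grouped?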

Ultimately, after calculating the averages and medians, I would end up with a result set like this:

+-----------+-------------+----------------+--------------+-----------------+
| eventName | avg_to_fail | median_to_fail | avg_to_start | median_to_start |
+-----------+-------------+----------------+--------------+-----------------+
| fail A    |      110.66 |         124.00 |       103.00 |           63.00 |
| fail B    |      125.00 |         125.00 |       185.00 |          185.00 |
+-----------+-------------+----------------+--------------+-----------------+

Answers


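This gives the averages. Median is a pain in SQL; the question "Simple way to calculate median with MySQL" gives some ideas. The two inner queries give the result sets you would take the median over, if a median aggregate existed.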

Select
    times.eventName,
    avg(times.timelapse) as avg_to_fail,
    avg(times2.timelapse) as avg_to_start
From (
    -- times2: seconds from each failure to the next start
    Select
        starts.id,
        starts.eventName,
        TimestampDiff(SECOND, starts.eventTime, Min(ends.eventTime)) as timelapse
    From
        test as starts,
        test as ends
    Where
        starts.eventName != 'start' And
        ends.eventName = 'start' And
        ends.eventTime > starts.eventTime
    Group By
        starts.id
) as times2
Right Outer Join (
    -- times: seconds from each start to the next failure
    Select
        starts.id,
        ends.eventName,
        TimestampDiff(SECOND, starts.eventTime, Min(ends.eventTime)) as timelapse
    From
        test as starts,
        test as ends
    Where
        starts.eventName = 'start' And
        ends.eventName != 'start' And
        ends.eventTime > starts.eventTime
    Group By
        starts.id
) as times
    On times2.eventName = times.eventName
Group By
    times.eventName

To help understand it, I would first consider

Select
    starts.id as startid,
    ends.eventName as endeventname,
    starts.eventTime as starttime,
    ends.eventTime as endtime
From
    test as starts,
    test as ends
Where
    starts.eventName = 'start' And
    ends.eventName != 'start' And
    ends.eventTime > starts.eventTime

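This is the essence of the inner query times, without the Group By and the Min. You'll see it has one row for each combination of a start event and an end event that occurs after that start event. Call this X.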

The next part is

Select
    X.startid,
    X.endeventname,
    TimestampDiff(SECOND, X.starttime, Min(X.endtime)) as timelapse
From
    X
Group By
    X.startid

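The key here is the combination of Min(X.endtime) with the Group By. It gives us the earliest end time after the start time (X has already restricted the end to be after the start). Although I've only selected the columns I need, we have access here to the start id, end id, start event, end event, start time and Min(end time). You can adapt this to find avg_to_start by changing which event name we pick as the interesting one, since we have both.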

SQL Fiddle: http://sqlfiddle.com/#!2/90465/6
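The query above takes ends.eventName from a group keyed on starts.id, so MySQL is free to return the name from any row in that group, not necessarily the row that supplied Min(ends.eventTime). One way to sidestep that is to start from each failure row and look up the nearest start on either side. What follows is only a minimal sketch, assuming eventTime orders the events and that the start just before a failure is the one that led to it; the aliases f, t, s, prevStart and nextStart are made up for illustration.

```sql
-- Sketch only: per failure code, average seconds from the preceding start
-- to the failure, and from the failure to the following start.
SELECT
    f.eventName,
    AVG(TIMESTAMPDIFF(SECOND, f.prevStart, f.eventTime)) AS avg_to_fail,
    AVG(TIMESTAMPDIFF(SECOND, f.eventTime, f.nextStart)) AS avg_to_start
FROM (
    SELECT
        t.eventName,
        t.eventTime,
        -- latest start before this failure (hypothetical alias prevStart)
        (SELECT MAX(s.eventTime)
           FROM test AS s
          WHERE s.eventName = 'start'
            AND s.eventTime < t.eventTime) AS prevStart,
        -- earliest start after this failure (hypothetical alias nextStart)
        (SELECT MIN(s.eventTime)
           FROM test AS s
          WHERE s.eventName = 'start'
            AND s.eventTime > t.eventTime) AS nextStart
    FROM test AS t
    WHERE t.eventName <> 'start'
) AS f
GROUP BY f.eventName;
```

On the sample data this should give fail A an avg_to_fail of about 110.67 and an avg_to_start of 103, and fail B 125 and 185. fail C has no later start, so its avg_to_start comes out NULL rather than the row being dropped.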


I removed median from the title; that is not the problem. The problem is calculating the second avg/median based on the next row's data. – fwrawx


@fwrawx - I've updated it to your specification so that it provides avg_to_fail. Adapting it for avg_to_start is easy. You can then full outer join the two result sets on EventName. – Laurence


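*_to_fail is the easy part; getting *_to_start and merging is the hard part, because 1) the eventName is the same for all records, and 2) the time is calculated from the previous record – fwrawx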
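For readers on a newer server, the "lookahead" can be written directly with window functions. This is a rough sketch rather than anything from the thread: it assumes MySQL 8.0 or later (CTEs, LAG, LEAD, ROW_NUMBER) and that start and failure rows strictly alternate, as they do in the sample data. The CTE names gaps and ranked and the columns to_fail, to_start, rn and cnt are invented for the example.

```sql
-- Sketch only; requires MySQL 8.0+ and assumes start/fail rows alternate.
WITH gaps AS (
    SELECT
        eventName,
        -- seconds since the previous row (the start before this failure)
        TIMESTAMPDIFF(SECOND, LAG(eventTime) OVER (ORDER BY eventTime), eventTime) AS to_fail,
        -- seconds until the next row (the start after this failure)
        TIMESTAMPDIFF(SECOND, eventTime, LEAD(eventTime) OVER (ORDER BY eventTime)) AS to_start
    FROM test
),
ranked AS (
    SELECT
        eventName,
        to_fail,
        to_start,
        -- position of each to_fail value within its failure code
        ROW_NUMBER() OVER (PARTITION BY eventName ORDER BY to_fail) AS rn,
        COUNT(*)     OVER (PARTITION BY eventName)                  AS cnt
    FROM gaps
    WHERE eventName <> 'start'
)
SELECT
    eventName,
    AVG(to_fail) AS avg_to_fail,
    -- average of the middle row (odd cnt) or the two middle rows (even cnt)
    AVG(CASE WHEN rn IN (FLOOR((cnt + 1) / 2), CEIL((cnt + 1) / 2))
             THEN to_fail END) AS median_to_fail,
    AVG(to_start) AS avg_to_start
FROM ranked
GROUP BY eventName;
```

The window functions run in gaps before the start rows are filtered out, so each failure row still sees its neighbouring starts. median_to_start would need a second ROW_NUMBER ordered by to_start; it is left out here to keep the sketch short.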