2016-03-21 55 views

Query works only in some cases

I have a table with a structure similar to the following:

City    start_date           end_date
Paris   1995-01-01 00:00:00  1997-10-01 23:59:59
Paris   1997-10-02 00:00:00  0001-01-01 00:00:00
Paris   2013-01-25 00:00:00  0001-01-01 00:00:00
Paris   2015-04-25 00:00:00  0001-01-01 00:00:00
Berlin  2014-11-01 00:00:00  0001-01-01 00:00:00
Berlin  2014-06-01 00:00:00  0001-01-01 00:00:00
Berlin  2015-09-11 00:00:00  0001-01-01 00:00:00
Berlin  2015-10-01 00:00:00  0001-01-01 00:00:00
Milan   2001-01-01 00:00:00  0001-01-01 00:00:00
Milan   2005-10-02 00:00:00  2006-10-02 23:59:59
Milan   2006-10-03 00:00:00  2015-04-24 23:59:59
Milan   2015-04-25 00:00:00  0001-01-01 00:00:00

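The data holds a history for each city, based on start and end dates. The latest record for a city should be the one with the highest start date, and its end date of '0001-01-01 00:00:00' means that there is no end date yet.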

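I need to clean up this data so that, for each city, every historical record ends one second before the start date of the next record, but only where end_date is set to '0001-01-01 00:00:00'. Where end_date already holds an actual date, the record is left alone. Also, the record with the most recent start_date for a city does not need its end_date modified.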
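
The rule just described can be sketched outside the database. The following is a minimal Python illustration, not the method used in this thread: the helper name `close_open_intervals` and the tuple layout are assumptions made for the example, and "next record" is read here as the next one by start_date within the same city.

```python
from datetime import datetime, timedelta
from itertools import groupby

# '0001-01-01 00:00:00' is the placeholder meaning "no end date yet".
SENTINEL = datetime(1, 1, 1, 0, 0, 0)


def close_open_intervals(rows):
    """Close open end dates one second before the next start date.

    rows: iterable of (city, start_date, end_date) tuples (hypothetical layout).
    Rows are paired by start_date order within each city. Rows that already
    have a real end date, and the latest row of each city, are left unchanged.
    """
    result = []
    ordered = sorted(rows, key=lambda r: (r[0], r[1]))
    for _, group in groupby(ordered, key=lambda r: r[0]):
        group = list(group)
        for i, (city, start, end) in enumerate(group):
            is_latest = i == len(group) - 1
            if end == SENTINEL and not is_latest:
                end = group[i + 1][1] - timedelta(seconds=1)
            result.append((city, start, end))
    return result


paris = [
    ("Paris", datetime(1995, 1, 1), datetime(1997, 10, 1, 23, 59, 59)),
    ("Paris", datetime(1997, 10, 2), SENTINEL),
    ("Paris", datetime(2013, 1, 25), SENTINEL),
    ("Paris", datetime(2015, 4, 25), SENTINEL),
]
for row in close_open_intervals(paris):
    print(row[0], row[1], row[2])
```

Because the sketch sorts by start_date before pairing, the order in which rows were inserted does not matter. That matters for the Berlin sample rows, where 2014-11-01 was inserted before 2014-06-01.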

The resulting table should look like this:

City    start_date           end_date
Paris   1995-01-01 00:00:00  1997-10-01 23:59:59
Paris   1997-10-02 00:00:00  2013-01-24 23:59:59
Paris   2013-01-25 00:00:00  2015-04-24 23:59:59
Paris   2015-04-25 00:00:00  0001-01-01 00:00:00
Berlin  2014-11-01 00:00:00  2014-05-31 23:59:59
Berlin  2014-06-01 00:00:00  2015-09-10 23:59:59
Berlin  2015-09-11 00:00:00  2015-09-30 23:59:59
Berlin  2015-10-01 00:00:00  0001-01-01 00:00:00
Milan   2001-01-01 00:00:00  2005-10-01 23:59:59
Milan   2005-10-02 00:00:00  2006-10-02 23:59:59
Milan   2006-10-03 00:00:00  2015-04-24 23:59:59
Milan   2015-04-25 00:00:00  0001-01-01 00:00:00

I tried the following script, which a user suggested in this question.

update test join 
     (select t.*, 
       (select min(start_date) 
       from test t2 
       where t2.city = t.city and 
         t2.start_date > t.start_date 
       order by t2.start_date 
       limit 1 
       ) as next_start_date 
     from test t 
     ) tt 
     on tt.city = test.city and tt.start_date = test.start_date 
    set test.end_date = date_sub(tt.next_start_date, interval 1 second) 
where test.end_date = '0001-01-01' and 
     next_start_date is not null; 

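Unfortunately, starting with the Berlin records, some end_date values are not what I expected (for example, ids 5 and 6), while the others come out as they should. This is shown in the image below:

[Image: screenshot of the resulting table]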

Here are the CREATE and INSERT statements needed to reproduce it:

CREATE TABLE `test` (
    `id` int(11) NOT NULL AUTO_INCREMENT, 
    `city` varchar(50) DEFAULT NULL, 
    `start_date` datetime DEFAULT NULL, 
    `end_date` datetime DEFAULT NULL, 
    PRIMARY KEY (`id`) 
) ENGINE=InnoDB AUTO_INCREMENT=13 DEFAULT CHARSET=utf8; 

INSERT INTO test (city, start_date, end_date) VALUES ('Paris','1995-01-01 00:00:00','1997-10-01 23:59:59'); 
INSERT INTO test (city, start_date, end_date) VALUES ('Paris','1997-10-02 00:00:00','0001-01-01 00:00:00'); 
INSERT INTO test (city, start_date, end_date) VALUES ('Paris','2013-01-25 00:00:00','0001-01-01 00:00:00'); 
INSERT INTO test (city, start_date, end_date) VALUES ('Paris','2015-04-25 00:00:00','0001-01-01 00:00:00'); 
INSERT INTO test (city, start_date, end_date) VALUES ('Berlin','2014-11-01 00:00:00','0001-01-01 00:00:00'); 
INSERT INTO test (city, start_date, end_date) VALUES ('Berlin','2014-06-01 00:00:00','0001-01-01 00:00:00'); 
INSERT INTO test (city, start_date, end_date) VALUES ('Berlin','2015-09-11 00:00:00','0001-01-01 00:00:00'); 
INSERT INTO test (city, start_date, end_date) VALUES ('Berlin','2015-10-01 00:00:00','0001-01-01 00:00:00'); 
INSERT INTO test (city, start_date, end_date) VALUES ('Milan','2001-01-01 00:00:00','0001-01-01 00:00:00'); 
INSERT INTO test (city, start_date, end_date) VALUES ('Milan','2005-10-02 00:00:00','2006-10-02 23:59:59'); 
INSERT INTO test (city, start_date, end_date) VALUES ('Milan','2006-10-03 00:00:00','2015-04-24 23:59:59'); 
INSERT INTO test (city, start_date, end_date) VALUES ('Milan','2015-04-25 00:00:00','0001-01-01 00:00:00'); 

Your `UPDATE` statement works fine with the sample data you provided. Please check [this](http://sqlfiddle.com/#!9/d879f/2) demo.

Answer

-- query wanted 
UPDATE test t1 INNER JOIN 
    (SELECT *, @id := @id + 1 AS new_id 
    FROM test CROSS JOIN (SELECT @id := 0) param 
    ORDER BY city, start_date) t2 
    ON t1.city = t2.city AND t1.start_date = t2.start_date 
    INNER JOIN 
    (SELECT *, @id2 := @id2 + 1 AS new_id 
    FROM test CROSS JOIN (SELECT @id2 := 0) param 
    ORDER BY city, start_date) t3 
    ON t2.new_id + 1 = t3.new_id AND t2.city = t3.city 
SET t1.end_date = DATE_SUB(t3.start_date, INTERVAL 1 SECOND) 
WHERE t1.end_date = '0001-01-01 00:00:00'; 
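
To illustrate what the two joins are meant to do: each derived table numbers the rows, and `t2.new_id + 1 = t3.new_id` pairs a row with the one numbered right after it. The counter is a generated row number, not the table's `id` column. A rough Python sketch of that pairing follows. It assumes the numbers are handed out in `ORDER BY city, start_date` order, which MySQL does not guarantee for user variables assigned inside a SELECT. The function name `pair_with_next` and the tuple layout are made up for the example.

```python
from datetime import datetime, timedelta


def pair_with_next(rows):
    """Mimic the numbered self-join: yield (row, next_row) for adjacent rows.

    rows: list of (city, start_date) tuples (hypothetical layout).
    Assumes row numbers are assigned in (city, start_date) order.
    """
    numbered = {n: row for n, row in enumerate(sorted(rows), start=1)}
    for n, row in numbered.items():
        nxt = numbered.get(n + 1)                 # t2.new_id + 1 = t3.new_id
        if nxt is not None and nxt[0] == row[0]:  # t2.city = t3.city
            yield row, nxt


# Berlin rows in the order they were inserted in the question.
berlin = [
    ("Berlin", datetime(2014, 11, 1)),
    ("Berlin", datetime(2014, 6, 1)),
    ("Berlin", datetime(2015, 9, 11)),
    ("Berlin", datetime(2015, 10, 1)),
]
for row, nxt in pair_with_next(berlin):
    print(row[1], "->", nxt[1] - timedelta(seconds=1))
```

If the server instead increments the variables in the order it reads the rows, the pairing follows that order rather than start_date, so it may be worth running the numbered subquery as a plain SELECT first to see which numbers the rows actually receive.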

Here is a complete demo.

SQL:

-- data 
CREATE TABLE `test` (
    `id` int(11) NOT NULL AUTO_INCREMENT, 
    `city` varchar(50) DEFAULT NULL, 
    `start_date` datetime DEFAULT NULL, 
    `end_date` datetime DEFAULT NULL, 
    PRIMARY KEY (`id`) 
) ENGINE=InnoDB AUTO_INCREMENT=13 DEFAULT CHARSET=utf8; 

INSERT INTO test (city, start_date, end_date) VALUES ('Paris','1995-01-01 00:00:00','1997-10-01 23:59:59'); 
INSERT INTO test (city, start_date, end_date) VALUES ('Paris','1997-10-02 00:00:00','0001-01-01 00:00:00'); 
INSERT INTO test (city, start_date, end_date) VALUES ('Paris','2013-01-25 00:00:00','0001-01-01 00:00:00'); 
INSERT INTO test (city, start_date, end_date) VALUES ('Paris','2015-04-25 00:00:00','0001-01-01 00:00:00'); 
INSERT INTO test (city, start_date, end_date) VALUES ('Berlin','2014-11-01 00:00:00','0001-01-01 00:00:00'); 
INSERT INTO test (city, start_date, end_date) VALUES ('Berlin','2014-06-01 00:00:00','0001-01-01 00:00:00'); 
INSERT INTO test (city, start_date, end_date) VALUES ('Berlin','2015-09-11 00:00:00','0001-01-01 00:00:00'); 
INSERT INTO test (city, start_date, end_date) VALUES ('Berlin','2015-10-01 00:00:00','0001-01-01 00:00:00'); 
INSERT INTO test (city, start_date, end_date) VALUES ('Milan','2001-01-01 00:00:00','0001-01-01 00:00:00'); 
INSERT INTO test (city, start_date, end_date) VALUES ('Milan','2005-10-02 00:00:00','2006-10-02 23:59:59'); 
INSERT INTO test (city, start_date, end_date) VALUES ('Milan','2006-10-03 00:00:00','2015-04-24 23:59:59'); 
INSERT INTO test (city, start_date, end_date) VALUES ('Milan','2015-04-25 00:00:00','0001-01-01 00:00:00'); 
select * from test; 

-- query wanted 
UPDATE test t1 INNER JOIN 
    (SELECT *, @id := @id + 1 AS new_id 
    FROM test CROSS JOIN (SELECT @id := 0) param 
    ORDER BY city, start_date) t2 
    ON t1.city = t2.city AND t1.start_date = t2.start_date 
    INNER JOIN 
    (SELECT *, @id2 := @id2 + 1 AS new_id 
    FROM test CROSS JOIN (SELECT @id2 := 0) param 
    ORDER BY city, start_date) t3 
    ON t2.new_id + 1 = t3.new_id AND t2.city = t3.city 
SET t1.end_date = DATE_SUB(t3.start_date, INTERVAL 1 SECOND) 
WHERE t1.end_date = '0001-01-01 00:00:00'; 

select * from test; 

Output:

mysql> -- query wanted 
mysql> UPDATE test t1 INNER JOIN 
    -> (SELECT *, @id := @id + 1 AS new_id 
    -> FROM test CROSS JOIN (SELECT @id := 0) param 
    -> ORDER BY city, start_date) t2 
    -> ON t1.city = t2.city AND t1.start_date = t2.start_date 
    -> INNER JOIN 
    -> (SELECT *, @id2 := @id2 + 1 AS new_id 
    -> FROM test CROSS JOIN (SELECT @id2 := 0) param 
    -> ORDER BY city, start_date) t3 
    -> ON t2.new_id + 1 = t3.new_id AND t2.city = t3.city 
    -> SET t1.end_date = DATE_SUB(t3.start_date, INTERVAL 1 SECOND) 
    -> WHERE t1.end_date = '0001-01-01 00:00:00'; 
Query OK, 6 rows affected (0.00 sec) 
Rows matched: 6 Changed: 6 Warnings: 0 

mysql> select * from test; 
+----+--------+---------------------+---------------------+ 
| id | city | start_date   | end_date   | 
+----+--------+---------------------+---------------------+ 
| 13 | Paris | 1995-01-01 00:00:00 | 1997-10-01 23:59:59 | 
| 14 | Paris | 1997-10-02 00:00:00 | 2013-01-24 23:59:59 | 
| 15 | Paris | 2013-01-25 00:00:00 | 2015-04-24 23:59:59 | 
| 16 | Paris | 2015-04-25 00:00:00 | 0001-01-01 00:00:00 | 
| 17 | Berlin | 2014-11-01 00:00:00 | 2014-05-31 23:59:59 | 
| 18 | Berlin | 2014-06-01 00:00:00 | 2015-09-10 23:59:59 | 
| 19 | Berlin | 2015-09-11 00:00:00 | 2015-09-30 23:59:59 | 
| 20 | Berlin | 2015-10-01 00:00:00 | 0001-01-01 00:00:00 | 
| 21 | Milan | 2001-01-01 00:00:00 | 2005-10-01 23:59:59 | 
| 22 | Milan | 2005-10-02 00:00:00 | 2006-10-02 23:59:59 | 
| 23 | Milan | 2006-10-03 00:00:00 | 2015-04-24 23:59:59 | 
| 24 | Milan | 2015-04-25 00:00:00 | 0001-01-01 00:00:00 | 
+----+--------+---------------------+---------------------+ 
12 rows in set (0.00 sec) 
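
One quick sanity check on a result like this is to look for rows whose end_date is earlier than their start_date, ignoring the '0001-01-01 00:00:00' placeholder. A small Python sketch, with the function name `inverted_rows` and the tuple layout chosen only for illustration:

```python
from datetime import datetime

# '0001-01-01 00:00:00' is the placeholder meaning "no end date yet".
SENTINEL = datetime(1, 1, 1)


def inverted_rows(rows):
    """Return rows whose real end_date falls before their start_date."""
    return [r for r in rows if r[3] != SENTINEL and r[3] < r[2]]


# A few rows copied from the output above: (id, city, start_date, end_date).
sample = [
    (16, "Paris", datetime(2015, 4, 25), SENTINEL),
    (17, "Berlin", datetime(2014, 11, 1), datetime(2014, 5, 31, 23, 59, 59)),
    (18, "Berlin", datetime(2014, 6, 1), datetime(2015, 9, 10, 23, 59, 59)),
]
for r in inverted_rows(sample):
    print("end before start:", r[0], r[1])
```

Row 17 is flagged because its end_date was taken from the row that follows it in the table (start 2014-06-01), which starts earlier than row 17 itself (2014-11-01).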
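
Does this query assume that the id values for each city are consecutive? I noticed the `id + 1` join condition. If so, unfortunately it will not work in all cases, because a city's records are not necessarily inserted one right after another.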


Do the parameters need to be filled in manually for each record?


So the @id value does not need to be filled in?
