2015-03-02 44 views
4

Optimizing a MySQL query with GROUP BY hour and time zone conversion

This is my table in MySQL 5.5, which contains 300,000 records:

CREATE TABLE `campaign_logs` (
    `domain` varchar(50) DEFAULT NULL, 
    `campaign_id` varchar(50) DEFAULT NULL, 
    `subscriber_id` varchar(50) DEFAULT NULL, 
    `message` varchar(21000) DEFAULT NULL, 
    `log_time` datetime DEFAULT NULL, 
    `log_type` varchar(50) DEFAULT NULL, 
    `level` varchar(50) DEFAULT NULL, 
    `campaign_name` varchar(500) DEFAULT NULL, 
    KEY `subscriber_id_index` (`subscriber_id`), 
    KEY `log_type_index` (`log_type`), 
    KEY `log_time_index` (`log_time`), 
    KEY `campid_domain_logtype_logtime_subid_index` (`campaign_id`,`domain`,`log_type`,`log_time`,`subscriber_id`), 
    KEY `domain_logtype_logtime_index` (`domain`,`log_type`,`log_time`) 
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

In the query below, I am grouping by hour, adjusted for the time zone.

QUERY

SELECT 
    log_type 
    ,DATE_FORMAT(CONVERT_TZ(log_time,'+00:00','+05:30'),'%l %p') AS log_date 
    ,count(*) AS total 
    ,count(DISTINCT subscriber_id) d 
FROM 
    stats.campaign_logs USE INDEX(campid_domain_logtype_logtime_subid_index) 
WHERE 
    DOMAIN='xxx' 
    AND campaign_id='123' 
    AND log_type = 'EMAIL_OPENED' 
    AND log_time BETWEEN 
     CONVERT_TZ('2015-02-01 00:00:00','+00:00','+05:30') AND 
     CONVERT_TZ('2015-03-01 23:59:58','+00:00','+05:30') 
GROUP BY log_date 

UNION ALL 

SELECT 
    log_type 
    ,DATE_FORMAT(CONVERT_TZ(log_time,'+00:00','+05:30'),'%l %p') AS log_date 
    ,count(*) AS total 
    ,count(DISTINCT subscriber_id) d 
FROM 
    stats.campaign_logs USE INDEX(campid_domain_logtype_logtime_subid_index) 
WHERE 
    DOMAIN='xxx' 
    AND campaign_id='123' 
    AND log_type = 'EMAIL_SENT' 
    AND log_time BETWEEN 
     CONVERT_TZ('2015-02-01 00:00:00','+00:00','+05:30') AND 
     CONVERT_TZ('2015-03-01 23:59:58','+00:00','+05:30') 
GROUP BY log_date 

UNION ALL 

SELECT 
    log_type 
    ,DATE_FORMAT(CONVERT_TZ(log_time,'+00:00','+05:30'),'%l %p') AS log_date 
    ,count(*) AS total 
    ,count(DISTINCT subscriber_id) d 
FROM 
    stats.campaign_logs USE INDEX(campid_domain_logtype_logtime_subid_index) 
WHERE 
    DOMAIN='xxx' 
    AND campaign_id='123' 
    AND log_type = 'EMAIL_CLICKED' 
    AND log_time BETWEEN 
     CONVERT_TZ('2015-02-01 00:00:00','+00:00','+05:30') AND 
     CONVERT_TZ('2015-03-01 23:59:58','+00:00','+05:30') 
GROUP BY log_date; 

Result

The query above produces output like this:

+---------------+-------+----+----+
| EMAIL_CLICKED | 1 AM  | 71 | 83 |
| EMAIL_CLICKED | 1 PM  | 25 | 27 |
| EMAIL_SENT    | 10 AM | 51 | 59 |
| EMAIL_OPENED  | 10 PM | 16 | 18 |
+---------------+-------+----+----+

This is the EXPLAIN output for the query above:

+----+--------------+---------------+-------+-------------------------------------------+-------------------------------------------+---------+------+--------+------------------------------------------+
| id | select_type  | table         | type  | possible_keys                             | key                                       | key_len | ref  | rows   | Extra                                    |
+----+--------------+---------------+-------+-------------------------------------------+-------------------------------------------+---------+------+--------+------------------------------------------+
|  1 | PRIMARY      | campaign_logs | range | campid_domain_logtype_logtime_subid_index | campid_domain_logtype_logtime_subid_index | 468     | NULL |  55074 | Using where; Using index; Using filesort |
|  2 | UNION        | campaign_logs | range | campid_domain_logtype_logtime_subid_index | campid_domain_logtype_logtime_subid_index | 468     | NULL | 330578 | Using where; Using index; Using filesort |
|  3 | UNION        | campaign_logs | range | campid_domain_logtype_logtime_subid_index | campid_domain_logtype_logtime_subid_index | 468     | NULL |   1589 | Using where; Using index; Using filesort |
|NULL| UNION RESULT | <union1,2,3>  | ALL   | NULL                                      | NULL                                      | NULL    | NULL |   NULL |                                          |
+----+--------------+---------------+-------+-------------------------------------------+-------------------------------------------+---------+------+--------+------------------------------------------+

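Optimization?

We have a covering index on this table.

This query takes a long time (more than 1 minute).

If I remove count(DISTINCT subscriber_id) from the query, we get the result in 1.5 seconds, but I need the distinct count of subscriber_id in the query.

Is there any way to optimize this query?

Thanks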

+0

Try using 'GROUP BY log_type, log_time' – LeGEC 2015-03-02 11:38:30

+0

@LeGEC Thanks for your comment. I need to group by hour; if I group by log_time, it does not give the desired output. – Rams 2015-03-02 11:43:32
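
As a small illustration of the hour bucketing discussed in this comment, the statement below applies the question's expression to two literal UTC timestamps. The sample timestamps and the column aliases are made up for illustration; only the expression itself is taken from the question. Because the '%l %p' format keeps just the hour and AM/PM, rows from different days land in the same bucket, whereas grouping by the raw log_time would give one group per distinct timestamp.

```sql
-- Hypothetical sample values; only the DATE_FORMAT/CONVERT_TZ expression comes from the question.
SELECT
    -- 04:10 UTC is 09:40 at +05:30
    DATE_FORMAT(CONVERT_TZ('2015-02-01 04:10:00','+00:00','+05:30'),'%l %p') AS morning_bucket,
    -- 18:45 UTC is 00:15 on the next day at +05:30
    DATE_FORMAT(CONVERT_TZ('2015-02-01 18:45:00','+00:00','+05:30'),'%l %p') AS midnight_bucket;
```

This should return '9 AM' and '12 AM', since %l is the 12-hour clock hour without zero padding.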

+0

How does it affect performance if you restrict the query to one log type? How does it affect performance if you remove the distinct count? – 2015-03-02 11:47:30

Answers

3

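You are not dealing with a huge amount of data, so the group by should not take 40 seconds, assuming you are not on a very busy server with a lot of locking activity on the table.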

Try this version of the query (restricted to one log_type):

SELECT log_type, 
     DATE_FORMAT(CONVERT_TZ(log_time,'+00:00','+05:30'),'%l %p') AS time, 
     count(DISTINCT subscriber_id) AS distinct_count, 
     count(subscriber_id) AS total_count 
FROM stats.campaign_logs 
WHERE DOMAIN = 'xxxx' AND 
     campaign_id='1234' AND 
     log_type = 'EMAIL_SENT' AND 
     log_time BETWEEN CONVERT_TZ('2015-02-07 00:00:00','+00:00','+05:30') AND CONVERT_TZ('2015-02-14 23:59:58','+00:00','+05:30') 
GROUP BY time; 

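This should be optimized to use the index. If it is fast, then use union all to put the rows together. Ugly, but sometimes union all is much faster than OR/IN because of index optimizations.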
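
One way to check whether the single-log_type version really uses the index is to prefix it with EXPLAIN, as the question already does for the original query. A minimal sketch, reusing the literal values from the query in this answer:

```sql
-- In the output, the `key` column names the chosen index, and `Extra` shows
-- whether it is used as a covering index ("Using index") and whether a
-- sort pass is needed ("Using filesort").
EXPLAIN
SELECT log_type,
       DATE_FORMAT(CONVERT_TZ(log_time,'+00:00','+05:30'),'%l %p') AS time,
       count(DISTINCT subscriber_id) AS distinct_count,
       count(subscriber_id) AS total_count
FROM stats.campaign_logs
WHERE domain = 'xxxx' AND
      campaign_id = '1234' AND
      log_type = 'EMAIL_SENT' AND
      log_time BETWEEN CONVERT_TZ('2015-02-07 00:00:00','+00:00','+05:30')
                   AND CONVERT_TZ('2015-02-14 23:59:58','+00:00','+05:30')
GROUP BY time;
```

Comparing the `rows` estimate and the `Extra` column with the EXPLAIN output in the question gives a rough idea of whether the plan has changed; actual timings would still need to be measured.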

+0

Hi @Gordon Linoff, thanks for your reply. I have updated my query and my question; please take a look. – Rams 2015-03-17 02:35:22

+0

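Hi @Gordon, I updated my query as you suggested, but it still takes a long time to return results. I removed the campaign_id and domain_id indexes from the table because I have a composite index that starts with domain_id and campaign_id – Rams 2015-03-17 03:30:01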

-1
SELECT 
    log_type 
    ,DATE_FORMAT(CONVERT_TZ(log_time,'+00:00','+05:30'),'%l %p') AS log_date 
    ,count(*) AS total 
    ,count(DISTINCT subscriber_id) d 
FROM 
    stats.campaign_logs USE INDEX(campid_domain_logtype_logtime_subid_index) 
WHERE 
    DOMAIN='xxx' 
    AND campaign_id='123' 
    AND log_type IN ('EMAIL_OPENED','EMAIL_SENT','EMAIL_CLICKED') 
    AND log_time BETWEEN 
     CONVERT_TZ('2015-02-01 00:00:00','+00:00','+05:30') AND 
     CONVERT_TZ('2015-03-01 23:59:58','+00:00','+05:30') 
GROUP BY log_date, log_type 

If I understood correctly, does this solve your problem?
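
Since the question reports that count(DISTINCT subscriber_id) is the slow part, another variant that could be tried is a two-step aggregation: first collapse the rows to one per (log_type, hour bucket, subscriber_id), then count those rows per bucket. This is only a sketch under assumptions, not a measured improvement. The derived-table alias per_subscriber and the column cnt are names chosen here for illustration. On MySQL 5.5 the derived table is materialized as a temporary table, so whether this is faster would have to be timed on the real data.

```sql
SELECT
    log_type
    ,log_date
    ,SUM(cnt) AS total             -- all rows in the bucket, same as count(*)
    ,COUNT(subscriber_id) AS d     -- one inner row per distinct non-NULL subscriber_id
FROM (
    -- Inner step: one row per subscriber within each log_type and hour bucket.
    SELECT
        log_type
        ,DATE_FORMAT(CONVERT_TZ(log_time,'+00:00','+05:30'),'%l %p') AS log_date
        ,subscriber_id
        ,COUNT(*) AS cnt
    FROM
        stats.campaign_logs
    WHERE
        domain = 'xxx'
        AND campaign_id = '123'
        AND log_type IN ('EMAIL_OPENED','EMAIL_SENT','EMAIL_CLICKED')
        AND log_time BETWEEN
            CONVERT_TZ('2015-02-01 00:00:00','+00:00','+05:30') AND
            CONVERT_TZ('2015-03-01 23:59:58','+00:00','+05:30')
    GROUP BY log_type, log_date, subscriber_id
) AS per_subscriber
GROUP BY log_type, log_date;
```

The outer COUNT(subscriber_id) skips the inner row whose subscriber_id is NULL, which matches how count(DISTINCT subscriber_id) ignores NULL values, while SUM(cnt) still includes those rows in the total.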