
I'm querying against a table (partitioned by month) with close to 20M rows. I need to group by DATE(transaction_utc) as well as by country_id. If I turn off the grouping and aggregation, the query returns just over 40k rows, which isn't too many, but adding the GROUP BY makes the query significantly slower, unless the GROUP BY is on the transaction_utc column alone, in which case it is fast. Why is my MySQL GROUP BY so slow?

I've been trying to optimize the first query below by tweaking the query and/or the indexes, and have gotten to the point below (roughly 2x faster than where I started), but I'm still stuck with a ~5s query for summarizing 45k rows, which seems like way too much.

For reference, this box is a brand-new MariaDB 5.5.x server with 24 logical cores, 64GB of RAM, and an InnoDB buffer pool far larger than the index space on the server, so there shouldn't be any RAM or CPU pressure.

So, I'm looking for ideas on what's causing this slowdown and suggestions for speeding it up. Any feedback would be greatly appreciated! :)

OK, on to the details...

The following query (the one I actually need) takes ~5 seconds (+/-) and returns fewer than 100 rows.

SELECT lss.`country_id` AS CountryId 
, Date(lss.`transaction_utc`) AS TransactionDate 
, c.`name` AS CountryName, lss.`country_id` AS CountryId 
, COALESCE(SUM(lss.`sale_usd`),0) AS SaleUSD 
, COALESCE(SUM(lss.`commission_usd`),0) AS CommissionUSD 
FROM `sales` lss 
JOIN `countries` c ON lss.`country_id` = c.`country_id` 
WHERE (lss.`transaction_utc` BETWEEN '2012-09-26' AND '2012-10-26' AND lss.`username` = 'someuser') GROUP BY lss.`country_id`, DATE(lss.`transaction_utc`) 

The EXPLAIN SELECT for the same query is below. Note that it's not using the transaction_utc key. Shouldn't it be using my covering index?

id select_type table type possible_keys key key_len ref rows Extra 
1 SIMPLE lss ref idx_unique,transaction_utc,country_id idx_unique 50 const 1208802 Using where; Using temporary; Using filesort 
1 SIMPLE c eq_ref PRIMARY PRIMARY 4 georiot.lss.country_id 1 
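A quick way to check whether the 5-column index would actually do better here is to force it with MySQL's FORCE INDEX hint and compare the plan and timing (a sketch, not from the original post; `transaction_utc` is the name of that index in the schema further down):

    SELECT lss.`country_id` AS CountryId 
         , DATE(lss.`transaction_utc`) AS TransactionDate 
         , COALESCE(SUM(lss.`sale_usd`),0) AS SaleUSD 
         , COALESCE(SUM(lss.`commission_usd`),0) AS CommissionUSD 
      FROM `sales` lss FORCE INDEX (`transaction_utc`) -- force the 5-column index 
      JOIN `countries` c ON lss.`country_id` = c.`country_id` 
     WHERE lss.`transaction_utc` BETWEEN '2012-09-26' AND '2012-10-26' 
       AND lss.`username` = 'someuser' 
     GROUP BY lss.`country_id`, DATE(lss.`transaction_utc`);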

In the meantime, I've tried a few other variants to figure out what's going on...

The following query (changed GROUP BY) takes about 5 seconds (+/-) and returns only 3 rows:

SELECT lss.`country_id` AS CountryId 
, DATE(lss.`transaction_utc`) AS TransactionDate 
, c.`name` AS CountryName, lss.`country_id` AS CountryId 
, COALESCE(SUM(lss.`sale_usd`),0) AS SaleUSD 
, COALESCE(SUM(lss.`commission_usd`),0) AS CommissionUSD 
FROM `sales` lss 
JOIN `countries` c ON lss.`country_id` = c.`country_id` 
WHERE (lss.`transaction_utc` BETWEEN '2012-09-26' AND '2012-10-26' AND lss.`username` = 'someuser') GROUP BY lss.`country_id` 

The following query (GROUP BY removed) takes 4-5 seconds (+/-) and returns 1 row:

SELECT lss.`country_id` AS CountryId 
    , DATE(lss.`transaction_utc`) AS TransactionDate 
    , c.`name` AS CountryName, lss.`country_id` AS CountryId 
    , COALESCE(SUM(lss.`sale_usd`),0) AS SaleUSD 
    , COALESCE(SUM(lss.`commission_usd`),0) AS CommissionUSD 
    FROM `sales` lss 
    JOIN `countries` c ON lss.`country_id` = c.`country_id` 
    WHERE (lss.`transaction_utc` BETWEEN '2012-09-26' AND '2012-10-26' AND lss.`username` = 'someuser') 

The following query takes 0.00X seconds (+/-) and returns ~45k rows. This shows me that, at most, we're only trying to group 45K rows into fewer than 100 groups (as in my initial query):

SELECT lss.`country_id` AS CountryId 
    , DATE(lss.`transaction_utc`) AS TransactionDate 
    , c.`name` AS CountryName, lss.`country_id` AS CountryId 
    , COALESCE(SUM(lss.`sale_usd`),0) AS SaleUSD 
    , COALESCE(SUM(lss.`commission_usd`),0) AS CommissionUSD 
    FROM `sales` lss 
    JOIN `countries` c ON lss.`country_id` = c.`country_id` 
    WHERE (lss.`transaction_utc` BETWEEN '2012-09-26' AND '2012-10-26' AND lss.`username` = 'someuser') 
GROUP BY lss.`transaction_utc` 

Table schema:

CREATE TABLE IF NOT EXISTS `sales` (
    `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT, 
    `user_linkshare_account_id` int(11) unsigned NOT NULL, 
    `username` varchar(16) NOT NULL, 
    `country_id` int(4) unsigned NOT NULL, 
    `order` varchar(16) NOT NULL, 
    `raw_tracking_code` varchar(255) DEFAULT NULL, 
    `transaction_utc` datetime NOT NULL, 
    `processed_utc` datetime NOT NULL , 
    `sku` varchar(16) NOT NULL, 
    `sale_original` decimal(10,4) NOT NULL, 
    `sale_usd` decimal(10,4) NOT NULL, 
    `quantity` int(11) NOT NULL, 
    `commission_original` decimal(10,4) NOT NULL, 
    `commission_usd` decimal(10,4) NOT NULL, 
    `original_currency` char(3) NOT NULL, 
    PRIMARY KEY (`id`,`transaction_utc`), 
    UNIQUE KEY `idx_unique` (`username`,`order`,`processed_utc`,`sku`,`transaction_utc`), 
    KEY `raw_tracking_code` (`raw_tracking_code`), 
    KEY `idx_usd_amounts` (`sale_usd`,`commission_usd`), 
    KEY `idx_countries` (`country_id`), 
    KEY `transaction_utc` (`transaction_utc`,`username`,`country_id`,`sale_usd`,`commission_usd`) 
) ENGINE=InnoDB DEFAULT CHARSET=utf8 
/*!50100 PARTITION BY RANGE (TO_DAYS(`transaction_utc`)) 
(PARTITION pOLD VALUES LESS THAN (735112) ENGINE = InnoDB, 
PARTITION p201209 VALUES LESS THAN (735142) ENGINE = InnoDB, 
PARTITION p201210 VALUES LESS THAN (735173) ENGINE = InnoDB, 
PARTITION p201211 VALUES LESS THAN (735203) ENGINE = InnoDB, 
PARTITION p201212 VALUES LESS THAN (735234) ENGINE = InnoDB, 
PARTITION pMAX VALUES LESS THAN MAXVALUE ENGINE = InnoDB) */ AUTO_INCREMENT=19696320 ; 

Have you checked `EXPLAIN PARTITIONS` to make sure only the relevant partitions are being scanned? – lowleveldesign
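For reference, MariaDB 5.5 accepts the EXPLAIN PARTITIONS form directly; a minimal sketch of that check (not from the original thread):

    EXPLAIN PARTITIONS 
    SELECT lss.`country_id`, DATE(lss.`transaction_utc`) 
      FROM `sales` lss 
     WHERE lss.`transaction_utc` BETWEEN '2012-09-26' AND '2012-10-26' 
       AND lss.`username` = 'someuser'; 
    -- for this date range the `partitions` column should list only p201209 and p201210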

Answer


The problematic part is likely the GROUP BY DATE(transaction_utc). You also claim to have a covering index for this query, but I can't see one. Your 5-column index has all the columns used in the query, but not in the best order (which would be: WHERE - GROUP BY - SELECT).

So the engine, finding no useful index, would have to evaluate that function for all 20M rows. In fact, it finds an index that starts with username (the idx_unique) and uses it, so it has to evaluate the function for (only) 1.2M rows. If you had a (transaction_utc) or a (username, transaction_utc) index, it would choose the most useful of the three.

Can you change the table structure by splitting the column into date and time parts? If you can, then an index on (username, country_id, transaction_date), or (changing the order of the two columns used for grouping) on (username, transaction_date, country_id), would be quite efficient.

A covering index on (username, country_id, transaction_date, sale_usd, commission_usd) would be even better.
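A minimal sketch of that split-column approach, assuming a real `transaction_date DATE` column is added (the column and index names here are illustrative, not from the original post):

    ALTER TABLE `sales` 
        ADD COLUMN `transaction_date` DATE NULL; 

    -- backfill once, then keep the column in sync from the application (or a trigger) 
    UPDATE `sales` SET `transaction_date` = DATE(`transaction_utc`); 

    ALTER TABLE `sales` 
        ADD INDEX `idx_user_country_date` 
            (`username`, `country_id`, `transaction_date`, `sale_usd`, `commission_usd`); 

    -- the GROUP BY no longer wraps an indexed column in a function 
    SELECT `country_id`, `transaction_date`, 
           COALESCE(SUM(`sale_usd`),0), COALESCE(SUM(`commission_usd`),0) 
      FROM `sales` 
     WHERE `username` = 'someuser' 
       AND `transaction_date` BETWEEN '2012-09-26' AND '2012-10-26' 
     GROUP BY `country_id`, `transaction_date`;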


If you want to keep the current structure, try changing the order in your 5-column index to:

(username, country_id, transaction_utc, sale_usd, commission_usd) 

or to:

(username, transaction_utc, country_id, sale_usd, commission_usd) 
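For example, a sketch of swapping in one of these orderings while keeping the existing index name (`transaction_utc` is the current name of the 5-column index in the schema above):

    ALTER TABLE `sales` 
        DROP INDEX `transaction_utc`, 
        ADD INDEX `transaction_utc` 
            (`username`, `transaction_utc`, `country_id`, `sale_usd`, `commission_usd`);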

Since you are using MariaDB, you can use the VIRTUAL columns feature without changing the existing columns:

Add a virtual (persistent) column and the corresponding index:

ALTER TABLE sales 
    ADD COLUMN transaction_date DATE 
        AS (DATE(transaction_utc)) 
        PERSISTENT, 
    ADD INDEX special_IDX 
        (username, country_id, transaction_date, sale_usd, commission_usd) ; 
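With the persistent column in place, the query can group on it directly; a sketch of how the original query might change (not from the original answer):

    SELECT lss.`country_id` AS CountryId 
         , lss.`transaction_date` AS TransactionDate 
         , c.`name` AS CountryName 
         , COALESCE(SUM(lss.`sale_usd`),0) AS SaleUSD 
         , COALESCE(SUM(lss.`commission_usd`),0) AS CommissionUSD 
      FROM `sales` lss 
      JOIN `countries` c ON lss.`country_id` = c.`country_id` 
     WHERE lss.`username` = 'someuser' 
       AND lss.`transaction_date` BETWEEN '2012-09-26' AND '2012-10-26' 
       -- note: BETWEEN on a DATE column includes all of 2012-10-26, unlike the DATETIME range 
       -- keeping the original range predicate as well preserves partition pruning on transaction_utc 
       AND lss.`transaction_utc` BETWEEN '2012-09-26' AND '2012-10-26' 
     GROUP BY lss.`country_id`, lss.`transaction_date`;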

Yes, I've thought about splitting the date/time apart, but figured there was still some other issue. I do have an index on the same fields in a different order; are you saying I should change the order of the fields in the index? – JesseP


Yes, the order of the columns in an index matters. The only other index that might be as good (or better) would be the one with two columns transposed: `(username, transaction_utc, country_id)` –


Wouldn't that change the order of the query? (I see no noticeable difference in perf from it.) WHERE (lss.`username` = 'someuser' AND lss.`transaction_utc` BETWEEN '2012-09-26' AND '2012-10-26'), or should I leave the query as-is and just change the index? I guess I'm unclear on the order in which each part of the query gets executed, since I thought it needed to be (WHERE items, GROUP BY items, JOIN items, returned items). – JesseP