2017-04-13 57 views

MySQL GROUP BY uses filesort - query optimization

I have a table like this:

CREATE TABLE `purchase` (
    `fact_purchase_id` binary(16) NOT NULL, 
    `purchase_id` int(10) unsigned NOT NULL, 
    `purchase_id_primary` int(10) unsigned DEFAULT NULL, 
    `person_id` int(10) unsigned NOT NULL, 
    `person_id_owner` int(10) unsigned NOT NULL, 
    `service_id` int(10) unsigned NOT NULL, 
    `fact_count` int(10) unsigned NOT NULL DEFAULT '0', 
    `fact_type` tinyint(3) unsigned NOT NULL, 
    `date_fact` date NOT NULL, 
    `purchase_name` varchar(255) DEFAULT NULL, 
    `activation_price` decimal(7,2) unsigned NOT NULL DEFAULT '0.00', 
    `activation_price_total` decimal(7,2) unsigned NOT NULL DEFAULT '0.00', 
    `renew_price` decimal(7,2) unsigned DEFAULT '0.00', 
    `renew_price_total` decimal(7,2) unsigned NOT NULL DEFAULT '0.00', 
    `activation_cost` decimal(7,2) unsigned DEFAULT '0.00', 
    `activation_cost_total` decimal(7,2) unsigned NOT NULL DEFAULT '0.00', 
    `renew_cost` decimal(7,2) unsigned DEFAULT '0.00', 
    `renew_cost_total` decimal(7,2) unsigned NOT NULL DEFAULT '0.00', 
    `date_created` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP, 
    PRIMARY KEY (`fact_purchase_id`), 
    KEY `purchase_id_idx` (`purchase_id`), 
    KEY `person_id_idx` (`person_id`), 
    KEY `person_id_owner_idx` (`person_id_owner`), 
    KEY `service_id_idx` (`service_id`), 
    KEY `fact_type_idx` (`fact_type`), 
    KEY `renew_price_idx` (`renew_price`), 
    KEY `renew_cost_idx` (`renew_cost`), 
    KEY `renew_price_year_idx` (`renew_price_year`), 
    KEY `renew_cost_year_idx` (`renew_cost_year`), 
    KEY `date_created_idx` (`date_created`), 
    KEY `purchase_id_primary_idx` (`purchase_id_primary`), 
    KEY `fact_count` (`fact_count`), 
    KEY `renew_price_year_total_idx` (`renew_price_total`), 
    KEY `renew_cost_year_total_idx` (`renew_cost_total`), 
    KEY `date_fact` (`date_fact`) USING BTREE, 
    CONSTRAINT `purchase_person_fk` FOREIGN KEY (`person_id`) REFERENCES `person` (`person_id`) ON DELETE NO ACTION ON UPDATE NO ACTION, 
    CONSTRAINT `purchase_person_owner_fk` FOREIGN KEY (`person_id_owner`) REFERENCES `person` (`person_id`) ON DELETE NO ACTION ON UPDATE NO ACTION, 
    CONSTRAINT `purchase_service_fk` FOREIGN KEY (`service_id`) REFERENCES `service` (`service_id`) ON DELETE NO ACTION ON UPDATE NO ACTION 
) ENGINE=InnoDB DEFAULT CHARSET=utf8; 

I run this query:

SELECT 
    purchase.date_fact, 
    UNIX_TIMESTAMP(purchase.date_fact), 
    COUNT(DISTINCT purchase.purchase_id) AS Num 
FROM 
    purchase 
WHERE 
    purchase.date_fact >= '2017-01-01' 
    AND purchase.date_fact <= '2017-01-31' 
    AND purchase.fact_type = 3 
    AND purchase.purchase_id_primary IS NULL 
GROUP BY purchase.date_fact 

The table contains 5,629,670 records in total. Running EXPLAIN on the query gives me these results:

  • rows = 2,814,835
  • possible_keys = fact_type_idx,purchase_id_primary_idx,date_fact
  • key = fact_type_idx
  • key_len = 1
  • ref = const
  • filtered = 25.00
  • Extra = Using index condition;Using where;Using filesort

The query takes 30-35 seconds to execute. That is too long to wait.

The problem is that the GROUP BY causes a filesort to be applied. Adding ORDER BY NULL to the query does not change anything.
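
For reference, the ORDER BY NULL variant mentioned here would presumably look like the sketch below. The exact statement that was run is not shown in the question, so this is an assumption based on the query above. In MySQL 5.7, ORDER BY NULL only suppresses the implicit sort of the GROUP BY result; it does not change which index is chosen.

```sql
-- Assumed form of the ORDER BY NULL attempt (not quoted from the question).
EXPLAIN
SELECT
    purchase.date_fact,
    UNIX_TIMESTAMP(purchase.date_fact),
    COUNT(DISTINCT purchase.purchase_id) AS Num
FROM
    purchase
WHERE
    purchase.date_fact >= '2017-01-01'
    AND purchase.date_fact <= '2017-01-31'
    AND purchase.fact_type = 3
    AND purchase.purchase_id_primary IS NULL
GROUP BY purchase.date_fact
ORDER BY NULL;
```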

I could use a covering index, but I only need date_fact in this query: which fields should I include in it?

How can I avoid the filesort on GROUP BY? How can I optimize the query to make it faster?

I use this table for statistical purposes (OLAP). Is there perhaps a better DBMS for this purpose?

I am running MySQL Server 5.7.17.

Thanks

Answers


For this query:

SELECT p.date_fact, UNIX_TIMESTAMP(p.date_fact), 
     COUNT(DISTINCT p.purchase_id) AS Num 
FROM purchase p 
WHERE p.date_fact >= '2017-01-01' AND 
     p.date_fact <= '2017-01-31' AND 
     p.fact_type = 3 AND 
     p.purchase_id_primary IS NULL 
GROUP BY p.date_fact; 

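I would suggest a composite index on (fact_type, purchase_id_primary, date_fact, purchase_id). The first two keys have equality conditions in the WHERE clause (IS NULL is handled like an equality lookup). The third has a range condition, and the fourth lets the index "cover" the query (all columns used by the query are in the index).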
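
A minimal sketch of how that index could be created and checked. The index name `purchase_type_primary_date_idx` is made up for illustration; only the column list comes from the suggestion above.

```sql
-- Hypothetical index name; the column order follows the suggestion:
-- two equality columns, then the range column, then the counted column.
ALTER TABLE purchase
    ADD INDEX purchase_type_primary_date_idx
        (fact_type, purchase_id_primary, date_fact, purchase_id);

-- Re-check the execution plan after the index exists.
EXPLAIN
SELECT p.date_fact, UNIX_TIMESTAMP(p.date_fact),
       COUNT(DISTINCT p.purchase_id) AS Num
FROM purchase p
WHERE p.date_fact >= '2017-01-01' AND
      p.date_fact <= '2017-01-31' AND
      p.fact_type = 3 AND
      p.purchase_id_primary IS NULL
GROUP BY p.date_fact;
```

If the optimizer picks the new index, the `key` column should name it and `Extra` should include `Using index`. Whether `Using filesort` disappears is worth confirming in the EXPLAIN output rather than assuming.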

I would also add: if you don't need COUNT(DISTINCT), don't use it. purchase_id may already be unique in purchase.
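
One way to test that assumption before dropping DISTINCT is sketched below. The schema does not declare purchase_id as unique, so the first query looks for repeats within a day among the filtered rows; the second query is only equivalent to the original if the first returns no rows.

```sql
-- Look for purchase_id values that occur more than once on the same date
-- among the rows the report query reads. An empty result suggests that
-- DISTINCT is unnecessary for this filter.
SELECT p.date_fact, p.purchase_id, COUNT(*) AS occurrences
FROM purchase p
WHERE p.date_fact >= '2017-01-01' AND
      p.date_fact <= '2017-01-31' AND
      p.fact_type = 3 AND
      p.purchase_id_primary IS NULL
GROUP BY p.date_fact, p.purchase_id
HAVING COUNT(*) > 1;

-- Assuming no duplicates were found: a plain count gives the same numbers.
-- purchase_id is NOT NULL, so COUNT(*) and COUNT(p.purchase_id) agree.
SELECT p.date_fact, UNIX_TIMESTAMP(p.date_fact),
       COUNT(*) AS Num
FROM purchase p
WHERE p.date_fact >= '2017-01-01' AND
      p.date_fact <= '2017-01-31' AND
      p.fact_type = 3 AND
      p.purchase_id_primary IS NULL
GROUP BY p.date_fact;
```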