2012-08-24 77 views

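How can I optimize this monster query?

I have been wrestling with the following query (and a few similar ones), and I feel I am missing something, or perhaps I am using the wrong type of database.

The query gets the total number of new films per year over the past 10 years, and the total number of films that stopped showing (closed) per year, for a particular town compared with the UK as a whole. These queries are run for many towns and for many different years.

Other queries do similar things, sometimes with a UNION ALL appended to fetch the record year for openings or closures.

There are also queries that look at monthly or quarterly data instead of yearly data, and some of them only compare historic openings/closures for a particular quarter (e.g. Q3) or month (e.g. March).

Here is a query that compares London with the UK for 2012: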

SELECT inc.opening_year as year, inc.number_of_films as opens, 
    diss.number_of_films as closures, inc.uk_films as uk_opens, 
    diss.uk_films as uk_closures 
FROM 
(SELECT film_db.opening_year, uk.number_of_films as uk_films, 
     COUNT(film_db.id_film_db) as number_of_films 
    FROM film_db 
    JOIN postcodes ON id_postcodes = opening_postcode_id 
    JOIN towns ON id_towns = town_id AND town = 'London' 
    JOIN (SELECT opening_year, COUNT(film_db.id_film_db) as number_of_films 
      FROM film_db 
      WHERE opening_year <= 2012 AND opening_year >= (2012 - 10) 
      GROUP BY opening_year 
     ) uk ON uk.opening_year = film_db.opening_year 
    WHERE film_db.opening_year <= 2012 AND film_db.opening_year >= (2012 - 10) 
    GROUP BY film_db.opening_year 
    ORDER BY film_db.opening_year DESC 
) inc 
JOIN 
(SELECT film_db.closing_year, uk.number_of_films as uk_films, 
     COUNT(film_db.id_film_db) as number_of_films 
    FROM film_db 
    JOIN postcodes ON id_postcodes = postcode_id 
    JOIN towns ON id_towns = town_id AND town = 'London' 
    JOIN (SELECT closing_year, COUNT(film_db.id_film_db) as number_of_films 
      FROM film_db 
      WHERE film_db.closing_year <= 2012 AND film_db.closing_year >= (2012 - 10) 
      GROUP BY film_db.closing_year 
     ) uk ON uk.closing_year = film_db.closing_year 
    WHERE film_db.closing_year <= 2012 AND film_db.closing_year >= (2012 - 10) 
    GROUP BY film_db.closing_year 
    ORDER BY film_db.closing_year DESC 
) diss ON diss.closing_year = inc.opening_year 

The SHOW CREATE TABLE output for the tables is as follows:

film_db:

CREATE TABLE `film_db` (
    `id_film_db` int(11) NOT NULL AUTO_INCREMENT, 
    `film_name` varchar(255) DEFAULT NULL, 
    `category` varchar(100) DEFAULT NULL, 
    `status` varchar(50) DEFAULT NULL, 
    `opening_date` date DEFAULT NULL, 
    `opening_year` int(4) DEFAULT NULL, 
    `opening_month` int(2) DEFAULT NULL, 
    `opening_quarter` int(1) DEFAULT NULL, 
    `closing_date` date DEFAULT NULL, 
    `closing_year` int(4) DEFAULT NULL, 
    `closing_month` int(2) DEFAULT NULL, 
    `closing_quarter` int(1) DEFAULT NULL, 
    `datetime` timestamp NULL DEFAULT CURRENT_TIMESTAMP, 
    `postcode_id` int(4) NOT NULL DEFAULT '0', 
    `opening_postcode_id` int(4) NOT NULL DEFAULT '0', 
    PRIMARY KEY (`id_film_db`), 
    KEY `opening_date` (`opening_date`), 
    KEY `status` (`status`), 
    KEY `postcode_id` (`postcode_id`), 
    KEY `type` (`category`), 
    KEY `opening_year` (`opening_year`), 
    KEY `opening_month` (`opening_month`,`opening_year`) USING BTREE, 
    KEY `opening_quarter` (`opening_quarter`,`opening_year`) USING BTREE, 
    KEY `closing_year` (`closing_year`), 
    KEY `closing_month` (`closing_year`,`closing_month`), 
    KEY `closing_quarter` (`closing_year`,`closing_quarter`), 
    KEY `closing_date` (`closing_date`), 
    KEY `opening_closing_date` (`opening_date`,`closing_date`), 
    KEY `opening_postcode` (`opening_postcode_id`), 
    FULLTEXT KEY `film_name` (`film_name`) 
) ENGINE=MyISAM AUTO_INCREMENT=10649173 DEFAULT CHARSET=utf8 

postcodes:

CREATE TABLE `postcodes` (
    `id_postcodes` int(4) NOT NULL AUTO_INCREMENT, 
    `postcode` varchar(9) NOT NULL, 
    `town_id` int(4) NOT NULL, 
    `lat` float NOT NULL, 
    `lng` float NOT NULL, 
    PRIMARY KEY (`id_postcodes`), 
    UNIQUE KEY `postcode` (`postcode`) USING BTREE, 
    KEY `town` (`town_id`) 
) ENGINE=MyISAM AUTO_INCREMENT=5705 DEFAULT CHARSET=latin1 

towns:

CREATE TABLE `towns` (
    `id_towns` int(4) NOT NULL AUTO_INCREMENT, 
    `town` varchar(150) NOT NULL, 
    `county_id` int(3) NOT NULL, 
    PRIMARY KEY (`id_towns`), 
    KEY `county` (`county_id`) 
) ENGINE=MyISAM AUTO_INCREMENT=1606 DEFAULT CHARSET=latin1 

Here is the EXPLAIN EXTENDED output:

id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra
1 | PRIMARY | <derived2> | ALL | | | | | 11 | 100 |
1 | PRIMARY | <derived4> | ALL | | | | | 11 | 100 | Using where; Using join buffer
4 | DERIVED | <derived5> | ALL | | | | | 11 | 100 | Using where; Using temporary; Using filesort
4 | DERIVED | film_db | ref | postcode_id,closing_year,closing_month,closing_quarter | closing_year | 5 | uk.closing_year | 2 | 100 | Using where
4 | DERIVED | postcodes | eq_ref | PRIMARY,town | PRIMARY | 4 | film_db.postcode_id | 1 | 100 |
4 | DERIVED | towns | eq_ref | PRIMARY | PRIMARY | 4 | postcodes.town_id | 1 | 100 | Using where
5 | DERIVED | film_db | ALL | closing_year,closing_month,closing_quarter | | | | 9895680 | 47.66 | Using where; Using temporary; Using filesort
2 | DERIVED | <derived3> | ALL | | | | | 11 | 100 | Using where; Using temporary; Using filesort
2 | DERIVED | film_db | ref | opening_year,opening_postcode | opening_year | 5 | uk.opening_year | 3 | 100 | Using where
2 | DERIVED | postcodes | eq_ref | PRIMARY,town | PRIMARY | 4 | film_db.opening_postcode_id | 1 | 100 |
2 | DERIVED | towns | eq_ref | PRIMARY | PRIMARY | 4 | postcodes.town_id | 1 | 100 | Using where
3 | DERIVED | film_db | ALL | opening_year | | | | 9895680 | 54.53 | Using where; Using temporary; Using filesort

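As you can see, MySQL does not think that filtering the film_db table will make any difference to performance, so for the two UK-wide subqueries (ids 3 and 5) it scans the whole table without using any key.

So:

Can I improve this query so that it makes better use of the indexes?

Can I improve the indexes so that the query runs faster?

Is there another type of database (not MySQL) that I should be using for this kind of query, where I am mostly interested in counting entries under complex conditions and joins?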

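What is this? I suggest you create an ['sqlfiddle'](http://sqlfiddle.com). – diEcho

I am not going to create an sqlfiddle with 10,000,000 rows... I was just trying to provide all the information I thought would be helpful. – Jon

Just create the tables and the necessary dummy data to go with the query above. – diEcho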

Answers


This is the first thing I would try:

-- UK-wide openings per year for the 10-year window
CREATE TABLE opens 
SELECT opening_year, COUNT(film_db.id_film_db) AS number_of_films 
FROM film_db 
WHERE opening_year <= 2012 AND opening_year >= (2012 - 10) 
GROUP BY opening_year;

-- UK-wide closures per year for the same window
CREATE TABLE closures 
SELECT closing_year, COUNT(film_db.id_film_db) AS number_of_films 
FROM film_db 
WHERE film_db.closing_year <= 2012 AND film_db.closing_year >= (2012 - 10) 
GROUP BY film_db.closing_year;

I would use these two tables in place of the two UK-wide subqueries you are using now.
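As a rough sketch of what that substitution could look like, here is the London query with the two `uk` subqueries replaced by joins to `opens` and `closures`. It assumes both tables exist as created above and hold one row per year. The table aliases (`f`, `p`, `t`) are added for readability, the `GROUP BY` also lists `uk.number_of_films` so the statement stays valid under `ONLY_FULL_GROUP_BY`, and the `ORDER BY` is moved to the outer query; none of that is in the original query, so treat it as one possible rewrite rather than the definitive one.

```sql
-- Sketch only: assumes `opens` and `closures` exist and hold one row per year.
SELECT inc.opening_year     AS year,
       inc.number_of_films  AS opens,
       diss.number_of_films AS closures,
       inc.uk_films         AS uk_opens,
       diss.uk_films        AS uk_closures
FROM (
    -- London openings per year, with the UK-wide count looked up in `opens`
    SELECT f.opening_year,
           uk.number_of_films  AS uk_films,
           COUNT(f.id_film_db) AS number_of_films
    FROM film_db f
    JOIN postcodes p ON p.id_postcodes = f.opening_postcode_id
    JOIN towns t     ON t.id_towns = p.town_id AND t.town = 'London'
    JOIN opens uk    ON uk.opening_year = f.opening_year
    WHERE f.opening_year BETWEEN 2012 - 10 AND 2012
    GROUP BY f.opening_year, uk.number_of_films
) inc
JOIN (
    -- London closures per year, with the UK-wide count looked up in `closures`
    SELECT f.closing_year,
           uk.number_of_films  AS uk_films,
           COUNT(f.id_film_db) AS number_of_films
    FROM film_db f
    JOIN postcodes p ON p.id_postcodes = f.postcode_id
    JOIN towns t     ON t.id_towns = p.town_id AND t.town = 'London'
    JOIN closures uk ON uk.closing_year = f.closing_year
    WHERE f.closing_year BETWEEN 2012 - 10 AND 2012
    GROUP BY f.closing_year, uk.number_of_films
) diss ON diss.closing_year = inc.opening_year
ORDER BY inc.opening_year DESC;
```

The point of the rewrite is that the two full scans of film_db (about 9.9 million rows each in the EXPLAIN output) no longer happen at query time; whether the remaining joins are fast enough is something only a fresh EXPLAIN on the real data can confirm.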

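"Other queries do similar things, sometimes with a UNION ALL appended to fetch the record year for openings or closures. There are also queries that look at monthly or quarterly data instead of yearly data, and some of them only compare historic openings/closures for a particular quarter (e.g. Q3) or month (e.g. March)."

I assume you run these selects far more often than the contents of the opens/closures tables would change, so there is no need to regenerate these tables every time you run a query like this.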
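One way to avoid regenerating the counts per query is to keep a summary table that covers every year and refresh it only when film_db changes, for example after a bulk load or on a schedule. The sketch below is an assumption on my part: the table name `opens_by_year` is hypothetical, and a matching table for closures would follow the same pattern with closing_year.

```sql
-- Hypothetical summary table covering every year, not just one 10-year window.
CREATE TABLE IF NOT EXISTS opens_by_year (
    opening_year    INT NOT NULL,
    number_of_films INT NOT NULL,
    PRIMARY KEY (opening_year)
);

-- Refresh step. REPLACE overwrites the row for each year that still has films;
-- it does not delete rows for years whose count has dropped to zero.
REPLACE INTO opens_by_year (opening_year, number_of_films)
SELECT opening_year, COUNT(*)
FROM film_db
WHERE opening_year IS NOT NULL
GROUP BY opening_year;
```

Because the table is not tied to 2012, the same rows can serve queries for any reference year by filtering on opening_year. If years can disappear entirely from film_db, emptying the table before refilling it would be the safer refresh.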


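"Can I improve this query so that it makes better use of the indexes? Can I improve the indexes so that the query runs faster? Is there another type of database (not MySQL) that I should be using for this kind of query, where I am mostly interested in counting entries under complex conditions and joins?"

There are of course many other possible improvements, and there should certainly be a way to make MySQL use the indexes. Keep in mind that MySQL normally uses only one index per table reference; its index merge optimization applies only in limited cases, so you should not count on the separate indexes on opening_postcode_id and opening_year being combined here. I cannot work out why neither is used, but I am confident that two composite indexes like these will improve the query: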

KEY `opening_year_postcode` (`opening_year`, `opening_postcode_id`) 
KEY `closing_year_postcode` (`closing_year`, `postcode_id`) 
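The two KEY lines above are written as they would appear inside CREATE TABLE. On an existing table they could be added with ALTER TABLE, roughly as sketched below. The EXPLAIN that follows is just one half of the original query (with aliases added and COUNT(*) used in place of COUNT(id_film_db), which is equivalent because id_film_db is NOT NULL), included as a way to check whether the optimizer actually picks the new index.

```sql
-- Add both composite indexes in one statement. On a MyISAM table with
-- roughly 10 million rows this may take a while.
ALTER TABLE film_db
    ADD INDEX opening_year_postcode (opening_year, opening_postcode_id),
    ADD INDEX closing_year_postcode (closing_year, postcode_id);

-- Re-check the plan for the London openings half of the query.
EXPLAIN
SELECT f.opening_year, COUNT(*) AS number_of_films
FROM film_db f
JOIN postcodes p ON p.id_postcodes = f.opening_postcode_id
JOIN towns t     ON t.id_towns = p.town_id AND t.town = 'London'
WHERE f.opening_year BETWEEN 2012 - 10 AND 2012
GROUP BY f.opening_year;
```

If the `key` column for film_db shows opening_year_postcode, the new index is in use. The existing single-column opening_year and closing_year indexes are leftmost prefixes of the new ones, so they may turn out to be redundant once the new indexes are in place.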

See this SO answer: https://stackoverflow.com/a/6295744/176569


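I have learned over the years that this kind of performance tuning is an incremental process. You have to try a number of tricks, measure the performance gain from each, and in the end you will keep only one or two of them.

At this point I would not consider dropping MySQL for another database vendor. The cause of your performance problem is probably not MySQL itself.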