2015-06-18 29 views

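Query performance: 'LEFT JOIN ... IS NULL' vs 'NOT EXISTS (SELECT ...)'

I have a question about a query I need to run, but I don't know which form performs best. I need to select all words except those that have an associated row in the wordfilter table.

The output of the query is correct, but maybe there is a better solution. I know very little about query plans and am only now trying to understand them.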

SELECT CONCAT(SPACE(1), UCASE(stocknews.word.word), SPACE(1)) AS word, stocknews.word.language 
FROM stocknews.word 
WHERE NOT EXISTS (SELECT word_id FROM stocknews.wordfilter WHERE stocknews.word.id = word_id) 
AND user_id = 1 

+----+--------------+------------+-------+---------------+---------+---------+-------+------+-------------+
| id | select_type  | table      | type  | possible_keys | key     | key_len | ref   | rows | extra       |
+----+--------------+------------+-------+---------------+---------+---------+-------+------+-------------+
|  1 | PRIMARY      | word       | ref   | user_id       | user_id | 4       | const |  843 | Using where |
|  2 | MATERIALIZED | wordfilter | index | PRIMARY       | PRIMARY | 756     |       |   16 | Using index |
+----+--------------+------------+-------+---------------+---------+---------+-------+------+-------------+

versus

SELECT CONCAT(SPACE(1), UCASE(stocknews.word.word), SPACE(1)) AS word, stocknews.word.language 
FROM stocknews.word 
LEFT JOIN stocknews.wordfilter ON stocknews.word.id = stocknews.wordfilter.word_id 
WHERE stocknews.wordfilter.word_id IS NULL AND user_id = 1 

+----+-------------+------------+------+---------------+---------+---------+---------+------+--------------------------------------+
| id | select_type | table      | type | possible_keys | key     | key_len | ref     | rows | extra                                |
+----+-------------+------------+------+---------------+---------+---------+---------+------+--------------------------------------+
|  1 | SIMPLE      | word       | ref  | user_id       | user_id | 4       | const   |  843 |                                      |
|  1 | SIMPLE      | wordfilter | ref  | PRIMARY       | PRIMARY | 4       | word.id |    1 | Using where; Using index; Not exists |
+----+-------------+------------+------+---------------+---------+---------+---------+------+--------------------------------------+

Any help is welcome! An explanation would be nice.

Edit:

For query 1:

+----------------------------+-------+
| Variable_name              | Value |
+----------------------------+-------+
| Handler_commit             | 1     |
| Handler_delete             | 0     |
| Handler_discover           | 0     |
| Handler_external_lock      | 0     |
| Handler_icp_attempts       | 0     |
| Handler_icp_match          | 0     |
| Handler_mrr_init           | 0     |
| Handler_mrr_key_refills    | 0     |
| Handler_mrr_rowid_refills  | 0     |
| Handler_prepare            | 0     |
| Handler_read_first         | 1     |
| Handler_read_key           | 1044  |
| Handler_read_last          | 0     |
| Handler_read_next          | 859   |
| Handler_read_prev          | 0     |
| Handler_read_rnd           | 0     |
| Handler_read_rnd_deleted   | 0     |
| Handler_read_rnd_next      | 0     |
| Handler_rollback           | 0     |
| Handler_savepoint          | 0     |
| Handler_savepoint_rollback | 0     |
| Handler_tmp_update         | 0     |
| Handler_tmp_write          | 215   |
| Handler_update             | 0     |
| Handler_write              | 0     |
+----------------------------+-------+
25 rows in set (0.00 sec) 

For query 2:

+----------------------------+-------+
| Variable_name              | Value |
+----------------------------+-------+
| Handler_commit             | 1     |
| Handler_delete             | 0     |
| Handler_discover           | 0     |
| Handler_external_lock      | 0     |
| Handler_icp_attempts       | 0     |
| Handler_icp_match          | 0     |
| Handler_mrr_init           | 0     |
| Handler_mrr_key_refills    | 0     |
| Handler_mrr_rowid_refills  | 0     |
| Handler_prepare            | 0     |
| Handler_read_first         | 0     |
| Handler_read_key           | 844   |
| Handler_read_last          | 0     |
| Handler_read_next          | 843   |
| Handler_read_prev          | 0     |
| Handler_read_rnd           | 0     |
| Handler_read_rnd_deleted   | 0     |
| Handler_read_rnd_next      | 0     |
| Handler_rollback           | 0     |
| Handler_savepoint          | 0     |
| Handler_savepoint_rollback | 0     |
| Handler_tmp_update         | 0     |
| Handler_tmp_write          | 0     |
| Handler_update             | 0     |
| Handler_write              | 0     |
+----------------------------+-------+

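Have you tried both and compared the performance? Generally (at least in SQL Server, not sure about MariaDB), if your tables are indexed properly, 'EXISTS' and 'NOT EXISTS' execute faster because they short-circuit: as soon as a matching record is found, the row is excluded or included according to your query. 'LEFT JOIN' includes all records and filters them at the end with the 'IS NULL' criterion.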
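To make the comparison concrete, here is a minimal sketch showing that the two forms express the same anti-join and return the same rows. It uses Python's built-in sqlite3 module with an in-memory database as a stand-in for MariaDB. That is an assumption: SQLite's planner is different, so this only illustrates the semantics and says nothing about MariaDB performance. The sample rows are invented, and SPACE()/UCASE() are swapped for `||` and UPPER(), which SQLite supports.

```python
import sqlite3

# In-memory SQLite database as a stand-in for MariaDB (assumption);
# the table layout mirrors the question, the rows are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE word (id INTEGER PRIMARY KEY, user_id INTEGER,
                       word TEXT, language TEXT);
    CREATE TABLE wordfilter (word_id INTEGER PRIMARY KEY);
    INSERT INTO word VALUES (1, 1, 'apple', 'en'), (2, 1, 'the', 'en'),
                            (3, 1, 'bank', 'en'),  (4, 2, 'stock', 'en');
    INSERT INTO wordfilter VALUES (2);
""")

# Form 1: correlated NOT EXISTS subquery.
NOT_EXISTS = """
    SELECT ' ' || UPPER(w.word) || ' ' AS word, w.language
    FROM word AS w
    WHERE NOT EXISTS (SELECT 1 FROM wordfilter AS f WHERE f.word_id = w.id)
      AND w.user_id = 1
    ORDER BY w.id
"""

# Form 2: LEFT JOIN, keeping only rows that found no match.
LEFT_JOIN = """
    SELECT ' ' || UPPER(w.word) || ' ' AS word, w.language
    FROM word AS w
    LEFT JOIN wordfilter AS f ON f.word_id = w.id
    WHERE f.word_id IS NULL AND w.user_id = 1
    ORDER BY w.id
"""

rows_not_exists = conn.execute(NOT_EXISTS).fetchall()
rows_left_join = conn.execute(LEFT_JOIN).fetchall()

print(rows_not_exists == rows_left_join)
print(rows_not_exists)
```

Both forms drop word 2 (it is in wordfilter) and word 4 (it belongs to another user), so any speed difference comes down to the plan the optimizer picks, not to what the queries mean.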


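I tried both approaches on a small data set and both return the same result. Thanks for the information; I'm fairly sure my indexes are set up correctly. I think I need to build a larger data set to see any difference. My main question was to hear what makes a query faster or slower. So my guess is that the 'NOT EXISTS' approach is faster. I'll test it with a larger data set and report the actual query times. Thanks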


With 13,000 records in _word_ and 5,000 records in _wordfilter_, the 'NOT EXISTS' approach seems to be faster.
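A test of that size could be scripted roughly as follows. This is only a sketch under stated assumptions: it runs against an in-memory SQLite database through Python's sqlite3 module instead of MariaDB, the data is synthetic (13,000 words for one user, the first 5,000 of them filtered), and `timed_count` is a hypothetical helper. Timings from SQLite will not carry over to MariaDB; the point is the shape of the measurement.

```python
import sqlite3
import time

# Synthetic data set in SQLite (assumption: stand-in for the real MariaDB tables).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE word (id INTEGER PRIMARY KEY, user_id INTEGER, word TEXT);
    CREATE INDEX word_user_id ON word (user_id);
    CREATE TABLE wordfilter (word_id INTEGER PRIMARY KEY);
""")
# 13,000 words for user 1; ids 1..5000 are listed in wordfilter.
conn.executemany("INSERT INTO word VALUES (?, 1, ?)",
                 ((i, f"w{i}") for i in range(1, 13001)))
conn.executemany("INSERT INTO wordfilter VALUES (?)",
                 ((i,) for i in range(1, 5001)))

QUERIES = {
    "NOT EXISTS": """
        SELECT w.word FROM word AS w
        WHERE NOT EXISTS (SELECT 1 FROM wordfilter AS f WHERE f.word_id = w.id)
          AND w.user_id = 1
    """,
    "LEFT JOIN": """
        SELECT w.word FROM word AS w
        LEFT JOIN wordfilter AS f ON f.word_id = w.id
        WHERE f.word_id IS NULL AND w.user_id = 1
    """,
}

def timed_count(sql, repeats=20):
    """Return (row count, best wall-clock seconds over `repeats` runs)."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        rows = conn.execute(sql).fetchall()
        best = min(best, time.perf_counter() - start)
    return len(rows), best

results = {name: timed_count(sql) for name, sql in QUERIES.items()}
for name, (count, seconds) in results.items():
    print(f"{name}: {count} rows, best of 20 runs {seconds * 1000:.2f} ms")
```

Taking the best of several runs reduces noise from caching and other processes, and checking that both forms return the same row count guards against timing two queries that are not actually equivalent.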

Answer

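This looks like a close race between the two formulations. (Other examples might show a clearer winner.)

According to the Handler values, query 1 performed more Handler_read_key operations plus some temporary-table writes (these go with the MATERIALIZED step). The other numbers are about the same. So I conclude that query 1 is slower, though probably not by enough to make much of a difference.

I vote for LEFT JOIN as the better query pattern (in this case).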
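The comparison above can be reproduced with a few lines of arithmetic. The sketch below only copies the non-zero Handler_* counters from the two status listings in the question and subtracts them; the variable names `query1`, `query2` and `diff` are just for illustration.

```python
# Non-zero Handler_* counters copied from the two status listings in the question.
query1 = {  # NOT EXISTS (plan with the MATERIALIZED subquery)
    "Handler_commit": 1,
    "Handler_read_first": 1,
    "Handler_read_key": 1044,
    "Handler_read_next": 859,
    "Handler_tmp_write": 215,
}
query2 = {  # LEFT JOIN ... IS NULL
    "Handler_commit": 1,
    "Handler_read_first": 0,
    "Handler_read_key": 844,
    "Handler_read_next": 843,
    "Handler_tmp_write": 0,
}

# Positive values mean query 1 did more of that operation than query 2.
diff = {name: query1[name] - query2[name] for name in query1}

for name, delta in diff.items():
    print(f"{name:<20} {query1[name]:>5} {query2[name]:>5} {delta:+d}")
```

Query 1 comes out with 200 more key reads, 215 temporary-table writes that query 2 does not do at all, and only 16 more Handler_read_next calls, which is the basis for calling it the slightly slower plan on this data set.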