2012-10-24
0

Which SELECT statement is better? Is it quicker to do 'WHERE IN (SELECT)' or 'WHERE x = (SELECT)'?

SELECT * 
FROM aTable 
WHERE aField in (
    SELECT xField 
    FROM bTable 
    WHERE yField > 5 
); 

OR

SELECT * 
FROM aTable 
WHERE (
    SELECT yField 
    FROM bTable 
    WHERE aTable.aField = bTable.xField 
) > 5; 
+0

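Is the second one even legal syntax? Usually, questions that start with 'WHERE xxx IN (...)' end up comparing it with 'OUTER JOIN' plus 'WHERE yyy IS NULL', or with 'NOT EXISTS (...)'. If that is your question, [here is your answer](http://explainextended.com/2009/09/18/not-in-vs-not-exists-vs-left-join-is-null-mysql/).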

+0

In my view the first one is more solid. I am fairly sure you will get an error from the second.

+0

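To answer my own question above: yes, apparently the second example is valid. I did not get the courtesy of a reply, but I checked it myself on a local MySQL installation.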

Answers

3

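They produce very similar execution plans (on my test tables, which are tiny; YMMV, always profile with real data), and there is a third option you might want to consider instead.

First: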

EXPLAIN SELECT * FROM aTable WHERE aField in (SELECT xField FROM bTable WHERE yField > 5); 
+----+--------------------+--------+-------+---------------+---------------+---------+------+------+-------------+
| id | select_type        | table  | type  | possible_keys | key           | key_len | ref  | rows | Extra       |
+----+--------------------+--------+-------+---------------+---------------+---------+------+------+-------------+
|  1 | PRIMARY            | aTable | ALL   | NULL          | NULL          | NULL    | NULL |    4 | Using where |
|  2 | DEPENDENT SUBQUERY | bTable | range | bTable_yField | bTable_yField | 5       | NULL |    2 | Using where |
+----+--------------------+--------+-------+---------------+---------------+---------+------+------+-------------+

Second:

EXPLAIN SELECT * FROM aTable WHERE (SELECT yField FROM bTable WHERE aTable.aField = bTable.xField) > 5; 
+----+--------------------+--------+------+---------------+------+---------+------+------+-------------+
| id | select_type        | table  | type | possible_keys | key  | key_len | ref  | rows | Extra       |
+----+--------------------+--------+------+---------------+------+---------+------+------+-------------+
|  1 | PRIMARY            | aTable | ALL  | NULL          | NULL | NULL    | NULL |    4 | Using where |
|  2 | DEPENDENT SUBQUERY | bTable | ALL  | NULL          | NULL | NULL    | NULL |    4 | Using where |
+----+--------------------+--------+------+---------------+------+---------+------+------+-------------+

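Both result in a DEPENDENT SUBQUERY. On my sample tables the first one gets the benefit of an index (I assumed bTable.yField is indexed), while the second one does not.

You can avoid the dependent subquery and get better up-front filtering by using a JOIN.

Third option: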

EXPLAIN SELECT * FROM aTable INNER JOIN bTable On aTable.aField = bTable.xField WHERE bTable.yField > 5; 
+----+-------------+--------+-------+---------------+---------------+---------+------+------+--------------------------------+
| id | select_type | table  | type  | possible_keys | key           | key_len | ref  | rows | Extra                          |
+----+-------------+--------+-------+---------------+---------------+---------+------+------+--------------------------------+
|  1 | SIMPLE      | bTable | range | bTable_yField | bTable_yField | 5       | NULL |    2 | Using where                    |
|  1 | SIMPLE      | aTable | ALL   | NULL          | NULL          | NULL    | NULL |    4 | Using where; Using join buffer |
+----+-------------+--------+-------+---------------+---------------+---------+------+------+--------------------------------+
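One caveat about the JOIN rewrite is worth sketching: it returns the same rows as the IN form only while each aTable row has at most one qualifying bTable row. The snippet below is a minimal sketch, not a MySQL benchmark. It assumes Python's built-in sqlite3 module as a stand-in engine so it can run without a server, and it adds a hypothetical extra row (1, 30) to bTable that is not part of my test data.

```python
import sqlite3

# In-memory SQLite database (assumption: stand-in for MySQL, results only, not plans).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE aTable (aField INT, aMore VARCHAR(200));
    CREATE TABLE bTable (xField INT, yField INT);
    INSERT INTO aTable VALUES (1, 'One'), (2, 'Two'), (3, 'Three'), (4, 'Four');
    INSERT INTO bTable VALUES (1, 10), (2, 2), (3, 20), (4, 4);
    -- Hypothetical extra row: a second bTable match for aField = 1.
    INSERT INTO bTable VALUES (1, 30);
""")

# IN form: each aTable row appears at most once, however many bTable rows match.
in_rows = conn.execute(
    "SELECT aField FROM aTable "
    "WHERE aField IN (SELECT xField FROM bTable WHERE yField > 5) "
    "ORDER BY aField"
).fetchall()

# JOIN form: one output row per matching (aTable, bTable) pair.
join_rows = conn.execute(
    "SELECT a.aField FROM aTable a "
    "INNER JOIN bTable b ON a.aField = b.xField "
    "WHERE b.yField > 5 ORDER BY a.aField"
).fetchall()

# DISTINCT removes the duplicates again when only aTable columns are selected.
distinct_rows = conn.execute(
    "SELECT DISTINCT a.aField FROM aTable a "
    "INNER JOIN bTable b ON a.aField = b.xField "
    "WHERE b.yField > 5 ORDER BY a.aField"
).fetchall()

print(in_rows)        # [(1,), (3,)]
print(join_rows)      # [(1,), (1,), (3,)]
print(distinct_rows)  # [(1,), (3,)]
```

If bTable.xField is unique, as in my test data, the duplicate case cannot occur and the plain JOIN is fine. The scalar-subquery form has the same sensitivity: with more than one matching row, MySQL rejects it with a "Subquery returns more than 1 row" error.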

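Again, though, you really have to profile against your own schema and representative real-world data, because the optimizer may make different decisions.

There is more in this excellent article by Quassnoi, which compares these kinds of techniques.


For reference, here is how I created aTable and bTable (since you did not provide definitions) and tested your queries: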

mysql> CREATE TABLE aTable (aField INT, aMore VARCHAR(200)); 
Query OK, 0 rows affected (0.01 sec) 

mysql> CREATE TABLE bTable (xField INT, yField INT); 
Query OK, 0 rows affected (0.02 sec) 

mysql> INSERT INTO aTable (aField, aMore) VALUES (1, 'One'), (2, 'Two'), (3, 'Three'), (4, 'Four'); 
Query OK, 4 rows affected (0.00 sec) 
Records: 4 Duplicates: 0 Warnings: 0 

mysql> INSERT INTO bTable (xField, yField) VALUES (1, 10), (2, 2), (3, 20), (4, 4); 
Query OK, 4 rows affected (0.02 sec) 
Records: 4 Duplicates: 0 Warnings: 0 

mysql> CREATE INDEX bTable_yField ON bTable(yField); 
Query OK, 0 rows affected (0.05 sec) 
Records: 0 Duplicates: 0 Warnings: 0 

mysql> SELECT * FROM aTable WHERE aField in (SELECT xField FROM bTable WHERE yField > 5); 
+--------+-------+
| aField | aMore |
+--------+-------+
|      1 | One   |
|      3 | Three |
+--------+-------+
2 rows in set (0.00 sec) 

mysql> SELECT * FROM aTable WHERE (SELECT yField FROM bTable WHERE aTable.aField = bTable.xField) > 5; 
+--------+-------+
| aField | aMore |
+--------+-------+
|      1 | One   |
|      3 | Three |
+--------+-------+
2 rows in set (0.00 sec)
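If you do not have a MySQL server at hand, the same check can be replayed elsewhere. The following is a rough sketch that assumes Python's built-in sqlite3 module in place of MySQL. It only confirms that the three query shapes agree on this data and says nothing about MySQL's plans or timings. The JOIN here selects aTable.* rather than * (my adjustment) so that the three result sets are directly comparable.

```python
import sqlite3

# Assumption: SQLite in memory stands in for the MySQL session shown above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE aTable (aField INT, aMore VARCHAR(200));
    CREATE TABLE bTable (xField INT, yField INT);
    INSERT INTO aTable (aField, aMore)
        VALUES (1, 'One'), (2, 'Two'), (3, 'Three'), (4, 'Four');
    INSERT INTO bTable (xField, yField)
        VALUES (1, 10), (2, 2), (3, 20), (4, 4);
    CREATE INDEX bTable_yField ON bTable(yField);
""")

# The two queries from the question plus the JOIN alternative.
QUERIES = {
    "in_subquery": (
        "SELECT * FROM aTable "
        "WHERE aField IN (SELECT xField FROM bTable WHERE yField > 5)"
    ),
    "scalar_subquery": (
        "SELECT * FROM aTable "
        "WHERE (SELECT yField FROM bTable "
        "       WHERE aTable.aField = bTable.xField) > 5"
    ),
    "join": (
        "SELECT aTable.* FROM aTable "
        "INNER JOIN bTable ON aTable.aField = bTable.xField "
        "WHERE bTable.yField > 5"
    ),
}

# Run each query with a fixed ordering so the row lists can be compared.
results = {
    name: conn.execute(sql + " ORDER BY aField").fetchall()
    for name, sql in QUERIES.items()
}

for name, rows in results.items():
    print(name, rows)
```

All three should print the rows for 'One' and 'Three', matching the MySQL output above.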
1

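I think the second one is evaluated with correlated-subquery semantics and is therefore expensive compared to the first. It is better to simply join the two tables, like this: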

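SELECT
    a.*
FROM
    aTable a
    JOIN bTable b
    ON a.aField = b.xField  -- use the aliases; MySQL rejects aTable.aField once the table is aliased
WHERE
    b.yField > 5            -- filter on yField, as in the question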

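This saves you from carrying a large number of results in the IN clause, as in the first query, which makes the query execute slowly and can sometimes cause an overflow error (SQL Server used to raise an overflow error at a limit of 32767 values in the IN clause).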

+0

Both of the OP's examples require dependent subqueries. +1 for pointing out the 'JOIN' option.

0

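A lot depends on the tables' indexes and on whether indexed columns are used in the join condition. These combinations determine how the SQL engine "decides" to structure the query internally, and ultimately they affect query performance. I am not so sure about MySQL, but SQL Server will certainly let you generate an execution plan, which will show potential bottlenecks.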
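In MySQL the corresponding tool is the EXPLAIN statement used in the first answer. As a self-contained sketch of reading a plan from code, the snippet below uses SQLite's EXPLAIN QUERY PLAN through Python's sqlite3 module. SQLite is an assumption made purely so the example runs without a server; its plan text will not match what MySQL or SQL Server report, and the helper name plan_details is hypothetical.

```python
import sqlite3

# Assumption: in-memory SQLite, used only to illustrate plan inspection.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE aTable (aField INT, aMore VARCHAR(200));
    CREATE TABLE bTable (xField INT, yField INT);
    CREATE INDEX bTable_yField ON bTable(yField);
""")


def plan_details(sql):
    """Return the human-readable step descriptions SQLite reports for a query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The last column of each plan row holds the description text.
    return [row[-1] for row in rows]


plans = {
    "in_subquery": plan_details(
        "SELECT * FROM aTable "
        "WHERE aField IN (SELECT xField FROM bTable WHERE yField > 5)"
    ),
    "join": plan_details(
        "SELECT aTable.* FROM aTable "
        "INNER JOIN bTable ON aTable.aField = bTable.xField "
        "WHERE bTable.yField > 5"
    ),
}

for name, steps in plans.items():
    print(name)
    for step in steps:
        print("   ", step)
```

Comparing the steps before and after adding an index is usually more informative than reading a single plan in isolation.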