2009-10-12 99 views
1

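Efficient MySQL query to find entries in A not matching B

I have two tables (products and suppliers) and want to find out which items are no longer listed in the supplier table.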

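Table uc_products holds the products. Table uc_supplier_csv holds the supplier stock. uc_products.model joins to uc_supplier_csv.sku.

When I try to identify stock in the products table that is not referenced in the supplier table, the queries take a very long time. I only want to extract the nid of the matching rows; sid IS NULL is there just so I can identify which items have no supplier.

For the first query below, the DB server (4GB RAM / 2x 2.4GHz Intel) needs an hour to return a result (507 rows). I didn't wait for the second query to finish.

How can I make this query more efficient? Could the slowness be due to mismatched character sets?

I was thinking the following would be the most efficient SQL to use: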

    SELECT nid, sid
    FROM uc_products p
    LEFT OUTER JOIN uc_supplier_csv c
        ON p.model = c.sku
    WHERE sid IS NULL;

For this query, I get the following EXPLAIN result:

mysql> EXPLAIN SELECT nid, sid FROM uc_products p LEFT OUTER JOIN uc_supplier_csv c ON p.model = c.sku WHERE sid IS NULL; 
+----+-------------+-------+------+---------------+------+---------+------+--------+-------------------------+ 
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra     | 
+----+-------------+-------+------+---------------+------+---------+------+--------+-------------------------+ 
| 1 | SIMPLE  | p  | ALL | NULL   | NULL | NULL | NULL | 6526 |       | 
| 1 | SIMPLE  | c  | ALL | NULL   | NULL | NULL | NULL | 126639 | Using where; Not exists | 
+----+-------------+-------+------+---------------+------+---------+------+--------+-------------------------+ 
2 rows in set (0.00 sec) 

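I would have thought the keys idx_sku and idx_model could be used here, but they aren't. Is that because the tables' default character sets don't match? One is UTF-8, the other is latin1.

I also considered this form: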

    SELECT nid
    FROM uc_products
    WHERE model NOT IN (
        SELECT DISTINCT sku
        FROM uc_supplier_csv
    );

EXPLAIN shows the following for that query:

mysql> explain select nid from uc_products where model not in (select sku from uc_supplier_csv) ; 
+----+--------------------+-----------------+-------+-----------------------+---------+---------+------+--------+--------------------------+ 
| id | select_type  | table   | type | possible_keys   | key  | key_len | ref | rows | Extra     | 
+----+--------------------+-----------------+-------+-----------------------+---------+---------+------+--------+--------------------------+ 
| 1 | PRIMARY   | uc_products  | ALL | NULL     | NULL | NULL | NULL | 6520 | Using where    | 
| 2 | DEPENDENT SUBQUERY | uc_supplier_csv | index | idx_sku,idx_sku_stock | idx_sku | 258  | NULL | 126639 | Using where; Using index | 
+----+--------------------+-----------------+-------+-----------------------+---------+---------+------+--------+--------------------------+ 
2 rows in set (0.00 sec) 

And, so that I don't leave anything out, here are some more exciting details: table sizes and statistics, and the table structures :)

mysql> show table status where Name in ('uc_supplier_csv', 'uc_products') ; 
+-----------------+--------+---------+------------+--------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+---------------------+---------------------+-------------------+----------+----------------+---------+ 
| Name   | Engine | Version | Row_format | Rows | Avg_row_length | Data_length | Max_data_length | Index_length | Data_free | Auto_increment | Create_time   | Update_time   | Check_time   | Collation   | Checksum | Create_options | Comment | 
+-----------------+--------+---------+------------+--------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+---------------------+---------------------+-------------------+----------+----------------+---------+ 
| uc_products  | MyISAM |  10 | Dynamic | 6520 |    89 |  585796 | 281474976710655 |  232448 |  912 |   NULL | 2009-04-24 11:03:15 | 2009-10-12 14:23:43 | 2009-04-24 11:03:16 | utf8_general_ci |  NULL |    |   | 
| uc_supplier_csv | MyISAM |  10 | Dynamic | 126639 |    26 |  3399704 | 281474976710655 |  5864448 |   0 |   NULL | 2009-10-12 14:28:25 | 2009-10-12 14:28:25 | 2009-10-12 14:28:27 | latin1_swedish_ci |  NULL |    |   | 
+-----------------+--------+---------+------------+--------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+---------------------+---------------------+-------------------+----------+----------------+---------+ 

CREATE TABLE `uc_products` (
    `vid` mediumint(9) NOT NULL default '0', 
    `nid` mediumint(9) NOT NULL default '0', 
    `model` varchar(255) NOT NULL default '', 
    `list_price` decimal(10,2) NOT NULL default '0.00', 
    `cost` decimal(10,2) NOT NULL default '0.00', 
    `sell_price` decimal(10,2) NOT NULL default '0.00', 
    `weight` float NOT NULL default '0', 
    `weight_units` varchar(255) NOT NULL default 'lb', 
    `length` float unsigned NOT NULL default '0', 
    `width` float unsigned NOT NULL default '0', 
    `height` float unsigned NOT NULL default '0', 
    `length_units` varchar(255) NOT NULL default 'in', 
    `pkg_qty` smallint(5) unsigned NOT NULL default '1', 
    `default_qty` smallint(5) unsigned NOT NULL default '1', 
    `unique_hash` varchar(32) NOT NULL, 
    `ordering` tinyint(2) NOT NULL default '0', 
    `shippable` tinyint(2) NOT NULL default '1', 
    PRIMARY KEY (`vid`), 
    KEY `idx_model` (`model`) 
) ENGINE=MyISAM DEFAULT CHARSET=utf8 

CREATE TABLE `uc_supplier_csv` (
    `sid` int(10) unsigned NOT NULL default '0', 
    `sku` varchar(255) default NULL, 
    `stock` int(10) unsigned NOT NULL default '0', 
    `list_price` decimal(8,2) default '0.00', 
    KEY `idx_sku` (`sku`), 
    KEY `idx_stock` (`stock`), 
    KEY `idx_sku_stock` (`sku`,`stock`), 
    KEY `idx_sid` (`sid`) 
) ENGINE=MyISAM DEFAULT CHARSET=latin1 

EDIT: Adding the query plans for a couple of the queries Martin suggested below:

mysql> explain SELECT nid FROM uc_products p WHERE NOT EXISTS (SELECT 1 FROM uc_supplier_csv c WHERE p.model = c.sku) ; 
+----+--------------------+-------+-------+---------------+---------+---------+------+--------+--------------------------+ 
| id | select_type  | table | type | possible_keys | key  | key_len | ref | rows | Extra     | 
+----+--------------------+-------+-------+---------------+---------+---------+------+--------+--------------------------+ 
| 1 | PRIMARY   | p  | ALL | NULL   | NULL | NULL | NULL | 6526 | Using where    | 
| 2 | DEPENDENT SUBQUERY | c  | index | NULL   | idx_sku | 258  | NULL | 126639 | Using where; Using index | 
+----+--------------------+-------+-------+---------------+---------+---------+------+--------+--------------------------+ 
2 rows in set (0.00 sec) 

mysql> explain SELECT nid FROM uc_products WHERE model NOT IN (SELECT sku FROM uc_supplier_csv) ; 
+----+--------------------+-----------------+-------+-----------------------+---------+---------+------+--------+--------------------------+ 
| id | select_type  | table   | type | possible_keys   | key  | key_len | ref | rows | Extra     | 
+----+--------------------+-----------------+-------+-----------------------+---------+---------+------+--------+--------------------------+ 
| 1 | PRIMARY   | uc_products  | ALL | NULL     | NULL | NULL | NULL | 6526 | Using where    | 
| 2 | DEPENDENT SUBQUERY | uc_supplier_csv | index | idx_sku,idx_sku_stock | idx_sku | 258  | NULL | 126639 | Using where; Using index | 
+----+--------------------+-----------------+-------+-----------------------+---------+---------+------+--------+--------------------------+ 
2 rows in set (0.00 sec) 
+2

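Your use of HAVING in the first query is incorrect - since there's no GROUP BY, it should be a simple WHERE. Not sure why MySQL doesn't give you an error message, but I guess that's what is messing up the query plan! – 2009-10-12 04:28:58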

+0

Thanks Alex - updated – 2009-10-12 09:01:13

+0

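Yesterday I tested the four query forms on this page on my laptop (MBP 2.4GHz/4GB/OSX/MAMP MySQL). * The LEFT OUTER JOIN form above took 3526s to execute. * The subquery form above took 1021s. * Martin's suggestion below took 637s. * James's was slightly faster than Martin's, but returned different results from the other three forms. – 2009-10-12 20:00:49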

Answers

3

Maybe try using NOT EXISTS rather than a count? For example:

    SELECT nid
    FROM uc_products p
    WHERE NOT EXISTS (
        SELECT 1
        FROM uc_supplier_csv c
        WHERE p.model = c.sku
    )

SO user Quassnoi has a short article outlining some tests, which suggests this might also be worth a try:

    SELECT nid
    FROM uc_products
    WHERE model NOT IN (
        SELECT sku
        FROM uc_supplier_csv
    )

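Basically the same as your original query, just without the DISTINCT.

Another one for you, Chris, this time with some help for the join across encodings: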
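One caveat with the NOT IN form, shown as a hedged sketch rather than a tested fix: the posted table definition declares `sku` as `default NULL`, and if the subquery returns even one NULL, `model NOT IN (...)` evaluates to NULL instead of TRUE for every non-matching model, so no unmatched rows come back. Filtering NULLs out inside the subquery avoids that. Whether the supplier table actually contains NULL skus is an assumption that has not been checked here.

```sql
-- Sketch only: the same NOT IN query, with NULL skus excluded explicitly.
-- A NULL in the subquery result would make NOT IN yield NULL (not TRUE)
-- for models that have no match, and those rows would be filtered out.
SELECT nid
FROM uc_products
WHERE model NOT IN (
    SELECT sku
    FROM uc_supplier_csv
    WHERE sku IS NOT NULL
);
```

The NOT EXISTS form is not affected by this, because `p.model = c.sku` is simply never true for a NULL sku.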

    SELECT nid
    FROM uc_products p
    WHERE NOT EXISTS (
        SELECT 1
        FROM uc_supplier_csv c
        WHERE CONVERT(p.model USING latin1) = c.sku
    )
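An alternative to converting inside the query, offered as an untested sketch: give both join columns the same character set and collation, so the comparison needs no per-row conversion and the index on `sku` has a chance of being used. This assumes that altering `uc_supplier_csv` is acceptable and that `utf8` / `utf8_general_ci` (the collation SHOW TABLE STATUS reports for `uc_products`) is the right target. If the supplier table is rebuilt from a CSV import, the import would need to keep the new definition.

```sql
-- Check the column-level collation of both join columns first.
SHOW FULL COLUMNS FROM uc_products LIKE 'model';
SHOW FULL COLUMNS FROM uc_supplier_csv LIKE 'sku';

-- Assumption: converting sku to utf8 is acceptable for this table.
-- MySQL rebuilds the indexes that include sku as part of the ALTER.
ALTER TABLE uc_supplier_csv
    MODIFY sku varchar(255) CHARACTER SET utf8 COLLATE utf8_general_ci DEFAULT NULL;

-- Re-check the plan; ideally table c no longer shows a full index scan.
EXPLAIN
SELECT nid
FROM uc_products p
WHERE NOT EXISTS (
    SELECT 1
    FROM uc_supplier_csv c
    WHERE p.model = c.sku
);
```

Compare the EXPLAIN output before and after the change rather than assuming the index is picked up.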
+0

This query is the fastest of the suggested solutions that returns correct results. It executed in 637 seconds. – 2009-10-12 19:56:48

+0

What does the query plan look like? – 2009-10-12 21:03:22

+0

Added to the question (formatting doesn't seem to work in comments for me?) – 2009-10-13 04:38:38