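Efficient MySQL query to find entries in A that have no match in B
I have two tables (products and suppliers) and want to find out which items are no longer listed in the supplier table.
Table uc_products holds the products. Table uc_supplier_csv holds the supplier stock. uc_products.model joins to uc_supplier_csv.sku.
I am seeing very long-running queries when trying to identify the stock in the products table that is not referenced in the supplier table. I only want to extract the nid of the entries concerned; sid IS NULL is only there so I can identify which items have no supplier.
For the first query below, the DB server (4GB RAM / 2x 2.4GHz Intel) takes an hour to return a result (507 rows). I didn't wait for the second query to finish.
How can I make this query more efficient? Could the slowness be due to the mismatched character sets?
I was thinking that the following would be the most efficient SQL to use: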
SELECT nid, sid
FROM uc_products p
LEFT OUTER JOIN uc_supplier_csv c
ON p.model = c.sku
WHERE sid IS NULL ;
For this query, I get the following EXPLAIN result:
mysql> EXPLAIN SELECT nid, sid FROM uc_products p LEFT OUTER JOIN uc_supplier_csv c ON p.model = c.sku WHERE sid IS NULL;
+----+-------------+-------+------+---------------+------+---------+------+--------+-------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+---------------+------+---------+------+--------+-------------------------+
| 1 | SIMPLE | p | ALL | NULL | NULL | NULL | NULL | 6526 | |
| 1 | SIMPLE | c | ALL | NULL | NULL | NULL | NULL | 126639 | Using where; Not exists |
+----+-------------+-------+------+---------------+------+---------+------+--------+-------------------------+
2 rows in set (0.00 sec)
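I would have thought that the keys idx_sku and idx_model could be used here, but they aren't. Is that because the tables' default character sets don't match? One is UTF-8, the other is latin1.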
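One way to test the charset hypothesis without touching the schema is to convert the utf8 value to latin1 inside the join condition, so that sku is compared in its own character set and idx_sku becomes eligible for a lookup. This is only a sketch, not a measured fix: it assumes every model value can be represented in latin1 (characters that cannot are replaced with '?' and would then fail to match).

```sql
-- Sketch: compare in latin1 so the index on c.sku can be used for lookups.
-- Assumption: all p.model values are representable in latin1.
EXPLAIN
SELECT p.nid
FROM uc_products p
LEFT OUTER JOIN uc_supplier_csv c
  ON c.sku = CONVERT(p.model USING latin1)
WHERE c.sid IS NULL;
```

If the hypothesis holds, the row for c would be expected to show a ref lookup on idx_sku rather than type ALL.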
I also considered this form:
SELECT nid
FROM uc_products
WHERE model
NOT IN (
SELECT DISTINCT sku FROM uc_supplier_csv
) ;
EXPLAIN shows the following for that query:
mysql> explain select nid from uc_products where model not in (select sku from uc_supplier_csv) ;
+----+--------------------+-----------------+-------+-----------------------+---------+---------+------+--------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+-----------------+-------+-----------------------+---------+---------+------+--------+--------------------------+
| 1 | PRIMARY | uc_products | ALL | NULL | NULL | NULL | NULL | 6520 | Using where |
| 2 | DEPENDENT SUBQUERY | uc_supplier_csv | index | idx_sku,idx_sku_stock | idx_sku | 258 | NULL | 126639 | Using where; Using index |
+----+--------------------+-----------------+-------+-----------------------+---------+---------+------+--------+--------------------------+
2 rows in set (0.00 sec)
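A side note on the NOT IN form, separate from the speed question: sku is declared with default NULL, and NOT IN never evaluates to true when the subquery returns a NULL, so a single NULL sku would make this query return no rows at all. A hedged variant that guards against that, assuming supplier rows with a NULL sku should simply be ignored:

```sql
-- Sketch: exclude NULL skus so that NOT IN can evaluate to TRUE.
-- Assumption: supplier rows without a sku are irrelevant to the match.
SELECT nid
FROM uc_products
WHERE model NOT IN (
  SELECT sku
  FROM uc_supplier_csv
  WHERE sku IS NOT NULL
);
```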
And, so that I don't leave anything out, here are some more exciting details: table sizes and statistics, and the table structures :)
mysql> show table status where Name in ('uc_supplier_csv', 'uc_products') ;
+-----------------+--------+---------+------------+--------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+---------------------+---------------------+-------------------+----------+----------------+---------+
| Name | Engine | Version | Row_format | Rows | Avg_row_length | Data_length | Max_data_length | Index_length | Data_free | Auto_increment | Create_time | Update_time | Check_time | Collation | Checksum | Create_options | Comment |
+-----------------+--------+---------+------------+--------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+---------------------+---------------------+-------------------+----------+----------------+---------+
| uc_products | MyISAM | 10 | Dynamic | 6520 | 89 | 585796 | 281474976710655 | 232448 | 912 | NULL | 2009-04-24 11:03:15 | 2009-10-12 14:23:43 | 2009-04-24 11:03:16 | utf8_general_ci | NULL | | |
| uc_supplier_csv | MyISAM | 10 | Dynamic | 126639 | 26 | 3399704 | 281474976710655 | 5864448 | 0 | NULL | 2009-10-12 14:28:25 | 2009-10-12 14:28:25 | 2009-10-12 14:28:27 | latin1_swedish_ci | NULL | | |
+-----------------+--------+---------+------------+--------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+---------------------+---------------------+-------------------+----------+----------------+---------+
and
CREATE TABLE `uc_products` (
`vid` mediumint(9) NOT NULL default '0',
`nid` mediumint(9) NOT NULL default '0',
`model` varchar(255) NOT NULL default '',
`list_price` decimal(10,2) NOT NULL default '0.00',
`cost` decimal(10,2) NOT NULL default '0.00',
`sell_price` decimal(10,2) NOT NULL default '0.00',
`weight` float NOT NULL default '0',
`weight_units` varchar(255) NOT NULL default 'lb',
`length` float unsigned NOT NULL default '0',
`width` float unsigned NOT NULL default '0',
`height` float unsigned NOT NULL default '0',
`length_units` varchar(255) NOT NULL default 'in',
`pkg_qty` smallint(5) unsigned NOT NULL default '1',
`default_qty` smallint(5) unsigned NOT NULL default '1',
`unique_hash` varchar(32) NOT NULL,
`ordering` tinyint(2) NOT NULL default '0',
`shippable` tinyint(2) NOT NULL default '1',
PRIMARY KEY (`vid`),
KEY `idx_model` (`model`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8
CREATE TABLE `uc_supplier_csv` (
`sid` int(10) unsigned NOT NULL default '0',
`sku` varchar(255) default NULL,
`stock` int(10) unsigned NOT NULL default '0',
`list_price` decimal(8,2) default '0.00',
KEY `idx_sku` (`sku`),
KEY `idx_stock` (`stock`),
KEY `idx_sku_stock` (`sku`,`stock`),
KEY `idx_sid` (`sid`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1
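The table defaults above only imply the column character sets. Here is a sketch for confirming what the two join columns actually use, and for aligning them if they differ. Converting sku to utf8 (rather than model to latin1) is an assumption about the preferred direction, and the ALTER rebuilds the table and its indexes, so it is worth trying on a copy first.

```sql
-- Inspect the character set and collation of the two join columns.
SELECT TABLE_NAME, COLUMN_NAME, CHARACTER_SET_NAME, COLLATION_NAME
FROM information_schema.COLUMNS
WHERE TABLE_SCHEMA = DATABASE()
  AND ((TABLE_NAME = 'uc_products' AND COLUMN_NAME = 'model')
    OR (TABLE_NAME = 'uc_supplier_csv' AND COLUMN_NAME = 'sku'));

-- Assumption: switching sku to utf8 is acceptable for other users of the table.
ALTER TABLE uc_supplier_csv
  MODIFY `sku` varchar(255) CHARACTER SET utf8 COLLATE utf8_general_ci default NULL;
```

After the change, re-running EXPLAIN on the queries above would show whether idx_sku is now picked up.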
Edit: adding the query plans for a couple of the queries suggested by Martin below:
mysql> explain SELECT nid FROM uc_products p WHERE NOT EXISTS (SELECT 1 FROM uc_supplier_csv c WHERE p.model = c.sku) ;
+----+--------------------+-------+-------+---------------+---------+---------+------+--------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+-------+-------+---------------+---------+---------+------+--------+--------------------------+
| 1 | PRIMARY | p | ALL | NULL | NULL | NULL | NULL | 6526 | Using where |
| 2 | DEPENDENT SUBQUERY | c | index | NULL | idx_sku | 258 | NULL | 126639 | Using where; Using index |
+----+--------------------+-------+-------+---------------+---------+---------+------+--------+--------------------------+
2 rows in set (0.00 sec)
mysql> explain SELECT nid FROM uc_products WHERE model NOT IN (SELECT sku FROM uc_supplier_csv) ;
+----+--------------------+-----------------+-------+-----------------------+---------+---------+------+--------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+-----------------+-------+-----------------------+---------+---------+------+--------+--------------------------+
| 1 | PRIMARY | uc_products | ALL | NULL | NULL | NULL | NULL | 6526 | Using where |
| 2 | DEPENDENT SUBQUERY | uc_supplier_csv | index | idx_sku,idx_sku_stock | idx_sku | 258 | NULL | 126639 | Using where; Using index |
+----+--------------------+-----------------+-------+-----------------------+---------+---------+------+--------+--------------------------+
2 rows in set (0.00 sec)
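The HAVING you use in the first query is incorrect - since there is no GROUP BY, it should be a simple WHERE. Not sure why MySQL doesn't give you an error message, but I'd guess that's what is messing up the query plan! – 2009-10-12 04:28:58
Thanks Alex - updated – 2009-10-12 09:01:13
Yesterday I tested the four query forms on this page on my laptop (MBP 2.4GHz/4GB/OSX/MAMP MySQL). * The LEFT OUTER JOIN form above took 3526s to execute. * The subquery form above took 1021s. * Martin's suggestion below took 637s. * James's was slightly faster than Martin's, but returned different results from the other three forms. – 2009-10-12 20:00:49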