Summary
The problem is that this field is not a good candidate for an index, because of the nature of B-trees.
Explanation
Let's assume you have a table holding the results of 500,000 coin tosses, where each toss is either 1 (heads) or 0 (tails):
CREATE TABLE toss (
  id int NOT NULL AUTO_INCREMENT,
  result int NOT NULL DEFAULT 0,
  PRIMARY KEY (id)
);
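The answer does not show how the table was filled. One possible way to generate comparable data is sketched below. The helper table `digits` and the aliases `d1` to `d5` and `f` are assumptions made for illustration, not part of the original setup, and the exact heads/tails split will differ on every run.

```sql
-- Hypothetical helper table holding the digits 0-9; used only to generate rows.
CREATE TABLE digits (d int NOT NULL PRIMARY KEY);
INSERT INTO digits (d) VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9);

-- Cross-joining five copies of digits gives 10^5 = 100,000 rows;
-- joining that with a 5-row derived table gives 500,000 rows.
-- RAND() is evaluated once per row, so each toss is independently 0 or 1.
INSERT INTO toss (result)
SELECT FLOOR(RAND() * 2)
FROM digits d1, digits d2, digits d3, digits d4, digits d5,
     (SELECT 0 AS n UNION ALL SELECT 1 UNION ALL SELECT 2
      UNION ALL SELECT 3 UNION ALL SELECT 4) f;
```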
select result, count(*) from toss group by result order by result;
+--------+----------+
| result | count(*) |
+--------+----------+
| 0 | 250290 |
| 1 | 249710 |
+--------+----------+
2 rows in set (0.40 sec)
If you want to pick one toss at random that came up tails, you need to search the table, choosing a random starting point.
select * from toss where result != 1 limit 123456, 1;
+--------+--------+
| id | result |
+--------+--------+
| 246700 | 0 |
+--------+--------+
1 row in set (0.06 sec)
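The offset 123456 above is hard-coded. MySQL does not accept an expression directly in LIMIT, so a random offset has to be passed through a prepared statement. The following is a minimal sketch; the variable `@off`, the statement name `pick_tail`, and the upper bound of 250000 (chosen to stay below the 250,290 tails counted earlier) are assumptions for illustration.

```sql
-- Pick a random offset smaller than the number of matching rows.
-- CAST ... AS UNSIGNED makes sure LIMIT receives an integer value.
SET @off = CAST(FLOOR(RAND() * 250000) AS UNSIGNED);

-- LIMIT accepts ? placeholders inside prepared statements.
PREPARE pick_tail FROM
  'SELECT * FROM toss WHERE result != 1 LIMIT ?, 1';
EXECUTE pick_tail USING @off;
DEALLOCATE PREPARE pick_tail;
```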
explain select * from toss where result != 1 limit 123456, 1;
+----+-------------+-------+------+---------------+------+---------+------+--------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+---------------+------+---------+------+--------+-------------+
| 1 | SIMPLE | toss | ALL | NULL | NULL | NULL | NULL | 500000 | Using where |
+----+-------------+-------+------+---------------+------+---------+------+--------+-------------+
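Notice that you are basically scanning all rows sequentially to find a match.
If you create an index on the result field, the index will contain two values, each with roughly 250,000 entries.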
create index foo on toss (result);
Query OK, 500000 rows affected (2.48 sec)
Records: 500000 Duplicates: 0 Warnings: 0
select * from toss where result != 1 limit 123456, 1;
+--------+--------+
| id | result |
+--------+--------+
| 246700 | 0 |
+--------+--------+
1 row in set (0.25 sec)
explain select * from toss where result != 1 limit 123456, 1;
+----+-------------+-------+-------+---------------+------+---------+------+--------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+---------------+------+---------+------+--------+-------------+
| 1 | SIMPLE | toss | range | foo | foo | 4 | NULL | 154565 | Using where |
+----+-------------+-------+-------+---------------+------+---------+------+--------+-------------+
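Now you are examining fewer records, but the search time went up from 0.06 seconds to 0.25 seconds. Why? Because sequentially scanning an index is actually less efficient than sequentially scanning the table when the index has a large number of rows for a given key.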
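To compare the two access paths on the same table without dropping the index, you can ask the optimizer to disregard it with an index hint. This is a sketch of how the comparison could be repeated; timings will vary with the storage engine, cache state, and MySQL version.

```sql
-- Same query, but the optimizer is told not to use index foo,
-- so it falls back to a full table scan.
SELECT * FROM toss IGNORE INDEX (foo)
WHERE result != 1 LIMIT 123456, 1;

-- Confirm the plan: with the hint, the key column should show NULL.
EXPLAIN SELECT * FROM toss IGNORE INDEX (foo)
WHERE result != 1 LIMIT 123456, 1;
```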
Let's look at the indexes on this table:
show index from toss;
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| toss | 0 | PRIMARY | 1 | id | A | 500000 | NULL | NULL | | BTREE | |
| toss | 1 | foo | 1 | result | A | 2 | NULL | NULL | | BTREE | |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
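The PRIMARY index is a good index: there are 500,000 rows and 500,000 distinct values. Arranged in a B-tree, it lets you quickly identify a single row by id.
The foo index is a bad index: there are 500,000 rows, but only 2 possible values. This is pretty much the worst case for a B-tree: you pay all the overhead of searching the index and still have to search through the results.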
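A quick way to judge a column before indexing it is to compute its selectivity, meaning the number of distinct values divided by the number of rows. For this table that works out to 500,000 / 500,000 = 1 for id and 2 / 500,000 = 0.000004 for result. The column aliases below are illustrative names only.

```sql
-- Selectivity = distinct values / total rows.
-- Values close to 1 suit a B-tree index; values close to 0 do not.
SELECT
  COUNT(DISTINCT id)     / COUNT(*) AS id_selectivity,
  COUNT(DISTINCT result) / COUNT(*) AS result_selectivity
FROM toss;
```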
Which engine is it? Show the EXPLAIN output. – Mchl 2010-07-30 22:41:05