索引跳跃扫描-技术教程-四时宝库

在前面的复合索引文章里讲到过，复合索引要满足最左前缀匹配原则，即不能跳过索引的前导列而单独使用索引的后面的列。

如果有查询复合索引(a,b)，查询为select * from t where b=xx，此查询是否一定不能走索引呢？

表结构如下：

mysql> desc emps;
+------------+---------------+------+-----+---------+----------------+
| Field      | Type          | Null | Key | Default | Extra          |
+------------+---------------+------+-----+---------+----------------+
| id         | int           | NO   | PRI | NULL    | auto_increment |
| emp_no     | int           | NO   |     | NULL    |                |
| birth_date | date          | NO   |     | NULL    |                |
| first_name | varchar(14)   | NO   | MUL | NULL    |                |
| last_name  | varchar(16)   | NO   | MUL | NULL    |                |
| gender     | enum('m','f') | NO   |     | NULL    |                |
| hire_date  | date          | NO   |     | NULL    |                |
+------------+---------------+------+-----+---------+----------------+
7 rows in set (0.00 sec)

在(gender,first_name)建组合索引

mysql> create index gf_ck on emps(gender,first_name);
Query OK, 0 rows affected (2.08 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> show indexes from emps;
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment | Visible | Expression |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+
| emps  |          0 | PRIMARY  |            1 | id          | A         |      299150 |     NULL |   NULL |      | BTREE      |         |               | YES     | NULL       |
| emps  |          1 | gf_ck    |            1 | gender      | A         |           1 |     NULL |   NULL |      | BTREE      |         |               | YES     | NULL       |
| emps  |          1 | gf_ck    |            2 | first_name  | A         |        2476 |     NULL |   NULL |      | BTREE      |         |               | YES     | NULL       |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+
3 rows in set (0.01 sec)

查看select gender,first_name from emps where first_name='Anneke';的执行计划

mysql> desc select gender,first_name from emps where first_name='Anneke';
+----+-------------+-------+------------+-------+---------------+-------+---------+------+-------+----------+----------------------------------------+
| id | select_type | table | partitions | type  | possible_keys | key   | key_len | ref  | rows  | filtered | Extra                                  |
+----+-------------+-------+------------+-------+---------------+-------+---------+------+-------+----------+----------------------------------------+
|  1 | SIMPLE      | emps  | NULL       | range | gf_ck         | gf_ck | 59      | NULL | 29915 |   100.00 | Using where; Using index for skip scan |
+----+-------------+-------+------------+-------+---------------+-------+---------+------+-------+----------+----------------------------------------+
1 row in set, 1 warning (0.00 sec)

看到查询走了复合索引。这个查询违反了上面所说的最左前缀原则确走了索引，是不是最左前缀原则是错的呢？答案是否定的。

MySQL中有一种叫做索引跳跃扫描的优化，当查询条件中跳过了前导列，使用了后面的列时，如果跳过的前导列的唯一值不多时，查询是可以走复合索引的。如上面的示例。

索引跳跃扫描原理：

1.将查询按照前导列拆分成多个select组，让后将每个组的where条件中加上跳过的前导列=值条件，上面的示例会拆成如下

select gender,first_name from emps where gender='F' and first_name='Anneke';
union all
select gender,first_name from emps where gender='M'  first_name='Anneke';

2.使用union all将所有拆分的select组的结果集联合起来，最后将整个联合的结果集作为最终的结果集返回给客户端。

总结：

索引跳跃扫描并没有违反复合索引的最左前缀原则，只是优化器使用了一种巧妙的方式让用户误以为跳过了前导列。能使用索引跳跃扫描的前提是跳过的前导列唯一值不多。如上面的示例查询gender列只有'F'和'M'两个值。

四时宝库

程序员的知识宝库

索引跳跃扫描