当查询结果在一定范围内,即有多条数据匹配查询条件时称为范围查询。唯一查询可以看成是特殊的范围查询,范围内只有一条数据。
通常where条件中使用了=, >, <, >=, <=, between, !=, <> , like等操作符。
在范围查询时,发现同样的sql语句,只是查询的条件不同,有的查询能走索引,有的查询确不能走索引。
例如有如下表emps,表中有300024行数据,字段emp_no上有唯一索引
mysql> desc emps;
+------------+---------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+------------+---------------+------+-----+---------+----------------+
| id | int | NO | PRI | NULL | auto_increment |
| emp_no | int | YES | | NULL | |
| birth_date | date | YES | | NULL | |
| first_name | varchar(14) | YES | | NULL | |
| last_name | varchar(16) | YES | | NULL | |
| gender | enum('M','F') | YES | | NULL | |
| hire_date | date | YES | | NULL | |
+------------+---------------+------+-----+---------+----------------+
7 rows in set (0.00 sec)
mysql> show table status like 'emps'\G
*************************** 1. row ***************************
Name: emps
Engine: InnoDB
Version: 10
Row_format: Dynamic
Rows: 299440
Avg_row_length: 54
Data_length: 16269312
Max_data_length: 0
Index_length: 5783552
Data_free: 2097152
Auto_increment: 327676
Create_time: 2023-05-26 12:09:47
Update_time: 2023-05-26 12:07:09
Check_time: NULL
Collation: utf8mb4_0900_ai_ci
Checksum: NULL
Create_options:
Comment:
1 row in set (0.00 sec)
mysql> select count(1) from emps;
+----------+
| count(1) |
+----------+
| 300024 |
+----------+
1 row in set (0.01 sec)
mysql> show indexes from emps;
+-------+------------+-----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment | Visible | Expression |
+-------+------------+-----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+
| emps | 0 | PRIMARY | 1 | id | A | 299341 | NULL | NULL | | BTREE | | | YES | NULL |
| emps | 0 | emp_no_uk | 1 | emp_no | A | 299341 | NULL | NULL | YES | BTREE | | | YES | NULL |
+-------+------------+-----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+
2 rows in set (0.00 sec)
以emp_no为条件执行如下查询
mysql> desc select * from emps e where e.emp_no <=10000;
+----+-------------+-------+------------+-------+---------------+-----------+---------+------+------+----------+-----------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+-------+---------------+-----------+---------+------+------+----------+-----------------------+
| 1 | SIMPLE | e | NULL | range | emp_no_uk | emp_no_uk | 5 | NULL | 1 | 100.00 | Using index condition |
+----+-------------+-------+------------+-------+---------------+-----------+---------+------+------+----------+-----------------------+
1 row in set, 1 warning (0.00 sec)
查询使用到了emp_no上的唯一索引
再执行如下查询,仅是查询条件的值不同
mysql> desc select * from emps e where e.emp_no <=54000;
+----+-------------+-------+------------+------+---------------+------+---------+------+--------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+------+---------------+------+---------+------+--------+----------+-------------+
| 1 | SIMPLE | e | NULL | ALL | emp_no_uk | NULL | NULL | NULL | 299440 | 29.24 | Using where |
+----+-------------+-------+------------+------+---------------+------+---------+------+--------+----------+-------------+
1 row in set, 1 warning (0.00 sec)
发现这条sql没有使用到索引,走的是全表扫描。
同样的sql语句,只是查询范围不同,导致执行计划不同,因此查询的效率也不同。为什么会造成如此差异呢?MySQL的优化器是CBO,即基于成本的优化器。优化器在选择执行计划时会评估各种执行计划的成本,从中选择一条成本最低的作为最终的执行计划。
1.先看使用索引的执行路径
先扫描emp_no_uk索引树,选出满足条件的叶子节点。
再根据选出的叶子扫描主键树,获取最终的数据
总的成本为扫描emp_no_uk的成本+扫描主键树的成本
2.全表扫描的执行路径
全表扫描仅需要扫描一棵主键树即可。由于主键索引不是按照emp_no排列的,因此没法知道哪些叶子节点中包含了emp_no,哪些没有。全表扫描需要扫描所有的叶子节点。
全表扫描的成本为根节点+一个分支节点+所有的叶子节点
预估全表扫描的成本为30178
mysql> desc format=tree select * from emps e where e.emp_no <=54000;
+-----------------------------------------------------------------------------------------------------------+
| EXPLAIN |
+-----------------------------------------------------------------------------------------------------------+
| -> Filter: (e.emp_no <= 54000) (cost=30178 rows=99755)
-> Table scan on e (cost=30178 rows=299294)
|
+-----------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)
预估的使用索引的成本为40075
mysql> desc format=tree select * from emps e force index(emp_no_uk) where e.emp_no <=54000;
+--------------------------------------------------------------------------------------------------------------------------------------+
| EXPLAIN |
+--------------------------------------------------------------------------------------------------------------------------------------+
| -> Index range scan on e using emp_no_uk over (emp_no <= 54000), with index condition: (e.emp_no <= 54000) (cost=40075 rows=89054)
|
+--------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)
全表扫描的成本小于使用索引的成本,因此MySQL选择了全表扫描作为最终的执行计划。
结论:
如果查询的范围占整个表数据量的比率超过一定范围时,即使表上有索引也不会使用索引。
可以通过拆分范围分批次执行让sql能够走索引提高查询效率。