四时宝库

程序员的知识宝库

你真的了解mysql数据库对like语句处理过程吗

用过sql的同学基本都会过like,但是大家对like了解多少,很多同学可能认知在like条件,如果第一个字符为通配符,sql语句就不会走索引,如果不为通配符,sql语句就会走索引,真相真的是这样的吗,我用实际测试案例来说明。

在mysql5.7数据库上我用sysbench做了一张5000w记录的表来模拟测试,表结构如下

Create Table: CREATE TABLE `sbtest1` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `k` int(10) unsigned NOT NULL DEFAULT '0',
  `c` char(120) COLLATE utf8_bin NOT NULL DEFAULT '',
  `pad` char(60) COLLATE utf8_bin NOT NULL DEFAULT '',
  PRIMARY KEY (`id`),
  KEY `k_1` (`k`),
  KEY `idx_pad` (`pad`)
) ENGINE=InnoDB AUTO_INCREMENT=50000001 DEFAULT CHARSET=utf8 COLLATE=utf8_bin 

like语句基本有如下三种写法

select count(*) from sbtest1 where pad like '00000005198-22795060602-81061358729%';
select count(*) from sbtest1 where pad like '%00000005198-22795060602-81061358729%';
select count(*) from sbtest1 where pad like '%00000005198-%22795060602-81061358729%';

那我们来看看在数据中这三条语句的执行计划是怎样的

mysql> explain select count(*) from sbtest1 where pad like '00000005198-22795060602-81061358729%';
+----+-------------+---------+------------+-------+---------------+---------+---------+------+------+----------+--------------------------+
| id | select_type | table   | partitions | type  | possible_keys | key     | key_len | ref  | rows | filtered | Extra                    |
+----+-------------+---------+------------+-------+---------------+---------+---------+------+------+----------+--------------------------+
|  1 | SIMPLE      | sbtest1 | NULL       | range | idx_pad       | idx_pad | 180     | NULL |    1 |   100.00 | Using where; Using index |
+----+-------------+---------+------------+-------+---------------+---------+---------+------+------+----------+--------------------------+
1 row in set, 1 warning (0.00 sec)

mysql> explain select count(*) from sbtest1 where pad like '%00000005198-22795060602-81061358729%';
+----+-------------+---------+------------+-------+---------------+---------+---------+------+----------+----------+--------------------------+
| id | select_type | table   | partitions | type  | possible_keys | key     | key_len | ref  | rows     | filtered | Extra                    |
+----+-------------+---------+------------+-------+---------------+---------+---------+------+----------+----------+--------------------------+
|  1 | SIMPLE      | sbtest1 | NULL       | index | NULL          | idx_pad | 180     | NULL | 49315602 |    11.11 | Using where; Using index |
+----+-------------+---------+------------+-------+---------------+---------+---------+------+----------+----------+--------------------------+
1 row in set, 1 warning (0.00 sec)

mysql> explain select count(*) from sbtest1 where pad like '%00000005198-%22795060602-81061358729%';
+----+-------------+---------+------------+-------+---------------+---------+---------+------+----------+----------+--------------------------+
| id | select_type | table   | partitions | type  | possible_keys | key     | key_len | ref  | rows     | filtered | Extra                    |
+----+-------------+---------+------------+-------+---------------+---------+---------+------+----------+----------+--------------------------+
|  1 | SIMPLE      | sbtest1 | NULL       | index | NULL          | idx_pad | 180     | NULL | 49315602 |    11.11 | Using where; Using index |
+----+-------------+---------+------------+-------+---------------+---------+---------+------+----------+----------+--------------------------+
1 row in set, 1 warning (0.00 sec)

到这里,是不是颠覆了你的认知,三个写法竟然都是走索引的。这个时候,一定要保持冷静,多问自己几个为什么,为什么会走索引,三个语句的执行计划一样,执行时间是一样的吗。

三种写法,执行计划都是一样,这是因为mysql优化引擎做了很多事情,那三个sql执行时间一样吗,我们来实际测试一下

mysql> select count(*) from sbtest1 where pad like '00000005198-22795060602-81061358729%';
+----------+
| count(*) |
+----------+
|        1 |
+----------+
1 row in set (0.00 sec)

mysql> select count(*) from sbtest1 where pad like '%00000005198-22795060602-81061358729%';
+----------+
| count(*) |
+----------+
|        1 |
+----------+
1 row in set (46.04 sec)

mysql> select count(*) from sbtest1 where pad like '%00000005198-22795060602%-81061358729%';
+----------+
| count(*) |
+----------+
|        1 |
+----------+
1 row in set (45.94 sec)

从上面结果,可以很明显的看出第一个字符不是通配符的like查询语句效率最高。这是因为第一种写法,mysql引擎只需要对b+tree进行range的范围扫描即可,而第二种和第三种写法,需要将b+tree的节点整个扫描完,才能得到需要的数据,所以时间相差才会这么大。

大家在学习技术或者看技术文章的时候,一定要自己动手去验证,如果发现测试结果不一样,一定要去思考,这样才会走的更远。

发表评论:

控制面板
您好,欢迎到访网站!
  查看权限
网站分类
最新留言
    友情链接