MySQL 查询优化

栏目: 数据库 · Mysql · 发布时间: 6年前

内容简介:MySQL版本:建表语句:为用户表建立索引:

MySQL版本:

SELECT VERSION();
+------------+
| VERSION()  |
+------------+
| 5.7.21-log |
+------------+

建表语句:

CREATE TABLE `user` (
  `id` int(11) NOT NULL AUTO_INCREMENT COMMENT '主键',
  `agent_id` int(11) DEFAULT NULL COMMENT '代理商ID',
  `name` varchar(16) DEFAULT NULL COMMENT '姓名',
  `score` int(11) DEFAULT '0' COMMENT '积分',
  PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8 COMMENT='用户表';
CREATE TABLE `agent` (
  `id` int(11) NOT NULL AUTO_INCREMENT COMMENT '主键',
  `name` varchar(16) DEFAULT NULL COMMENT '名称',
  `level` int(11) DEFAULT NULL COMMENT '级别',
  PRIMARY KEY (`id`),
  KEY `idx_level` (`level`)
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8 COMMENT='代理商';

为用户表建立索引:

CREATE INDEX idx_agentid ON user(agent_id);

为用户表建立索引:

CREATE INDEX idx_score_name ON user(score, name);

为代理商建立索引:

CREATE INDEX idx_level ON agent(level);

用户表索引列表:

SHOW INDEX FROM user;
+-------+------------+----------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name       | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-------+------------+----------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| user  |          0 | PRIMARY        |            1 | id          | A         |    19285676 |     NULL | NULL   |      | BTREE      |         |               |
| user  |          1 | idx_agentid    |            1 | agent_id    | A         |        5952 |     NULL | NULL   | YES  | BTREE      |         |               |
| user  |          1 | idx_score_name |            1 | score       | A         |       28654 |     NULL | NULL   | YES  | BTREE      |         |               |
| user  |          1 | idx_score_name |            2 | name        | A         |    19970922 |     NULL | NULL   | YES  | BTREE      |         |               |
+-------+------------+----------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+

代理商索引列表:

SHOW INDEX FROM agent;
+-------+------------+-----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name  | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-------+------------+-----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| agent |          0 | PRIMARY   |            1 | id          | A         |           5 |     NULL | NULL   |      | BTREE      |         |               |
| agent |          1 | idx_level |            1 | level       | A         |           2 |     NULL | NULL   | YES  | BTREE      |         |               |
+-------+------------+-----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+

1. ORDER BY 优化

MySLQ官方文档 给出了能够使用索引加速 ORDER BY 排序以及无法通过使用索引加速 ORDER BY 排序的场景案例,下面来一一列举。

1.1 使用索引加速排序

1.1.1 案例1

使用复合索引(多列索引)中的一个或多个列进行排序。要遵循最左前缀匹配原则。如果是使用复合索引中的多个列进行排序,这些列 排序 的顺序要么都是升序,要么都是降序,否则无法使用索引加速排序操作。

SELECT * FROM t1 ORDER BY key_part1, key_part2;
SELECT * FROM t1 ORDER BY key_part1 DESC, key_part2 DESC;
EXPLAIN SELECT * FROM user ORDER BY score, name;
+----+-------------+-------+------------+------+---------------+------+---------+------+----------+----------+----------------+
| id | select_type | table | partitions | type | possible_keys | key  | key_len | ref  | rows     | filtered | Extra          |
+----+-------------+-------+------------+------+---------------+------+---------+------+----------+----------+----------------+
|  1 | SIMPLE      | user  | NULL       | ALL  | NULL          | NULL | NULL    | NULL | 20434945 |   100.00 | Using filesort |
+----+-------------+-------+------------+------+---------------+------+---------+------+----------+----------+----------------+

从执行计划的结果来看, Extra 出现了 Using filesort ,这就是通常说的 Filesort 排序。

Filesort 排序并不代表是通过磁盘文件进行的排序,而只是说数据库服务器需要对数据进行一次额外的排序。凡是不是通过索引直接返回排序结果的排序,都叫 Filesort 排序。

Filesort 排序算法首先是尝试将取出的数据一次性加载到内存中进行排序,排序最大能使用的内存空间大小是由系统变量 sort_buffer_size 决定的。如果内存空间无法容下所有取出的数据,那么排序就会分解成多个更小的排序,然后每次只排序其中一小部分的数据,并将每次排序的结果存储到磁盘临时文件中,最后再将临时文件中的数据进行一次排序和合并结果输出。

每将各个有序的小数据块合并成一个有序的结果集就会增加 Sort_merge_passes 的数值。因此,如果数据库服务器的 Sort_merge_passes 的数值过大,可以考虑适当增加 sort_buffer_size 的数值以加速 Filesort 排序的操作。但是需要注意,系统变量 sort_buffer_size 配置的内存空间是每个线程独占的,不宜设置过大,应该综合考虑数据库连接数量和服务器内存的总大小。

# 查看系统 sort_merge_passes 状态的值

SHOW GLOBAL STATUS LIKE 'sort_merge_passes';
+-------------------+-------+
| Variable_name     | Value |
+-------------------+-------+
| Sort_merge_passes | 2461  |
+-------------------+-------+

# 查看系统 sort_buffer_size 变量的值

SHOW VARIABLES LIKE 'sort_buffer_size';
+------------------+--------+
| Variable_name    | Value  |
+------------------+--------+
| sort_buffer_size | 262144 |
+------------------+--------+

# 设置系统 sort_buffer_size 变量的值, 1M

set GLOBAL sort_buffer_size=1024*1024*1;

使用覆盖索引的方式改写上面的查询语句:

EXPLAIN SELECT id FROM user ORDER BY score, name;
+----+-------------+-------+------------+-------+---------------+----------------+---------+------+----------+----------+-------------+
| id | select_type | table | partitions | type  | possible_keys | key            | key_len | ref  | rows     | filtered | Extra       |
+----+-------------+-------+------------+-------+---------------+----------------+---------+------+----------+----------+-------------+
|  1 | SIMPLE      | user  | NULL       | index | NULL          | idx_score_name | 56      | NULL | 20434945 |   100.00 | Using index |
+----+-------------+-------+------------+-------+---------------+----------------+---------+------+----------+----------+-------------+

从执行计划的结果来看, Extra 出现了 Using index ,已经没有 Using filesort 了。

1.1.2 案例2

使用复合索引(多列索引)中的一部分列做等值查询,一部列做排序操作。要遵循最左前缀匹配原则。如果是使用复合索引中的多个列进行排序,这些列排序的顺序要么都是升序,要么都是降序,否则无法使用索引加速排序操作。

SELECT * FROM t1 WHERE key_part1 = constant ORDER BY key_part2;
SELECT * FROM t1 WHERE key_part1 = constant ORDER BY key_part1 DESC, key_part2 DESC;
EXPLAIN SELECT * FROM user WHERE score = 100 ORDER BY name;
+----+-------------+-------+------------+------+----------------+----------------+---------+-------+-------+----------+-----------------------+
| id | select_type | table | partitions | type | possible_keys  | key            | key_len | ref   | rows  | filtered | Extra                 |
+----+-------------+-------+------------+------+----------------+----------------+---------+-------+-------+----------+-----------------------+
|  1 | SIMPLE      | user  | NULL       | ref  | idx_score_name | idx_score_name | 5       | const | 38268 |   100.00 | Using index condition |
+----+-------------+-------+------------+------+----------------+----------------+---------+-------+-------+----------+-----------------------+

1.1.3 案例3

使用复合索引(多列索引)中的一部分列做比较值查询,一部列做排序操作。要遵循最左前缀匹配原则。如果是使用复合索引中的多个列进行排序,这些列排序的顺序要么都是升序,要么都是降序,否则无法使用索引加速排序操作。

SELECT * FROM t1 WHERE key_part1 > constant ORDER BY key_part1 ASC;
SELECT * FROM t1 WHERE key_part1 < constant ORDER BY key_part1 DESC;
EXPLAIN SELECT * FROM user WHERE score < 10 ORDER BY score;
+----+-------------+-------+------------+-------+----------------+----------------+---------+------+--------+----------+-----------------------+
| id | select_type | table | partitions | type  | possible_keys  | key            | key_len | ref  | rows   | filtered | Extra                 |
+----+-------------+-------+------------+-------+----------------+----------------+---------+------+--------+----------+-----------------------+
|  1 | SIMPLE      | user  | NULL       | range | idx_score_name | idx_score_name | 5       | NULL | 417240 |   100.00 | Using index condition |
+----+-------------+-------+------------+-------+----------------+----------------+---------+------+--------+----------+-----------------------+

比较值的场景下,优化器不一定会选择采用索引的方式,主要还得看表数据量的大小以及 MySQL 估算查询要扫描的行数来决定是否选择走索引。

1.2 无法使用索引加速排序

1.2.1 案例1

使用多个不同的单列索引进行排序。

SELECT * FROM t1 ORDER BY key1, key2;
EXPLAIN SELECT id FROM user ORDER BY id, agent_id;
+----+-------------+-------+------------+-------+---------------+-------------+---------+------+----------+----------+-----------------------------+
| id | select_type | table | partitions | type  | possible_keys | key         | key_len | ref  | rows     | filtered | Extra                       |
+----+-------------+-------+------------+-------+---------------+-------------+---------+------+----------+----------+-----------------------------+
|  1 | SIMPLE      | user  | NULL       | index | NULL          | idx_agentid | 5       | NULL | 20434945 |   100.00 | Using index; Using filesort |
+----+-------------+-------+------------+-------+---------------+-------------+---------+------+----------+----------+-----------------------------+

为了确保查询走索引,这里采用了覆盖索引的方式。

1.2.2 案例2

使用复合索引(多列索引)的多个列进行排序时,不遵循最左前缀匹配原则。

SELECT * FROM t1 WHERE key2=constant ORDER BY key_part1, key_part3;
EXPLAIN SELECT id FROM user ORDER BY name, score;
+----+-------------+-------+------------+-------+---------------+----------------+---------+------+----------+----------+-----------------------------+
| id | select_type | table | partitions | type  | possible_keys | key            | key_len | ref  | rows     | filtered | Extra                       |
+----+-------------+-------+------------+-------+---------------+----------------+---------+------+----------+----------+-----------------------------+
|  1 | SIMPLE      | user  | NULL       | index | NULL          | idx_score_name | 56      | NULL | 20434945 |   100.00 | Using index; Using filesort |
+----+-------------+-------+------------+-------+---------------+----------------+---------+------+----------+----------+-----------------------------+

本文没有建立三个列的复合索引,这里采用覆盖索引的方式来实现,达到的效果是一样的。

1.2.3 案例3

使用复合索引(多列索引)的多个列进行排序时,同时存在升序和降序的混合排序。

SELECT * FROM t1 ORDER BY key_part1 DESC, key_part2 ASC;
EXPLAIN SELECT id FROM user ORDER BY score, name DESC;
+----+-------------+-------+------------+-------+---------------+----------------+---------+------+----------+----------+-----------------------------+
| id | select_type | table | partitions | type  | possible_keys | key            | key_len | ref  | rows     | filtered | Extra                       |
+----+-------------+-------+------------+-------+---------------+----------------+---------+------+----------+----------+-----------------------------+
|  1 | SIMPLE      | user  | NULL       | index | NULL          | idx_score_name | 56      | NULL | 20434945 |   100.00 | Using index; Using filesort |
+----+-------------+-------+------------+-------+---------------+----------------+---------+------+----------+----------+-----------------------------+

为了确保查询走索引,这里采用了覆盖索引的方式。

2. LIMIT 优化

2.1 案例

查询代理商ID为2下面的所有用户信息,每页展示20条数据,起始值从200万行开始。为保证优化前和优化后返回相同的数据列表,这里增加了根据主键排序的条件:

EXPLAIN SELECT * FROM user WHERE agent_id = 2 ORDER BY id LIMIT 2000000, 20;
+----+-------------+-------+------------+------+---------------+-------------+---------+-------+---------+----------+-----------------------+
| id | select_type | table | partitions | type | possible_keys | key         | key_len | ref   | rows    | filtered | Extra                 |
+----+-------------+-------+------------+------+---------------+-------------+---------+-------+---------+----------+-----------------------+
|  1 | SIMPLE      | user  | NULL       | ref  | idx_agentid   | idx_agentid | 5       | const | 5221432 |   100.00 | Using index condition |
+----+-------------+-------+------------+------+---------------+-------------+---------+-------+---------+----------+-----------------------+

在一张数据量为2000万行的表中,真实的查询耗时为 3 min 0.54 sec

2.2 优化

思路:采用覆盖索引的方式先取出已经通过索引加速排序好的分页数据的行记录ID,然后再通过行记录ID关联回表查询所需的所有数据。这样可以减少全表扫描的行数,提升查询效率。

EXPLAIN SELECT * FROM user INNER JOIN (SELECT id FROM user WHERE agent_id = 2 ORDER BY id LIMIT 2000000, 20) tmp USING(id);
+----+-------------+------------+------------+--------+---------------+-------------+---------+--------+---------+----------+--------------------------+
| id | select_type | table      | partitions | type   | possible_keys | key         | key_len | ref    | rows    | filtered | Extra                    |
+----+-------------+------------+------------+--------+---------------+-------------+---------+--------+---------+----------+--------------------------+
|  1 | PRIMARY     | <derived2> | NULL       | ALL    | NULL          | NULL        | NULL    | NULL   | 2000020 |   100.00 | NULL                     |
|  1 | PRIMARY     | user       | NULL       | eq_ref | PRIMARY       | PRIMARY     | 4       | tmp.id |       1 |   100.00 | NULL                     |
|  2 | DERIVED     | user       | NULL       | ref    | idx_agentid   | idx_agentid | 5       | const  | 5221432 |   100.00 | Using where; Using index |
+----+-------------+------------+------------+--------+---------------+-------------+---------+--------+---------+----------+--------------------------+

在一张数据量为2000万行的表中,真实的查询耗时为 0.42 sec 。优化后的查询时间是毫秒级别的,速度提升了约180倍。

3. IN 优化

3.1 案例

查询二级代理商下的所有用户信息,因测试数据量过大,此处只查询前20条数据。为保证优化前和优化后返回相同的数据列表,这里增加了根据主键排序的条件:

EXPLAIN SELECT * FROM user WHERE agent_id IN (SELECT id FROM agent WHERE level = 2) ORDER BY id LIMIT 20;
+----+-------------+-------+------------+------+-------------------+-------------+---------+---------------+------+----------+----------------------------------------------+
| id | select_type | table | partitions | type | possible_keys     | key         | key_len | ref           | rows | filtered | Extra                                        |
+----+-------------+-------+------------+------+-------------------+-------------+---------+---------------+------+----------+----------------------------------------------+
|  1 | SIMPLE      | agent | NULL       | ref  | PRIMARY,idx_level | idx_level   | 5       | const         |    3 |   100.00 | Using index; Using temporary; Using filesort |
|  1 | SIMPLE      | user  | NULL       | ref  | idx_agentid       | idx_agentid | 5       | test.agent.id | 3433 |   100.00 | NULL                                         |
+----+-------------+-------+------------+------+-------------------+-------------+---------+---------------+------+----------+----------------------------------------------+

代理商表6行数据,用户表2000万行数据,真实的查询耗时为 13 min 14.73 sec ,极慢,无法忍受。

通过 trace分析优化器是如何选择执行计划 的:

MySQL 查询优化

可以看到,MySQL优化器认为使用 JOIN 的方式能够更高效率的查找到数据行,因此内部将查询语句改写成了 SEMI JOIN (半连接)。也由此可见,在MySQL中,半连接优化的效率也未必高。

3.2 优化

思路: IN 查询的性能很糟糕,建议使用 EXISTS 来等效的改写查询以获得更好的查询性能。

EXPLAIN SELECT * FROM user WHERE EXISTS (SELECT 1 FROM agent WHERE level = 2 AND user.agent_id = agent.id) ORDER BY id LIMIT 20;
+----+--------------------+-------+------------+--------+-------------------+---------+---------+--------------------+------+----------+-------------+
| id | select_type        | table | partitions | type   | possible_keys     | key     | key_len | ref                | rows | filtered | Extra       |
+----+--------------------+-------+------------+--------+-------------------+---------+---------+--------------------+------+----------+-------------+
|  1 | PRIMARY            | user  | NULL       | index  | NULL              | PRIMARY | 4       | NULL               |   20 |   100.00 | Using where |
|  2 | DEPENDENT SUBQUERY | agent | NULL       | eq_ref | PRIMARY,idx_level | PRIMARY | 4       | test.user.agent_id |    1 |    60.00 | Using where |
+----+--------------------+-------+------------+--------+-------------------+---------+---------+--------------------+------+----------+-------------+

代理商表6行数据,用户表2000万行数据,真实的查询耗时为 0.02 sec ,提升效率是显而易见的。

3.3 案例

上面是涉及关联子查询的情况,如果 IN 后面是常量而非子查询的情况,效率是很客观的:

EXPLAIN SELECT * FROM user WHERE agent_id IN(2, 3) ORDER BY id LIMIT 20;
+----+-------------+-------+------------+-------+---------------+---------+---------+------+------+----------+-------------+
| id | select_type | table | partitions | type  | possible_keys | key     | key_len | ref  | rows | filtered | Extra       |
+----+-------------+-------+------------+-------+---------------+---------+---------+------+------+----------+-------------+
|  1 | SIMPLE      | user  | NULL       | index | idx_agentid   | PRIMARY | 4       | NULL |   41 |    48.58 | Using where |
+----+-------------+-------+------------+-------+---------------+---------+---------+------+------+----------+-------------+

在一张数据量为2000万行的表中,真实的查询耗时基本为 0.00 sec (没有查询缓存)。


以上就是本文的全部内容,希望对大家的学习有所帮助,也希望大家多多支持 码农网

查看所有标签

猜你喜欢:

本站部分资源来源于网络,本站转载出于传递更多信息之目的,版权归原作者或者来源机构所有,如转载稿涉及版权问题,请联系我们

伟大创意的诞生

伟大创意的诞生

史蒂文·约翰逊 (Steven Johnson) / 盛杨燕 / 浙江人民出版社 / 2014-8-1 / CNY 52.90

 大家都认得出好创意。印刷机、铅笔、抽水马桶、电池、互联网、GPS、Google——这些都是绝妙的创意。然而,它们是如何产生的?被喻为“科技界的达尔文”的作者,在《伟大创意的诞生》一书中,提供了深具启示意义以及有论证实据的答案。  作者史蒂文•约翰逊以富有感染力、包罗万象的风格游历于多重领域,从神经生物学、都市研究,到网络文化,找出了独特创新背后的7大关键模式,深入人类600年重要发明的......一起来看看 《伟大创意的诞生》 这本书的介绍吧!

SHA 加密
SHA 加密

SHA 加密工具

UNIX 时间戳转换
UNIX 时间戳转换

UNIX 时间戳转换