MySQL -- 索引选择

栏目: 数据库 · 发布时间: 5年前

内容简介:选择索引代码不优雅
  1. 优化器的重要职责: 选择索引
    • 目的是寻找 最优 的执行方案
    • 大多数时候,优化器都能找到正确的索引
  2. 在数据库里面,决定 执行代价 的因素
    • 扫描行数 – 本文关注点
    • 是否使用 临时表
    • 是否 排序
  3. MySQL在真正开始执行语句之前,并不能精确地知道满足条件的记录有多少
    • 只能根据 统计信息索引的区分度 )来 估算 记录数
    • 基数越大(不同的值越多),索引的区分度越好
  4. 统计信息中索引的基数是 不准确
mysql> SHOW INDEX FROM t;
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment | Visible |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+
| t     |          0 | PRIMARY  |            1 | id          | A         |      100256 |     NULL |   NULL |      | BTREE      |         |               | YES     |
| t     |          1 | a        |            1 | a           | A         |      100512 |     NULL |   NULL | YES  | BTREE      |         |               | YES     |
| t     |          1 | b        |            1 | b           | A         |      100512 |     NULL |   NULL | YES  | BTREE      |         |               | YES     |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+

基数统计

  1. 方法: 采样统计
  2. 基数:InnoDB默认选择 N 个数据页,统计这些页上的不同值,得到一个 平均值 ,然后再乘以 索引的页面数
  3. 当数据表 变更的数据行 超过 1/M 时,会 自动触发 索引的采样统计
  4. 索引统计信息的存储,参数控制 innodb_stats_persistent
    • ON:持久化存储统计信息,N=20,M=10
    • OFF:统计信息只会存储在内存中,N=8,M=16
  5. 手动触发索引的采样统计: ANALYZE TABLE t;
    • 使用场景:当explain预估的rows与实际情况差距较大时
mysql> SHOW VARIABLES LIKE '%innodb_stats_persistent%';
+--------------------------------------+-------+
| Variable_name                        | Value |
+--------------------------------------+-------+
| innodb_stats_persistent              | ON    |
| innodb_stats_persistent_sample_pages | 20    |
+--------------------------------------+-------+

表初始化

建表

CREATE TABLE `t` (
    `id` INT(11) NOT NULL,
    `a` INT(11) DEFAULT NULL,
    `b` INT(11) DEFAULT NULL,
    PRIMARY KEY (`id`),
    KEY `a` (`a`),
    KEY `b` (`b`)
) ENGINE=InnoDB;

表初始化

# 存储过程
DELIMITER //
CREATE PROCEDURE idata()
BEGIN
    DECLARE i INT;
    SET i=1;
    WHILE (i <= 100000) DO
        INSERT INTO t VALUES (i, i, i);
    SET i=i+1;
    END WHILE;
END//
DELIMITER ;

# 调用存储过程
CALL idata();

索引树

MySQL -- 索引选择

查询

常规查询

选择索引 a ,预估的扫描行数为 10001

mysql> EXPLAIN SELECT * FROM t WHERE a BETWEEN 10000 AND 20000;
+----+-------------+-------+------------+-------+---------------+------+---------+------+-------+----------+-----------------------+
| id | select_type | table | partitions | type  | possible_keys | key  | key_len | ref  | rows  | filtered | Extra                 |
+----+-------------+-------+------------+-------+---------------+------+---------+------+-------+----------+-----------------------+
|  1 | SIMPLE      | t     | NULL       | range | a             | a    | 5       | NULL | 10001 |   100.00 | Using index condition |
+----+-------------+-------+------------+-------+---------------+------+---------+------+-------+----------+-----------------------+

索引选择异常

# 返回空集合
mysql> EXPLAIN SELECT * FROM t WHERE (a BETWEEN 1 AND 1000) AND (b BETWEEN 50000 AND 100000) ORDER BY b LIMIT 1;
+----+-------------+-------+------------+-------+---------------+------+---------+------+-------+----------+------------------------------------+
| id | select_type | table | partitions | type  | possible_keys | key  | key_len | ref  | rows  | filtered | Extra                              |
+----+-------------+-------+------------+-------+---------------+------+---------+------+-------+----------+------------------------------------+
|  1 | SIMPLE      | t     | NULL       | range | a,b           | b    | 5       | NULL | 50128 |     1.00 | Using index condition; Using where |
+----+-------------+-------+------------+-------+---------------+------+---------+------+-------+----------+------------------------------------+

mysql> SELECT * FROM t WHERE (a BETWEEN 1 AND 1000) AND (b BETWEEN 50000 AND 100000) ORDER BY b LIMIT 1;
Empty set (0.07 sec)

# Time: 2019-01-30T11:32:31.335272Z
# User@Host: root[root] @ localhost []  Id:     8
# Query_time: 0.046896  Lock_time: 0.000141 Rows_sent: 0  Rows_examined: 50001
SET timestamp=1548847951;
SELECT * FROM t WHERE (a BETWEEN 1 AND 1000) AND (b BETWEEN 50000 AND 100000) ORDER BY b LIMIT 1;
  1. 如果使用索引 a 进行查询
    • 扫描索引 a 的前1000个值,取得对应的id,再到 聚簇索引 上查出每一行,然后根据字段b来过滤,需要扫描1000行
  2. 如果使用索引 b 进行查询
    • 扫描索引 b 的最后50001个值,与上面的过程类似,需要扫描50001行
    • 优化器的异常选择,预估的扫描行数依然 不准确
    • 之前优化器选择索引 b ,是认为使用索引b能够 避免排序 ,所以即使扫描行数多,也认为代价较小
      • Extra 没有 Using filesort

force index

代码不优雅

mysql> EXPLAIN SELECT * FROM t FORCE INDEX(a) WHERE (a BETWEEN 1 AND 1000) AND (b BETWEEN 50000 AND 100000) ORDER BY b LIMIT 1;
+----+-------------+-------+------------+-------+---------------+------+---------+------+------+----------+----------------------------------------------------+
| id | select_type | table | partitions | type  | possible_keys | key  | key_len | ref  | rows | filtered | Extra                                              |
+----+-------------+-------+------------+-------+---------------+------+---------+------+------+----------+----------------------------------------------------+
|  1 | SIMPLE      | t     | NULL       | range | a             | a    | 5       | NULL | 1000 |    11.11 | Using index condition; Using where; Using filesort |
+----+-------------+-------+------------+-------+---------------+------+---------+------+------+----------+----------------------------------------------------+

mysql> SELECT * FROM t FORCE INDEX(a) WHERE (a BETWEEN 1 AND 1000) AND (b BETWEEN 50000 AND 100000) ORDER BY b LIMIT 1;
Empty set (0.00 sec)

# Time: 2019-01-30T11:32:45.938128Z
# User@Host: root[root] @ localhost []  Id:     8
# Query_time: 0.001304  Lock_time: 0.000148 Rows_sent: 0  Rows_examined: 1000
SET timestamp=1548847965;
SELECT * FROM t FORCE INDEX(a) WHERE (a BETWEEN 1 AND 1000) AND (b BETWEEN 50000 AND 100000) ORDER BY b LIMIT 1;

order by b,a

不通用

mysql> EXPLAIN SELECT * FROM t WHERE (a BETWEEN 1 AND 1000) AND (b BETWEEN 50000 AND 100000) ORDER BY b,a LIMIT 1;
+----+-------------+-------+------------+-------+---------------+------+---------+------+------+----------+----------------------------------------------------+
| id | select_type | table | partitions | type  | possible_keys | key  | key_len | ref  | rows | filtered | Extra                                              |
+----+-------------+-------+------------+-------+---------------+------+---------+------+------+----------+----------------------------------------------------+
|  1 | SIMPLE      | t     | NULL       | range | a,b           | a    | 5       | NULL | 1000 |    50.00 | Using index condition; Using where; Using filesort |
+----+-------------+-------+------------+-------+---------------+------+---------+------+------+----------+----------------------------------------------------+

mysql> SELECT * FROM t WHERE (a BETWEEN 1 AND 1000) AND (b BETWEEN 50000 AND 100000) ORDER BY b,a LIMIT 1;
Empty set (0.01 sec)

# Time: 2019-01-30T13:53:18.233163Z
# User@Host: root[root] @ localhost []  Id:     8
# Query_time: 0.000609  Lock_time: 0.000191 Rows_sent: 1  Rows_examined: 0
SET timestamp=1548856398;
EXPLAIN SELECT * FROM t WHERE (a BETWEEN 1 AND 1000) AND (b BETWEEN 50000 AND 100000) ORDER BY b,a LIMIT 1;
  1. order by b,a 要求按照b,a排序,那 扫描行数 成为了影响优化器 决策的主要条件 ,此时会选择只需扫描1000行的索引 a
  2. 但这并非通用优化手段,只是恰好 order by b limit 1order by b,a limit 1 都是返回b中最小的一行,语义一致而已

limit 100

不通用

mysql> EXPLAIN SELECT * FROM (SELECT * FROM t WHERE (a BETWEEN 1 AND 1000) AND (b BETWEEN 50000 AND 100000) ORDER BY b LIMIT 100) alias LIMIT 1;
+----+-------------+------------+------------+-------+---------------+------+---------+------+------+----------+----------------------------------------------------+
| id | select_type | table      | partitions | type  | possible_keys | key  | key_len | ref  | rows | filtered | Extra                                              |
+----+-------------+------------+------------+-------+---------------+------+---------+------+------+----------+----------------------------------------------------+
|  1 | PRIMARY     | <derived2> | NULL       | ALL   | NULL          | NULL | NULL    | NULL |  100 |   100.00 | NULL                                               |
|  2 | DERIVED     | t          | NULL       | range | a,b           | a    | 5       | NULL | 1000 |    50.00 | Using index condition; Using where; Using filesort |
+----+-------------+------------+------------+-------+---------------+------+---------+------+------+----------+----------------------------------------------------+

limit 100 :根据数据特征来 诱导 优化器,让优化器意识到使用索引 b代价很高 ,同样不具有通用性

其他办法

  1. 新建一个更合适的索引
  2. 删除误用的索引

参考资料

《MySQL实战45讲》

转载请注明出处:http://zhongmingmao.me/2019/01/30/mysql-index-select/

访问原文「MySQL -- 索引选择」获取最佳阅读体验并参与讨论


以上就是本文的全部内容,希望对大家的学习有所帮助,也希望大家多多支持 码农网

查看所有标签

猜你喜欢:

本站部分资源来源于网络,本站转载出于传递更多信息之目的,版权归原作者或者来源机构所有,如转载稿涉及版权问题,请联系我们

Making Things See

Making Things See

Greg Borenstein / Make / 2012-2-3 / USD 39.99

Welcome to the Vision Revolution. With Microsoft's Kinect leading the way, you can now use 3D computer vision technology to build digital 3D models of people and objects that you can manipulate with g......一起来看看 《Making Things See》 这本书的介绍吧!

在线进制转换器
在线进制转换器

各进制数互转换器

Base64 编码/解码
Base64 编码/解码

Base64 编码/解码

HEX CMYK 转换工具
HEX CMYK 转换工具

HEX CMYK 互转工具