内容简介:如果对
如果对 索引字段 做 函数 操作,可能会 破坏索引值的有序性 ,因此 优化器 会决定 放弃 走 树搜索 功能
条件字段函数操作
交易日志表
CREATE TABLE `tradelog` ( `id` INT(11) NOT NULL, `tradeid` VARCHAR(32) DEFAULT NULL, `operator` INT(11) DEFAULT NULL, `t_modified` DATETIME DEFAULT CURRENT_TIMESTAMP, PRIMARY KEY (`id`), KEY `tradeid` (`tradeid`), KEY `t_modified` (`t_modified`) ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
-- 94608000 = 3 * 365 * 24 * 3600 -- t_modified : 2016-01-01 00:00:00 ~ 2019-01-01 00:00:00 DELIMITER ;; CREATE PROCEDURE tdata() BEGIN DECLARE i INT; SET i=0; WHILE i<1000000 DO INSERT INTO tradelog VALUES (i,i,i,FROM_UNIXTIME(UNIX_TIMESTAMP('2016-01-01 00:00:00')+FLOOR(0+(RAND()*94608000)))); SET i=i+1; END WHILE; END;; DELIMITER ; CALL tdata();
month函数
SELECT COUNT(*) FROM tradelog WHERE MONTH(t_modified)=7;
explain
mysql> EXPLAIN SELECT COUNT(*) FROM tradelog WHERE MONTH(t_modified)=7\G; *************************** 1. row *************************** id: 1 select_type: SIMPLE table: tradelog partitions: NULL type: index possible_keys: NULL key: t_modified key_len: 6 ref: NULL rows: 998838 filtered: 100.00 Extra: Using where; Using index
-
key=t_modified
:优化器选择了遍历二级索引t_modified
-
type=index
:表示 全索引扫描 (二级索引) -
rows=998,838≈1,000,000
:说明这条语句基本 扫描 了整个二级索引t_modified
-
Using index
:表示使用了 覆盖索引 ( 无需回表 ) - 在索引字段
t_modified
上加上MONTH
函数,导致了 全索引扫描 ,无法使用 树搜索 功能
slowlog
Rows_examined=1,000,000
,佐证了 全索引扫描
# Time: 2019-02-12T14:25:07.158350+08:00 # User@Host: root[root] @ localhost [] Id: 13 # Query_time: 0.208787 Lock_time: 0.000162 Rows_sent: 1 Rows_examined: 1000000 SET timestamp=1549952707; SELECT COUNT(*) FROM tradelog WHERE MONTH(t_modified)=7;
分析
-
WHERE t_modified='2018-07-01'
,InnoDB会按照绿色箭头的路线找到结果(树搜索)- 这源于B+树的特性: 同一层兄弟节点的有序性
-
WHERE MONTH(t_modified)=7
,在树的第一层就不知道如何操作,因此 优化器放弃了树搜索功能- 优化器可以选择遍历 聚簇索引 ,或者遍历 二级索引
t_modified
- 优化器在对比索引大小后发现,二级索引
t_modified
更小,最终选择了遍历二级索引t_modified
- 优化器可以选择遍历 聚簇索引 ,或者遍历 二级索引
优化方案
mysql> SELECT COUNT(*) FROM tradelog WHERE -> (t_modified >= '2016-7-1' AND t_modified<'2016-8-1') OR -> (t_modified >= '2017-7-1' AND t_modified<'2017-8-1') OR -> (t_modified >= '2018-7-1' AND t_modified<'2018-8-1');
explain
mysql> EXPLAIN SELECT COUNT(*) FROM tradelog WHERE -> (t_modified >= '2016-7-1' AND t_modified<'2016-8-1') OR -> (t_modified >= '2017-7-1' AND t_modified<'2017-8-1') OR -> (t_modified >= '2018-7-1' AND t_modified<'2018-8-1')\G; *************************** 1. row *************************** id: 1 select_type: SIMPLE table: tradelog partitions: NULL type: range possible_keys: t_modified key: t_modified key_len: 6 ref: NULL rows: 180940 filtered: 100.00 Extra: Using where; Using index
-
type=range
:表示 索引范围扫描 (二级索引) -
rows=180,940 < 998,838
,扫描行数 远小于 上面使用MONTH
函数的情况
slowlog
Rows_examined=84,704 < 1,000,000
, Query_time
也仅为使用 MONTH
函数情况的 25%
# Time: 2019-02-12T14:56:51.727672+08:00 # User@Host: root[root] @ localhost [] Id: 13 # Query_time: 0.051701 Lock_time: 0.000239 Rows_sent: 1 Rows_examined: 84704 SET timestamp=1549954611; SELECT COUNT(*) FROM tradelog WHERE (t_modified >= '2016-7-1' AND t_modified<'2016-8-1') OR (t_modified >= '2017-7-1' AND t_modified<'2017-8-1') OR (t_modified >= '2018-7-1' AND t_modified<'2018-8-1');
id+1
mysql> explain select * from tradelog where id+1 = 1000000\G; *************************** 1. row *************************** id: 1 select_type: SIMPLE table: tradelog partitions: NULL type: ALL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: 998838 filtered: 100.00 Extra: Using where mysql> EXPLAIN SELECT * FROM tradelog WHERE id = 999999\G; *************************** 1. row *************************** id: 1 select_type: SIMPLE table: tradelog partitions: NULL type: const possible_keys: PRIMARY key: PRIMARY key_len: 4 ref: const rows: 1 filtered: 100.00 Extra: NULL
- 优化器会偷懒,依然认为
id+1=1,000,000
是应用在索引字段上的函数,因此采用的是 全表扫描 - 而
id=999,999
会走 聚簇索引 的 树搜索 ,const
表示这是 常量 操作(最多只会有一行记录匹配)
隐式类型转换
字符串 -> 数字
在 MySQL 中,如果字符串和数字做比较,会先 将字符串转换为数字
mysql> SELECT '10' > 9; +----------+ | '10' > 9 | +----------+ | 1 | +----------+
tradeid
explain
mysql> EXPLAIN SELECT * FROM tradelog WHERE tradeid=625912\G; *************************** 1. row *************************** id: 1 select_type: SIMPLE table: tradelog partitions: NULL type: ALL possible_keys: tradeid key: NULL key_len: NULL ref: NULL rows: 998838 filtered: 10.00 Extra: Using where
-
type=ALL
:表示 全表扫描 -
rows=998,838≈1,000,000
- 等价于
SELECT * FROM tradelog WHERE CAST(tradid AS SIGNED INT)=625912;
- 隐式的类型转换,导致会在索引字段上做函数操作,优化器会放弃走树搜索的功能
slowlog
Rows_examined
依然为 1,000,000
# Time: 2019-02-12T15:30:09.033772+08:00 # User@Host: root[root] @ localhost [] Id: 13 # Query_time: 0.312170 Lock_time: 0.000114 Rows_sent: 1 Rows_examined: 1000000 SET timestamp=1549956609; SELECT * FROM tradelog WHERE tradeid=625912;
id
explain
mysql> EXPLAIN SELECT * FROM tradelog WHERE id='625912'\G; *************************** 1. row *************************** id: 1 select_type: SIMPLE table: tradelog partitions: NULL type: const possible_keys: PRIMARY key: PRIMARY key_len: 4 ref: const rows: 1 filtered: 100.00 Extra: NULL
-
type=const
:表示 常量操作 -
key=PRIMARY
:走 聚簇索引 的树搜索功能 -
rows=1
:只需要扫描一行 - 等价于
SELECT * FROM tradelog WHERE id=CAST('625912' AS SIGNED INT);
- 只是在 输入参数 上做隐式类型转换,在索引字段上并没有做函数操作,依然可以走 聚簇索引 的树搜索功能
slowlog
Rows_examined=1
,只需要扫描一行
# Time: 2019-02-12T15:45:38.222760+08:00 # User@Host: root[root] @ localhost [] Id: 13 # Query_time: 0.000476 Lock_time: 0.000210 Rows_sent: 1 Rows_examined: 1 SET timestamp=1549957538; SELECT * FROM tradelog WHERE id='625912';
隐式字符编码转换
交易详情表
-- tradelog的编码为utf8mb4,trade_detail的编码为utf8 CREATE TABLE `trade_detail` ( `id` INT(11) NOT NULL, `tradeid` VARCHAR(32) DEFAULT NULL, `trade_step` INT(11) DEFAULT NULL, `step_info` VARCHAR(32) DEFAULT NULL, PRIMARY KEY (`id`), KEY `tradeid` (`tradeid`) ) ENGINE=InnoDB DEFAULT CHARSET=utf8;
INSERT INTO tradelog VALUES (1, 'aaaaaaaa', 1000, NOW()); INSERT INTO tradelog VALUES (2, 'aaaaaaab', 1000, NOW()); insert into tradelog VALUES (3, 'aaaaaaac', 1000, NOW()); INSERT INTO trade_detail VALUES (1, 'aaaaaaaa', 1, 'add'); INSERT INTO trade_detail VALUES (2, 'aaaaaaaa', 2, 'update'); INSERT INTO trade_detail VALUES (3, 'aaaaaaaa', 3, 'commit'); INSERT INTO trade_detail VALUES (4, 'aaaaaaab', 1, 'add'); INSERT INTO trade_detail VALUES (5, 'aaaaaaab', 2, 'update'); INSERT INTO trade_detail VALUES (6, 'aaaaaaab', 3, 'update again'); INSERT INTO trade_detail VALUES (7, 'aaaaaaab', 4, 'commit'); INSERT INTO trade_detail VALUES (8, 'aaaaaaac', 1, 'add'); INSERT INTO trade_detail VALUES (9, 'aaaaaaac', 2, 'update'); INSERT INTO trade_detail VALUES (10, 'aaaaaaac', 3, 'update again'); INSERT INTO trade_detail VALUES (11, 'aaaaaaac', 4, 'commit');
函数作用于二级索引
explain
mysql> EXPLAIN SELECT d.* FROM tradelog l, trade_detail d WHERE d.tradeid=l.tradeid AND l.id=2; +----+-------------+-------+------------+-------+-----------------+---------+---------+-------+------+----------+-------------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+-------+------------+-------+-----------------+---------+---------+-------+------+----------+-------------+ | 1 | SIMPLE | l | NULL | const | PRIMARY,tradeid | PRIMARY | 4 | const | 1 | 100.00 | NULL | | 1 | SIMPLE | d | NULL | ALL | NULL | NULL | NULL | NULL | 11 | 100.00 | Using where | +----+-------------+-------+------------+-------+-----------------+---------+---------+-------+------+----------+-------------+
-
tradelog
称为 驱动表 ,trade_detail
称为 被驱动表 ,tradeid
为 关联字段 。驱动原则: 小表驱动大表 - 优化器会先在
tradelog
表上查找id=2
的行,使用了tradelog
的聚簇索引,只扫描了一行,取出tradeid='aaaaaaab'
- 然后到
trade_detail
表上查找tradeid='aaaaaaab'
的行,但没有选择 二级索引tradeid
,而选择了 全表扫描-
type=ALL
,不符合预期,本希望走二级索引tradeid
的树搜索功能 - 原因:两个表的 字符集不相同
-
tradelog
的编码为utf8mb4
,trade_detail
的编码为utf8
, -
utf8mb4
是utf8
的超集,详见 mysql中utf8和utf8mb4区别 -
d.tradeid=l.tradeid
时,需要先 将utf8
字符串转换成utf8mb4
字符串 - 因此,被驱动表
trade_detail
里面的tradeid
字段需要先转换成utf8mb4
类型,再跟L2进行比较
-
- 等价于
SELECT * FROM trade_detail WHERE CONVERT(traideid USING utf8mb4)=$L2.tradeid.value;
- 隐式的 字符编码转换 ,导致会在二级索引
tradeid
上做函数操作,优化器会放弃走 树搜索 的功能
- 隐式的 字符编码转换 ,导致会在二级索引
-
slowlog
# Time: 2019-02-12T16:45:14.841502+08:00 # User@Host: root[root] @ localhost [] Id: 13 # Query_time: 0.000470 Lock_time: 0.000202 Rows_sent: 4 Rows_examined: 11 SET timestamp=1549961114; SELECT d.* FROM tradelog l, trade_detail d WHERE d.tradeid=l.tradeid AND l.id=2;
函数作用于输入参数
explain
mysql> EXPLAIN SELECT l.* FROM tradelog l, trade_detail d WHERE d.tradeid=l.tradeid AND d.id=4; +----+-------------+-------+------------+-------+---------------+---------+---------+-------+------+----------+-------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+-------+------------+-------+---------------+---------+---------+-------+------+----------+-------+ | 1 | SIMPLE | d | NULL | const | PRIMARY | PRIMARY | 4 | const | 1 | 100.00 | NULL | | 1 | SIMPLE | l | NULL | ref | tradeid | tradeid | 131 | const | 1 | 100.00 | NULL | +----+-------------+-------+------------+-------+---------------+---------+---------+-------+------+----------+-------+
-
trade_detail
称为 驱动表 ,tradelog
称为 被驱动表 ,tradeid
为 关联字段 - 被驱动表
tradelog
的编码为utf8mb4
,驱动表trade_detail
的编码为utf8
- 等价于
SELECT * FROM tradelog WHERE traideid = CONVERT($R4.tradeid.value USING utf8mb4);
- 函数是用在 输入参数 上的,并非二级索引
tradeid
上,因此可以用 树搜索 功能(key=tradeid
和rows=1
) -
type=ref
: Join语句中被驱动表索引引用的查询
- 等价于
slowlog
# Time: 2019-02-12T17:31:50.553151+08:00 # User@Host: root[root] @ localhost [] Id: 13 # Query_time: 0.004090 Lock_time: 0.001874 Rows_sent: 1 Rows_examined: 1 SET timestamp=1549963910; SELECT l.* FROM tradelog l, trade_detail d WHERE d.tradeid=l.tradeid AND d.id=4;
优化方案
- 常用:将
trade_detail.tradeid
的字符串编码修改为utf8mb4
-
ALTER TABLE trade_detail MODIFY tradeid VARCHAR(32) CHARACTER SET utf8mb4 DEFAULT NULL;
-
- 修改SQL(场景:数据量较大或暂不支持该DDL)
- 主动把
l.tradeid
转换为utf8
,避免了 被驱动表上的隐式字符编码转换 -
SELECT d.* FROM tradelog l, trade_detail d WHERE d.tradeid=CONVERT(l.tradeid USING utf8) AND l.id=2;
- 主动把
mysql> EXPLAIN SELECT d.* FROM tradelog l, trade_detail d WHERE d.tradeid=CONVERT(l.tradeid USING utf8) AND l.id=2; +----+-------------+-------+------------+-------+---------------+---------+---------+-------+------+----------+-------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+-------+------------+-------+---------------+---------+---------+-------+------+----------+-------+ | 1 | SIMPLE | l | NULL | const | PRIMARY | PRIMARY | 4 | const | 1 | 100.00 | NULL | | 1 | SIMPLE | d | NULL | ref | tradeid | tradeid | 99 | const | 4 | 100.00 | NULL | +----+-------------+-------+------------+-------+---------------+---------+---------+-------+------+----------+-------+
# Time: 2019-02-12T17:50:29.844772+08:00 # User@Host: root[root] @ localhost [] Id: 13 # Query_time: 0.000504 Lock_time: 0.000206 Rows_sent: 4 Rows_examined: 4 SET timestamp=1549965029; SELECT d.* FROM tradelog l, trade_detail d WHERE d.tradeid=CONVERT(l.tradeid USING utf8) AND l.id=2;
参考资料
《MySQL实战45讲》
转载请注明出处:http://zhongmingmao.me/2019/02/12/mysql-index-function/
访问原文「 MySQL -- 索引上的函数 」获取最佳阅读体验并参与讨论
以上就是本文的全部内容,希望本文的内容对大家的学习或者工作能带来一定的帮助,也希望大家多多支持 码农网
猜你喜欢:- MySQL 8.0.13(GA) 发布,新增函数索引、强制主键等新特性
- MySQL 8.0.13(GA) 发布,新增函数索引、强制主键等新特性
- MySQL索引使用说明(单列索引和多列索引)
- Elasticsearch索引的基本操作(3)-索引的滚动索引
- Coreseek 增量索引模拟实时索引
- Coreseek 增量索引模拟实时索引
本站部分资源来源于网络,本站转载出于传递更多信息之目的,版权归原作者或者来源机构所有,如转载稿涉及版权问题,请联系我们。
Text Processing in Python
David Mertz / Addison-Wesley Professional / 2003-6-12 / USD 54.99
Text Processing in Python describes techniques for manipulation of text using the Python programming language. At the broadest level, text processing is simply taking textual information and doing som......一起来看看 《Text Processing in Python》 这本书的介绍吧!