MySQL Table Fragmentation: Beware of Bulk INSERT with FAILURE or ROLLBACK



Usually, database people are familiar with table fragmentation caused by DELETE statements. Whenever doing a huge delete, in most cases, they rebuild the table afterwards to reclaim the disk space. But do you think only DELETEs can cause table fragmentation? (Answer: NO.)

In this blog post, I am going to explain how table fragmentation happens with INSERT statements.

Before going into the topic, we need to know that with MySQL, there are two kinds of fragmentation:

  • Fragmentation where some of the InnoDB pages are completely free inside the table.
  • Fragmentation where some of the InnoDB pages are not completely filled (the page has some free space).

There are three major cases of table fragmentation with INSERTs:

  • INSERT with ROLLBACK
  • Failed INSERT statement
  • Fragmentation with page-splits

Test Environment

I have created my own test environment to experiment with those cases (a sketch of how the test data could be generated follows the list below).

  • DB: percona
  • Tables: frag, ins_frag, frag_page_spl
  • Table Size: 2G
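
The post does not show how the source table “frag” was built. For anyone who wants to reproduce the test, a minimal sketch is below; the column list, the MD5-based filler values, and the doubling loop are my own assumptions, not the original setup. The INSERT ... SELECT doubles the row count on each run, so you repeat it until the .ibd file reaches roughly 2 GB.

mysql> create table frag (
    ->   id int not null auto_increment,
    ->   name varchar(16) default null,
    ->   primary key (id)
    -> ) engine=innodb;
 
mysql> insert into frag (name) values (left(md5(rand()), 16));
 
mysql> -- repeat this statement until the table reaches the desired size
mysql> insert into frag (name) select left(md5(rand()), 16) from frag;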

Case 1: INSERT with ROLLBACK

First, I created a new table “ins_frag”. Then I started a transaction (with BEGIN) and copied the data from table “frag” into table “ins_frag”, as shown below.

mysql> create table ins_frag like frag;
Query OK, 0 rows affected (0.01 sec)
 
mysql> begin;
Query OK, 0 rows affected (0.00 sec)
 
mysql> insert into ins_frag select * from frag;
Query OK, 47521280 rows affected (3 min 7.45 sec)
Records: 47521280  Duplicates: 0  Warnings: 0
 
#Linux shell
sakthi-3.2# ls -lrth
total 8261632
-rw-r-----  1 _mysql  _mysql   2.0G Jun 17 02:43 frag.ibd
-rw-r-----  1 _mysql  _mysql   2.0G Jun 17 03:00 ins_frag.ibd

From the above, you can see the INSERT was executed, but I have not yet committed or rolled it back. Note that both tables now occupy 2 GB of disk space.
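
As a side note, while the transaction is still open, you can confirm from another session how many rows it has modified (and will therefore have to undo on rollback) by querying INFORMATION_SCHEMA.INNODB_TRX:

mysql> select trx_mysql_thread_id, trx_state, trx_rows_modified
    -> from information_schema.innodb_trx;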

Now, I am going to ROLLBACK the INSERT.

mysql> select count(*) from ins_frag;
+----------+
| count(*) |
+----------+
| 47521280 |
+----------+
1 row in set (1.87 sec)
 
mysql> rollback;
Query OK, 0 rows affected (5 min 45.21 sec)
mysql> select count(*) from ins_frag;
+----------+
| count(*) |
+----------+
|        0 |
+----------+
1 row in set (0.00 sec)
 
 
#Linux shell
sakthi-3.2# ls -lrth
total 8261632
-rw-r-----  1 _mysql  _mysql   2.0G Jun 17 02:43 frag.ibd
-rw-r-----  1 _mysql  _mysql   2.0G Jun 17 03:09 ins_frag.ibd

Yes, after rolling back the INSERT, the table “ins_frag” still occupies the same 2 GB of disk space. Let’s find the fragmented space through the MySQL client.

mysql> SELECT
-> table_schema as 'DATABASE',
-> table_name as 'TABLE',
-> CONCAT(ROUND(( data_length + index_length ) / ( 1024 * 1024 * 1024 ), 2), 'G') 'TOTAL',
-> CONCAT(ROUND(data_free / ( 1024 * 1024 * 1024 ), 2), 'G') 'DATAFREE'
-> FROM information_schema.TABLES
-> where table_schema='percona' and table_name='ins_frag';
+----------+----------+-------+----------+
| DATABASE | TABLE    | TOTAL | DATAFREE |
+----------+----------+-------+----------+
| percona  | ins_frag | 0.00G | 1.96G    |
+----------+----------+-------+----------+
1 row in set (0.01 sec)

So, this makes it clear that rolling back an INSERT creates fragmentation. We need to rebuild the table to reclaim the disk space.

mysql> alter table ins_frag engine=innodb;
Query OK, 0 rows affected (2.63 sec)
Records: 0  Duplicates: 0  Warnings: 0
 
#Linux shell
 
sakthi-3.2# ls -lrth
total 4131040
-rw-r-----  1 _mysql  _mysql   2.0G Jun 17 02:43 frag.ibd
-rw-r-----  1 _mysql  _mysql   112K Jun 17 03:11 ins_frag.ibd

Case 2: Failed INSERT Statement

To test this case, I have created two MySQL client sessions (session 1 and session 2). 

In session 1, I run the same INSERT statement within a transaction. But this time I interrupt it by killing the INSERT query from session 2.

Session 1

#Linux shell
 
sakthi-3.2# ls -lrth
total 4131040
-rw-r-----  1 _mysql  _mysql   2.0G Jun 17 02:43 frag.ibd
-rw-r-----  1 _mysql  _mysql   112K Jun 17 04:02 ins_frag.ibd
 
#MySQL shell
 
mysql> begin;
Query OK, 0 rows affected (0.00 sec)
 
mysql> insert into ins_frag select * from frag;   #is running

Session 2

mysql> pager grep -i insert ; show processlist;
PAGER set to 'grep -i insert'
| 33 | root            | localhost | percona | Query   |    14 | executing              | insert into ins_frag select * from frag |
4 rows in set (0.00 sec)
 
mysql> kill 33;
Query OK, 0 rows affected (0.00 sec)

The INSERT is interrupted and fails.

Again, at Session 1:

mysql> insert into ins_frag select * from frag;
ERROR 2013 (HY000): Lost connection to MySQL server during query
 
#Linux shell
 
sakthi-3.2# ls -lrth
total 4591616
-rw-r-----  1 _mysql  _mysql   2.0G Jun 17 02:43 frag.ibd
-rw-r-----  1 _mysql  _mysql   212M Jun 17 04:21 ins_frag.ibd
 
#MySQL shell
 
mysql> select count(*) from ins_frag;
+----------+
| count(*) |
+----------+
|        0 |
+----------+
1 row in set (0.10 sec)

The INSERT did not complete and there is no data in the table. But the table’s .ibd file has still grown to 212M. Let’s see the fragmented space through the MySQL client.

mysql> SELECT
-> table_schema as 'DATABASE',
-> table_name as 'TABLE',
-> CONCAT(ROUND(( data_length + index_length ) / ( 1024 * 1024 ), 2), 'M') 'TOTAL',
-> CONCAT(ROUND(data_free / ( 1024 * 1024 ), 2), 'M') 'DATAFREE'
-> FROM information_schema.TABLES
-> where table_schema='percona' and table_name='ins_frag';
+----------+----------+---------+----------+
| DATABASE | TABLE    | TOTAL   | DATAFREE |
+----------+----------+---------+----------+
| percona  | ins_frag | 0.03M   | 210.56M  |
+----------+----------+---------+----------+
1 row in set (0.01 sec)

It shows the table has fragmented space, and we have to rebuild the table to reclaim it.

mysql> alter table ins_frag engine='innodb';
Query OK, 0 rows affected (0.03 sec)
Records: 0  Duplicates: 0  Warnings: 0
 
#Linux shell
 
sakthi-3.2# ls -lrth
total 4131040
-rw-r-----  1 _mysql  _mysql   2.0G Jun 17 02:43 frag.ibd
-rw-r-----  1 _mysql  _mysql   112K Jun 17 04:32 ins_frag.ibd

Case 3: Fragmentation with Page-Splits

We know that internally, InnoDB records are stored in InnoDB pages. Each page is 16K by default, but you have the option to choose a different page size (innodb_page_size) when the data directory is initialized.
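
You can check the page size your instance was initialized with directly from the client:

mysql> select @@innodb_page_size;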

If an InnoDB page doesn’t have enough space to accommodate the new record or index entry, it is split into two pages, each about 50% full. This means that even for an insert-only workload, with no rollbacks or deletes, you may end up with only about 75% average page utilization, and so a 25% loss to this kind of internal page fragmentation.

Indexes that are built by sort are densely packed, so if the table then receives a lot of inserts that go to random locations in the index, page-splits will occur.

Check out this excellent blog written by Marco Tusa, InnoDB Page Merging and Page Splitting, which covers the complete internals of page-splits and the InnoDB page structure/operations.

For this experiment, I have created a table with a sorted (descending) index:

mysql> show create table frag_page_spl\G
*************************** 1. row ***************************
Table: frag_page_spl
Create Table: CREATE TABLE `frag_page_spl` (
`id` int NOT NULL AUTO_INCREMENT,
`name` varchar(16) DEFAULT NULL,
`messages` varchar(600) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `idx_spl` (`messages` DESC)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci
1 row in set (0.07 sec)

We can monitor the page-split activity from the table INFORMATION_SCHEMA.INNODB_METRICS. For this, you need to enable the InnoDB monitor.

mysql> SET GLOBAL innodb_monitor_enable=all;
Query OK, 0 rows affected (0.09 sec)
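
Enabling all counters is the simplest option on a test box. On a busier server you should be able to enable only the index module (which contains the index_page_splits counter used below) and switch the monitors off again after the test; a small sketch:

mysql> SET GLOBAL innodb_monitor_enable='module_index';
 
mysql> -- once the test is over
mysql> SET GLOBAL innodb_monitor_disable=all;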

Then I created a script to trigger the INSERTs with random data across 6 parallel threads (a rough sketch is shown below).
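
The original script is not shown in the post. One way to generate the same kind of workload in pure SQL is a small stored procedure like the sketch below (the procedure name, the row count, and the MD5-based random values are my own choices), called from six MySQL sessions at the same time. Because the “messages” values are random, every row lands at a random position in the descending idx_spl index, which is exactly what provokes the page-splits.

mysql> DELIMITER //
mysql> CREATE PROCEDURE random_insert_load(IN p_rows INT)
    -> BEGIN
    ->   DECLARE i INT DEFAULT 0;
    ->   WHILE i < p_rows DO
    ->     INSERT INTO frag_page_spl (name, messages)
    ->     VALUES (LEFT(MD5(RAND()), 16), REPEAT(MD5(RAND()), 18));
    ->     SET i = i + 1;
    ->   END WHILE;
    -> END//
mysql> DELIMITER ;
 
mysql> -- run this from 6 sessions in parallel
mysql> CALL random_insert_load(1000000);

After the scripts finished: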

mysql> select name,count,type,status,comment from information_schema.innodb_metrics where name like '%index_page_spl%'\G
*************************** 1. row ***************************
name: index_page_splits
count: 52186
type: counter
status: enabled
comment: Number of index page splits
1 row in set (0.05 sec)
 
mysql> SELECT
-> table_schema as 'DATABASE',
-> table_name as 'TABLE',
-> CONCAT(ROUND(( data_length + index_length ) / ( 1024 * 1024 ), 2), 'M') 'TOTAL',
-> CONCAT(ROUND(data_free / ( 1024 * 1024 ), 2), 'M') 'DATAFREE'
-> FROM information_schema.TABLES
-> where table_schema='percona' and table_name='frag_page_spl';
+----------+---------------+----------+----------+
| DATABASE | TABLE         | TOTAL    | DATAFREE |
+----------+---------------+----------+----------+
| percona  | frag_page_spl | 2667.55M | 127.92M  |
+----------+---------------+----------+----------+
1 row in set (0.00 sec)

Yes, from the metrics, we can see the page-split counter has increased. The output shows that 52,186 page-split operations occurred, which created 127.92 MB of fragmentation.
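
To get a feel for the second kind of fragmentation (pages that are only partially filled), you can also look at the average fill of the table’s index pages that are currently in the buffer pool. The query below is only an approximation, since it sees nothing about pages that are not cached, and it can be expensive on a large buffer pool, so treat it as a diagnostic sketch rather than a routine check:

mysql> SELECT index_name,
    ->        COUNT(*) AS pages_in_buffer_pool,
    ->        ROUND(AVG(data_size) * 100 / @@innodb_page_size, 2) AS avg_fill_pct
    -> FROM information_schema.innodb_buffer_page
    -> WHERE table_name LIKE '%frag_page_spl%'
    ->   AND page_type = 'INDEX'
    -> GROUP BY index_name;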

Once a page has been split, the only way to merge it back is for the page to drop below the merge threshold. When that happens, InnoDB moves the data from the split page with a merge operation. MERGE_THRESHOLD is configurable per table and for specific indexes.
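
MERGE_THRESHOLD is set through the COMMENT clause, either for the whole table or for an individual index when the index is created. A short example (the idx_name index below is hypothetical, only to show the syntax):

mysql> ALTER TABLE frag_page_spl COMMENT='MERGE_THRESHOLD=45';
 
mysql> CREATE INDEX idx_name ON frag_page_spl (name) COMMENT 'MERGE_THRESHOLD=40';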

The other way to reorganize the data is to OPTIMIZE the table. This can be a very heavy and long process, but it is often the only way to recover from a situation where too many pages are located in sparse extents.
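
For an InnoDB table, OPTIMIZE TABLE is mapped to ALTER TABLE ... FORCE, so it rebuilds the table and then analyzes it, much like the ALTER TABLE ... ENGINE=InnoDB used in the earlier cases:

mysql> OPTIMIZE TABLE frag_page_spl;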

Conclusion

  • The first two cases are rare, because most applications are not designed to write huge volumes of data into a table in a single statement.
  • You need to be aware of these issues whenever doing bulk INSERTs (INSERT INTO ... SELECT, loading data from mysqldump, INSERTs with huge data sets, etc.).
  • Remember that the fragmented disk space is always re-usable; a quick query to spot heavily fragmented tables is shown below.
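
To keep an eye on fragmentation across a whole schema, a query along these lines can help (the 100 MB threshold is an arbitrary example; adjust it to your environment):

mysql> SELECT table_name AS 'TABLE',
    ->        CONCAT(ROUND(data_free / (1024 * 1024), 2), 'M') AS 'DATAFREE'
    -> FROM information_schema.TABLES
    -> WHERE table_schema = 'percona'
    ->   AND data_free > 100 * 1024 * 1024
    -> ORDER BY data_free DESC;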
