On MySQL's doublewrite and the similar mechanism in Oracle

Category: Databases · Published: 5 years ago


I took a quick look at the concept of MySQL's doublewrite buffer:

So why is doublewrite needed? It is needed to achieve data safety in case of partial page writes. InnoDB does not log full pages to the log files, but uses what is called "physiological" logging, which means log records contain the page number for the operation as well as the operation data (i.e. update the row) and log sequence information. Such a logging structure is great because it requires less data to be written to the log; however, it requires pages to be internally consistent. It does not matter which page version it is: it could be the "current" version, in which case InnoDB will skip the page update operation, or the "former" version, in which case InnoDB will perform the update. If the page is inconsistent, recovery can't proceed.
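The "skip if current, apply if former" rule above can be sketched in a few lines. This is a toy model, not InnoDB's actual code: the `Page` and `LogRecord` classes, their fields, and `apply_redo` are all hypothetical names, and it assumes each page stores the log sequence number (LSN) of the last change applied to it.

```python
from dataclasses import dataclass, field


@dataclass
class Page:
    page_no: int
    lsn: int = 0  # LSN of the last change already reflected in this page
    rows: dict = field(default_factory=dict)


@dataclass
class LogRecord:
    lsn: int       # log sequence information
    page_no: int   # which page the operation targets
    key: str       # operation data: which row to update ...
    value: str     # ... and its new value


def apply_redo(page: Page, records: list) -> int:
    """Replay log records against one page; return how many were applied."""
    applied = 0
    for rec in sorted(records, key=lambda r: r.lsn):
        if rec.page_no != page.page_no:
            continue  # record belongs to another page
        if rec.lsn <= page.lsn:
            continue  # page is already "current" for this record: skip
        # Page is in a "former" state: perform the update.
        page.rows[rec.key] = rec.value
        page.lsn = rec.lsn
        applied += 1
    return applied
```

Note that the record only says "update this row on this page". It cannot rebuild a page whose bytes are half old and half new, which is why the scheme depends on pages being internally consistent.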

Now let's talk a bit about partial page writes: what they are and why they happen. A partial page write is when a page write request submitted to the OS completes only partially. For example, out of a 16K InnoDB page only the first 4KB are updated and the other parts remain in their former state. Most typically partial page writes happen when a power failure occurs. It can also happen on an OS crash: there is a chance the operating system will split your 16K write into several writes and the failure happens just between their execution. Reasons for splitting could be file fragmentation, since most file systems use 4K block sizes by default, so 16K could use more than one fragment. Also, if software RAID is used, the page may fall on a stripe border, requiring multiple IO requests. The same happens with hardware RAID on power failure if it does not have a battery-backed cache. If there is a single write issued to the disk itself, it should in theory be completed even if power goes down, as there should be enough power accumulated inside the drive to complete it. I honestly do not know if this is always the case; it is hard to check, as it is not the only reason for partial page writes. I just know they tend to happen, and before InnoDB doublewrite was implemented I had a couple of data corruptions due to it.
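To make the 16K/4K example concrete, here is a minimal sketch of a torn write and how a page checksum exposes it. The page layout is an assumption made for illustration (a CRC32 over the body stored in the last 4 bytes); real InnoDB pages are laid out differently, and `make_page`, `is_consistent` and `torn_write` are hypothetical helper names.

```python
import zlib

PAGE_SIZE = 16 * 1024   # 16K page, as in the text
SECTOR = 4 * 1024       # 4K unit in which the write may be split
CHECKSUM_BYTES = 4      # assumed: CRC32 of the body stored at the page end


def make_page(payload: bytes) -> bytes:
    """Build a full page: zero-padded body followed by its CRC32."""
    body_len = PAGE_SIZE - CHECKSUM_BYTES
    if len(payload) > body_len:
        raise ValueError("payload does not fit in one page")
    body = payload.ljust(body_len, b"\x00")
    return body + zlib.crc32(body).to_bytes(CHECKSUM_BYTES, "big")


def is_consistent(page: bytes) -> bool:
    """A page is consistent if the stored CRC32 matches its body."""
    if len(page) != PAGE_SIZE:
        return False
    body, stored = page[:-CHECKSUM_BYTES], page[-CHECKSUM_BYTES:]
    return zlib.crc32(body).to_bytes(CHECKSUM_BYTES, "big") == stored


def torn_write(old: bytes, new: bytes, sectors_written: int) -> bytes:
    """Simulate a crash after only the first `sectors_written` 4K units landed."""
    cut = sectors_written * SECTOR
    return new[:cut] + old[cut:]
```

With `old = make_page(b"row=1")` and `new = make_page(b"row=2")`, the result of `torn_write(old, new, 1)` carries the new row but the old checksum, so `is_consistent` rejects it. Detecting the tear is the easy part; the redo log alone cannot repair such a page.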

So how does doublewrite work? You can think of it as one more short-term log file allocated inside the InnoDB tablespace; it contains space for 100 pages. When InnoDB flushes pages from the InnoDB buffer pool, it does so in batches of multiple pages. So several pages are written to the doublewrite buffer (sequentially), fsync() is called to ensure they make it to disk, then the pages are written to their real locations and fsync() is called a second time. On recovery, InnoDB checks the doublewrite buffer contents and the pages in their original locations. If a page is inconsistent in the doublewrite buffer, it is simply discarded; if it is inconsistent in the tablespace, it is recovered from the doublewrite buffer.
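The write-twice-then-recover sequence just described can be modelled in memory. The sketch below is illustrative only: `ToyStorage`, `seal`, and the `torn_page`/`torn_bytes` parameters are hypothetical, dictionaries stand in for the on-disk doublewrite area and tablespace, and the two fsync() calls are marked only as comments.

```python
import zlib


def seal(body: bytes) -> bytes:
    """Append a CRC32 so a torn page image can be detected later."""
    return body + zlib.crc32(body).to_bytes(4, "big")


def is_consistent(page: bytes) -> bool:
    return len(page) >= 4 and zlib.crc32(page[:-4]).to_bytes(4, "big") == page[-4:]


class ToyStorage:
    def __init__(self):
        self.doublewrite = {}  # page_no -> page image (stand-in for the doublewrite area)
        self.tablespace = {}   # page_no -> page image (stand-in for the real locations)

    def flush(self, batch: dict, torn_page=None, torn_bytes: int = 0) -> None:
        """Flush a batch of page bodies; optionally simulate power loss mid-page."""
        sealed = {no: seal(body) for no, body in batch.items()}
        # Step 1: write the whole batch to the doublewrite area (then fsync).
        self.doublewrite = dict(sealed)
        # Step 2: write each page to its real location (then fsync again).
        for no, image in sealed.items():
            if no == torn_page:
                old = self.tablespace.get(no, b"")
                # Only the first `torn_bytes` of the new image reach the disk.
                self.tablespace[no] = image[:torn_bytes] + old[torn_bytes:]
                return  # power lost: nothing after this page is written
            self.tablespace[no] = image

    def recover(self) -> list:
        """Repair torn tablespace pages from good doublewrite copies."""
        repaired = []
        for no, copy in self.doublewrite.items():
            if not is_consistent(copy):
                continue  # torn doublewrite copy: discard it
            if not is_consistent(self.tablespace.get(no, b"")):
                self.tablespace[no] = copy
                repaired.append(no)
        return repaired
```

The ordering is what makes this safe in the model: a crash during step 1 leaves the tablespace pages untouched, and a crash during step 2 leaves an intact copy in the doublewrite area, so at least one consistent version of every page survives.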

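My view is this: Oracle can also suffer partial page (block) writes, but Oracle's redo and controlfile checkpoint handling is more rigorous, so it no longer needs a dedicated area like this.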

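As for rolling forward over a partial page (block), Oracle has a comparable situation. For example, a block may already have been written in full, but the incremental checkpoint has not necessarily updated the control file yet. Oracle then treats the block as not up to date and writes it again, even though the block is in fact already current. In Oracle this mechanism is called a resilver write.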

Most data blocks are changed via redo and written by DBWR. These writes are coordinated by cache locks that ensure there is only one current dirty buffer for any given block. If DBWR dies for any reason, its instance will also die. It will be necessary to do some form of recovery applying redo to reconstruct the blocks that were in the cache at the time of the failure. This recovery will have to either read or write any block that is both modified by redo, and might have been in the middle of a write when DBWR died. Thus, these blocks are a superset of the blocks that Oracle must resilver if it is responsible for resilvering. Hence, during recovery, Oracle will rewrite every block it examines in files it is responsible for resilvering. Since media recovery may be used to recover changes lost when an instance dies, Oracle must also resilver when doing media recovery.

Data blocks used for sorting are modified without generating any redo. They are still written by DBWR. These blocks are never read by any process other than the one doing the sort, and if its DBWR dies then it too will die. Thus it is not important that these blocks become resilvered.

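MySQL calls it doublewrite; Oracle calls it a resilver write. The purpose should be the same, but the mechanisms differ. It is probably so well known in MySQL because it can be turned off. On the Oracle side, nobody seems to have investigated whether it can be turned off.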

