More Robust Network Partition Handling in Group Replication


As Group Replication (GR) matures and it is deployed in a myriad of different systems, we begin to witness specific network conditions that we must be able to cope with in order to make Group Replication more tolerant and robust to those failures.

This post will describe the enhancements that were made to support certain scenarios, which will guarantee that GR will continue to provide service.

Gradual Group Degradation support

The presence of an unstable network can cause nodes to constantly drop the connections of XCom's fully connected mesh. Unstable networks can be seen in cloud environments, in internal networks with slowly degrading network switches, or even in congested networks. This causes degradation over time, which can lead to a full group partition.

Let us consider:

  • A group {A,B,C} ;
  • A flaky network among group members;
  • An expel timeout configured to a large value;

When the network degrades, nodes can form partial majorities, for instance {A,B} and {B,C}, and throw inconsistent suspicions about each other. Those suspicions are stored and become active internally in the Group Communication System (GCS).

If the group loses the majority and becomes fully partitioned, {A}{B}{C}, internal suspicions will fire and issue expel commands that will be stuck in XCom waiting for a majority. If the group recovers to {A,B,C}, those messages will unblock, but they are no longer valid in the current context. This can lead to a full group disband.

XCom is our Paxos implementation and GCS is a layer on top of it that implements some services, like keeping track of the dynamic membership and acting on suspicions. XCom itself does not deal with the healing of the group. GCS is the one that keeps some memory of the group, and therefore GCS decides whether a node should be expelled or not, based on the messages arriving from the Paxos stream.

The solution we came up with is for GCS to keep track, in its internal memory, of the members whose expulsion it has already ordered. GCS treats those members as if they do not belong to the current view when considering a majority to expel another node. This means that the considered quorum, i.e. the minimum number of votes needed for a majority in the group to issue an expel request, is the number of currently active group members minus the members already deemed to be expelled. If we cannot reach that majority, the expel request is not issued, and the problem is solved.

We eventually have to clean up our memory of the members whose expulsion we have already ordered. We do so once the expulsion has actually taken place, i.e. once the member is no longer part of the group.
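To make the idea concrete, here is a minimal sketch of how that bookkeeping could look. The names (Expel_tracker, try_order_expel, on_view_change) and the exact vote counting are illustrative assumptions, not the actual GCS classes or logic:

```cpp
// Minimal sketch of the expel-quorum bookkeeping described above.
// All names here are illustrative, not the actual GCS types.
#include <cstddef>
#include <set>
#include <string>

class Expel_tracker {
 public:
  // Called when we want to order the expulsion of a suspected member.
  // Returns true if the expel command should actually be issued.
  bool try_order_expel(const std::string &suspect) {
    // Members whose expulsion we already ordered do not count towards
    // the quorum, as if they were no longer part of the view.
    std::size_t effective_view = m_view.size() - m_ordered_expels.size();
    std::size_t majority = effective_view / 2 + 1;

    // Votes can only come from members that are neither the suspect nor
    // already scheduled for expulsion (that includes ourselves).
    std::size_t possible_votes = effective_view > 0 ? effective_view - 1 : 0;

    if (possible_votes < majority) return false;  // no majority: drop the expel
    m_ordered_expels.insert(suspect);
    return true;
  }

  // Called when a new view is delivered. Members that actually left the
  // group are removed from our memory of ordered expulsions.
  void on_view_change(const std::set<std::string> &new_view) {
    m_view = new_view;
    for (auto it = m_ordered_expels.begin(); it != m_ordered_expels.end();) {
      if (new_view.count(*it) == 0)
        it = m_ordered_expels.erase(it);  // the expulsion took place
      else
        ++it;
    }
  }

 private:
  std::set<std::string> m_view;            // currently active group members
  std::set<std::string> m_ordered_expels;  // members we already ordered expelled
};
```

With this bookkeeping, a node that has already ordered the expulsion of one peer in a three-member group cannot gather a majority to expel the second one, so no stale expel command is left behind to fire when the group heals.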

Improved node-to-node connection management

In XCom, there were two ways to state that a server was active from any other server's point of view:

a) We actually send data to the server S via socket primitives, or;

b) We intend to send something to the server S, by pushing a message onto its outgoing queue.

The problem that we observed happened in a group {node1,node2,node3}. node1 is either network partitioned from the group or crashed. The group expels node1 and the group is now {node2,node3}.

node1's network connectivity is then reestablished and a new node1 incarnation asks to join the group. node1 is added back to the XCom low-level configuration, but times out in the full GCS joining process, i.e. the state exchange. GCS tries to join again but is denied, because the node is already in the XCom configuration.

This happens because both node2 and node3 never broke the original connection to node1's first incarnation. node1 was constantly being marked as active solely because of the intention of sending a message to it, i.e. scenario b) of marking a node as active. The socket was reporting errors as per scenario a), thus marking node1 as inactive, but scenario b) kept contradicting that.

The solution was to remove scenario b), meaning that we only consider a server to be active when we are actually able to physically write to its connection. If we are not able to send a message, because the connection is broken or because the socket send buffers are depleted and poll() returns with an error, the node is marked as non-active and the connection is forcefully shut down.
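The sketch below shows that rule in isolation, using plain POSIX socket calls and a hypothetical Connection wrapper rather than the real XCom code paths; it is an illustration of the idea, not the actual implementation:

```cpp
// Minimal sketch of the new activity rule: only a successful physical
// write marks the peer as active. Names are illustrative, not XCom's.
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>
#include <cstddef>

struct Connection {
  int fd = -1;           // connected socket to the peer
  bool active = false;   // whether the peer is considered active
};

bool send_to_peer(Connection &conn, const char *buf, std::size_t len) {
  // Before the fix, merely queueing a message would have marked the peer
  // as active; that path no longer exists.
  ssize_t written = ::send(conn.fd, buf, len, 0);
  if (written < 0) {
    // The connection is broken or the send path reports an error:
    // mark the node as non-active and forcefully shut the connection down.
    conn.active = false;
    ::shutdown(conn.fd, SHUT_RDWR);
    ::close(conn.fd);
    conn.fd = -1;
    return false;
  }
  conn.active = true;  // the write physically reached the socket
  return true;
}
```

This way the stale connection to a previous incarnation of a node is torn down as soon as sending to it fails, so a rejoining node is no longer blocked by its own ghost in the XCom configuration.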

Consistent Connection Status across the group

As you all know, GCS/XCom keeps a full mesh of connections among its group members. This means that in a group with 2 members {node1,node2}, we have a node1 -> node2 connection and a node2 -> node1 connection. Due to either misconfiguration or real network issues, we could end up in a situation where one side of the connection is broken.

Let us consider that the node1 -> node2 connection is broken. node1 won't be able to send messages to node2, but node2 will be able to send messages to node1. node2 will mark node1 as UNREACHABLE, while node1 will see both node1 and node2 as ONLINE, until its send buffer is depleted and the connection is broken, which can take some time. This leads to an inconsistent state in the group's Performance Schema (P_S) tables.

To solve this, we rely on an existing mechanism, which sends are_you_alive messages to nodes that we think have died. In the previous example, node2 will start sending those messages to node1. Since node1 sees node2 as healthy, it will start a counter. If it receives a certain number of are_you_alive messages within a certain time interval, it will cut off the connection and you will see the following message in node1's log:

"Shutting down an outgoing connection. This happens because something might be wrong on a bi-directional connection to node {node_address}:{node_port}.Please check the connection status to this member"

The status of node2 in node1's P_S tables will also change to UNREACHABLE.
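A rough sketch of that counter follows; the class name, the threshold, and the time window are assumptions for illustration, not the values GCS actually uses:

```cpp
// Illustrative sketch of counting are_you_alive probes from a peer that
// we still consider healthy, and deciding when to cut the connection.
#include <chrono>
#include <iostream>
#include <map>
#include <string>

class Alive_probe_monitor {
  using clock = std::chrono::steady_clock;
  struct Probe_window {
    clock::time_point start{};
    int count = 0;
  };

  std::map<std::string, Probe_window> m_windows;

  static constexpr int kMaxProbes = 5;                // assumed threshold
  static constexpr std::chrono::seconds kWindow{60};  // assumed interval

 public:
  // Called when an are_you_alive probe arrives from a peer that this node
  // still sees as ONLINE. Returns true if the outgoing connection to that
  // peer should be forcefully shut down.
  bool on_are_you_alive(const std::string &peer) {
    auto now = clock::now();
    auto &w = m_windows[peer];
    if (w.count == 0 || now - w.start > kWindow) {
      w.start = now;  // start a fresh counting window
      w.count = 0;
    }
    if (++w.count < kMaxProbes) return false;

    std::cerr << "Probable one-sided connection to " << peer
              << ": shutting down the outgoing connection.\n";
    m_windows.erase(peer);
    return true;  // caller closes the connection; P_S then shows UNREACHABLE
  }
};
```

The effect is that both sides converge on the same view of the link: the node with the broken outgoing half stops pretending the peer is ONLINE, and the P_S tables across the group report a consistent status.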

Conclusion

This blog post presented three major improvements that make Group Replication better able to withstand transient network failures: gradual group degradation support, improved node-to-node connection management, and consistent connection status across the group. If your setup happens to hit the problematic scenarios, give MySQL 8.0.21 a try and let us know your feedback.
