ProxySQL, MySQL Group Replication, and Latency

Category: IT Technology · Published: 4 years ago


While we’ve had MySQL Group Replication support in ProxySQL since version 1.3 (native as of v1.4), development has continued in subsequent versions. I’d like to describe a scenario in which latency can affect ProxySQL in a MySQL Group Replication environment, and outline a few new features that might help mitigate those issues. Before we dive into the specifics, however, let’s take a quick look at ProxySQL and Group Replication for those who may not be familiar with them.

MySQL Group Replication

Similar in functionality to Percona XtraDB Cluster or Galera, MySQL Group Replication is the only synchronous native HA solution for MySQL*. With built-in automatic distributed recovery, conflict detection, and group membership, MySQL GR provides a completely native HA solution for MySQL environments.

ProxySQL

ProxySQL is a high-performance, high-availability, protocol-aware proxy for MySQL. It allows the shaping of database traffic by delaying, caching, or rewriting queries on the fly. ProxySQL can also be used to create an environment where failovers will not affect your application, automatically removing (and adding back) database nodes from a cluster based on definable thresholds.

*There is technically one other native HA solution from Oracle – MySQL NDB Cluster. However, it is outside the scope of this article and is not intended for most general use cases.

Test Case

I recently had an interesting case with a client who was having severe latency issues due to network/storage stalls at the hypervisor level. The environment is fairly standard, with a single MySQL 8.x GR writer node and two MySQL 8.x GR passive nodes. In front of the database cluster sits ProxySQL, routing traffic to the active writer and handling failover duties should one of the database nodes become unavailable. The latency always occurred in short spikes, ramping up and then falling off quickly (within seconds).

The latency and I/O stalls from the network/hypervisor were throwing ProxySQL a curveball in determining if a node was actually healthy or not, and the client was seeing frequent failovers of the active writer node – often multiple times per day. To dive a bit deeper into this, let’s examine how ProxySQL determines a node’s health at a high level.

  • PING
    • mysql-monitor_ping_timeout
      • Issued on an already open connection.
  • SELECT
    • mysql-monitor_groupreplication_healthcheck_timeout
      • Gets the number of transactions a node is behind and identifies which node is the writer.
  • CONNECT
    • mysql-monitor_connect_timeout
      • Tries to open new connections to the host and measures the timing.
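To make the SELECT check more concrete, here is a minimal Python sketch of how a single health-check result row might be turned into a routing decision. The row shape (`viable_candidate`, `read_only`, `transactions_behind`) is an assumption modeled on the `sys.gr_member_routing_candidate_status` helper view commonly installed for use with ProxySQL; the `classify` function and the order of its checks are hypothetical simplifications, not ProxySQL’s actual code.

```python
from dataclasses import dataclass


@dataclass
class GRHealthRow:
    """One result row of the Group Replication health-check SELECT.

    Column names are an assumption, modeled on the
    sys.gr_member_routing_candidate_status helper view.
    """
    viable_candidate: str     # "YES" if the member can serve traffic (assumed meaning)
    read_only: str            # "NO" only on the writer (primary)
    transactions_behind: int  # how far this member lags behind the group


def classify(row: GRHealthRow, max_transactions_behind: int) -> str:
    """Return a simplified routing decision for one node (hypothetical helper)."""
    if row.viable_candidate != "YES":
        return "offline"  # not a usable group member
    if row.transactions_behind > max_transactions_behind:
        return "offline"  # lagging too far behind the group
    if row.read_only == "NO":
        return "writer"   # the single writable member
    return "reader"       # healthy passive member


print(classify(GRHealthRow("YES", "NO", 0), 100))
print(classify(GRHealthRow("YES", "YES", 500), 100))
```

In this model, a single row is enough to evict a node, which is exactly the sensitivity to short stalls discussed below.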

In a perfect environment, these checks work as intended: if a node is not reachable, or has fallen too far behind, ProxySQL is able to determine that and remove the node from the cluster. This is known as a hard_offline in ProxySQL, and means the node is removed from the routing table and all traffic to that node stops. If that node is the writer node, ProxySQL will then tee up one of the passive nodes as the active writer, and the failover is complete.

Many of the ProxySQL health checks have multiple variables to control the timeout behavior. For instance, mysql-monitor_ping_timeout sets the maximum time a MySQL node may take to respond to a ping, and mysql-monitor_ping_max_failures sets how many times a MySQL node must fail a ping check before ProxySQL decides to mark it hard_offline and pull the node out of the cluster.
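The interaction between the timeout and the failure count can be modeled in a few lines of Python. This is a simplified sketch, not ProxySQL’s implementation: it assumes a node is pulled only when each of its last `ping_max_failures` pings either got no reply or took longer than `ping_timeout_ms` (in milliseconds). The helper names are hypothetical.

```python
from typing import Optional, Sequence


def ping_failed(latency_ms: Optional[float], ping_timeout_ms: int) -> bool:
    """A ping fails if it got no reply (None) or exceeded the timeout."""
    return latency_ms is None or latency_ms > ping_timeout_ms


def should_mark_offline(recent_pings_ms: Sequence[Optional[float]],
                        ping_timeout_ms: int,
                        ping_max_failures: int) -> bool:
    """Assumed rule: mark offline only if the last N pings all failed."""
    if ping_max_failures < 1:
        raise ValueError("ping_max_failures must be at least 1")
    if len(recent_pings_ms) < ping_max_failures:
        return False  # not enough history to decide yet
    window = recent_pings_ms[-ping_max_failures:]
    return all(ping_failed(p, ping_timeout_ms) for p in window)


spike = [3.0, 1500.0, None]  # hypothetical latencies; None = no reply
print(should_mark_offline(spike, 1000, 3))  # False: the first ping was healthy
print(should_mark_offline(spike, 1000, 2))  # True: two failures in a row
```

Raising the failure count widens the window that must be entirely bad, which is what lets a node ride out a spike lasting only a check or two.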

This wasn’t the case for the Group Replication-specific health checks, however. Prior to version 2.0.7, the options for Group Replication checks were more limited: there was no equivalent of the max_failures setting that standalone MySQL had, only the timeout check:

  • mysql-monitor_groupreplication_healthcheck_timeout

Added in version 2.0.7 was a new variable, giving us the ability to retry multiple times before marking a GR node hard_offline:

  • mysql-monitor_groupreplication_healthcheck_max_timeout_count

By setting this variable, it is possible to have the Group Replication health check fail a configurable number of times before ProxySQL pulls a node out of the cluster. While this is certainly more of a Band-Aid than an actual fix, it can keep a ProxySQL + GR environment up and running while the root cause of the latency is investigated, and it prevents unnecessary flapping between active and passive nodes during short latency spikes and I/O stalls.
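To see why a retry count reduces flapping, the following Python sketch replays a hypothetical trace of health-check latencies against a timeout (standing in for mysql-monitor_groupreplication_healthcheck_timeout) and a count (standing in for mysql-monitor_groupreplication_healthcheck_max_timeout_count). It assumes consecutive timeouts are counted, a successful check resets the counter, and the node is added back on the next successful check; the real ProxySQL bookkeeping may differ, and the trace values are made up.

```python
from typing import List, Sequence


def evictions(check_latencies_ms: Sequence[float],
              healthcheck_timeout_ms: int,
              max_timeout_count: int) -> List[int]:
    """Return the indexes of the checks at which the node would be evicted.

    Simplified model: consecutive timeouts are counted, a successful check
    resets the counter, and the node is evicted when the counter reaches
    max_timeout_count. It is assumed to rejoin on the next successful check.
    """
    if max_timeout_count < 1:
        raise ValueError("max_timeout_count must be at least 1")
    evicted_at: List[int] = []
    consecutive = 0
    offline = False
    for i, latency in enumerate(check_latencies_ms):
        if latency > healthcheck_timeout_ms:
            consecutive += 1
            if not offline and consecutive >= max_timeout_count:
                evicted_at.append(i)
                offline = True
        else:
            consecutive = 0
            offline = False  # healthy again: node is added back
    return evicted_at


trace = [5, 900, 1200, 5, 1500, 5, 1100, 1300, 1400, 5]  # hypothetical, in ms
print(evictions(trace, 800, 1))  # [1, 4, 6]
print(evictions(trace, 800, 3))  # [8]
```

With a count of 1 (the only behavior available before 2.0.7, in this model) the node flaps three times during the trace; with a count of 3 it is evicted once, and only when the timeouts persist for three checks in a row.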

A similar option is currently being implemented in ProxySQL 2.0.9 for the transactions_behind check, which is controlled by:

  • mysql-monitor_groupreplication_max_transactions_behind

Currently, if a node’s transactions behind exceed the max_transactions_behind threshold even once, the node is evicted from the cluster. The upcoming 2.0.9 release adds a variable that defines a count for these checks, so that a node must exceed max_transactions_behind a configurable number of times (x) before it is evicted:

  • mysql-monitor_groupreplication_max_transactions_behind_count
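The effect of such a count can be sketched the same way. The Python below is a hypothetical model, not ProxySQL’s code: it assumes a check "fails" when the node is more than `max_transactions_behind` transactions behind, and that eviction happens only after `max_transactions_behind_count` consecutive failed checks. The lag samples are invented for illustration.

```python
from typing import Optional, Sequence


def first_eviction(transactions_behind: Sequence[int],
                   max_transactions_behind: int,
                   max_transactions_behind_count: int) -> Optional[int]:
    """Index of the first check at which a lagging node would be evicted.

    Assumed rule: evict after `max_transactions_behind_count` consecutive
    checks above `max_transactions_behind`. Returns None if never evicted.
    """
    if max_transactions_behind_count < 1:
        raise ValueError("max_transactions_behind_count must be at least 1")
    streak = 0
    for i, behind in enumerate(transactions_behind):
        # Extend the streak on a lagging check, reset it on a healthy one.
        streak = streak + 1 if behind > max_transactions_behind else 0
        if streak >= max_transactions_behind_count:
            return i
    return None


samples = [0, 12, 250, 40, 0, 300, 320, 310, 5]  # hypothetical lag per check
print(first_eviction(samples, 100, 1))  # 2: a single spike is enough
print(first_eviction(samples, 100, 3))  # 7: three lagging checks in a row
print(first_eviction(samples, 100, 4))  # None: the lag never lasts that long
```

A count of 1 reproduces the evict-on-first-breach behavior described above; larger counts tolerate brief replication lag caused by I/O stalls.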

In Summary

To be clear, the above settings will not fix any latency issues present in your environment. However, since latency can often be a hardware or network issue, and in many cases can take time to track down, these options may stabilize the environment by allowing you to relax ProxySQL’s health checks while the root cause investigation for the latency is underway.
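Relaxing the checks is done through ProxySQL’s admin interface by updating `global_variables`, then loading the change to runtime and saving it to disk. The Python sketch below only builds those admin statements as strings so they can be reviewed before being run; the helper name `relaxed_healthcheck_sql` and the numeric values are hypothetical and should be derived from your own latency measurements. mysql-monitor_groupreplication_healthcheck_max_timeout_count requires ProxySQL 2.0.7 or later.

```python
from typing import Dict, List


def relaxed_healthcheck_sql(settings: Dict[str, int]) -> List[str]:
    """Build ProxySQL admin statements that update monitor variables,
    apply them to the running configuration, and persist them."""
    statements: List[str] = []
    for name, value in sorted(settings.items()):
        if not name.startswith("mysql-monitor_"):
            raise ValueError(f"unexpected variable name: {name}")
        statements.append(
            f"UPDATE global_variables SET variable_value='{value}' "
            f"WHERE variable_name='{name}';"
        )
    statements.append("LOAD MYSQL VARIABLES TO RUNTIME;")  # apply now
    statements.append("SAVE MYSQL VARIABLES TO DISK;")     # survive restarts
    return statements


# Hypothetical values for illustration only.
example = {
    "mysql-monitor_groupreplication_healthcheck_timeout": 2000,
    "mysql-monitor_groupreplication_healthcheck_max_timeout_count": 3,
}

for stmt in relaxed_healthcheck_sql(example):
    print(stmt)
```

Once the latency root cause is fixed, the same helper can generate statements that restore the original values.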

