ProxySQL, MySQL Group Replication, and Latency

Category: IT Technology · Published: 5 years ago


While we’ve had MySQL Group Replication support in ProxySQL since version 1.3 (native as of v1.4), development has continued in subsequent versions. I’d like to describe a scenario of how latency can affect ProxySQL in a MySQL Group Replication environment, and outline a few new features that might help mitigate those issues. Before we dive into the specifics, however, let’s take a quick look at ProxySQL and Group Replication for those who may not be familiar with them.

MySQL Group Replication

Similar in functionality to Percona XtraDB Cluster or Galera, MySQL Group Replication is the only synchronous native HA solution for MySQL*. With built-in automatic distributed recovery, conflict detection, and group membership, MySQL GR provides a completely native HA solution for MySQL environments.

ProxySQL

ProxySQL is a high-performance, high-availability, protocol-aware proxy for MySQL. It allows the shaping of database traffic by delaying, caching, or rewriting queries on the fly. ProxySQL can also be used to create an environment where failovers do not affect your application, automatically removing (and adding back) database nodes from a cluster based on definable thresholds.

*Technically, Oracle offers one other native HA solution: MySQL NDB Cluster. It is outside the scope of this article, however, and is not suited to most general use cases.

Test Case

I recently had an interesting case with a client who was having severe issues with latency due to network/storage stalls at the hypervisor level. The environment is fairly standard, with a single MySQL 8.x GR writer node, and two MySQL 8.x GR passive nodes. In front of the database cluster sits ProxySQL, routing traffic to the active writer and handling failover duties should one of the database nodes become unavailable. The latency always occurred in short spikes, ramping up and then falling off quickly (within seconds).

The latency and I/O stalls from the network/hypervisor were throwing ProxySQL a curveball in determining if a node was actually healthy or not, and the client was seeing frequent failovers of the active writer node – often multiple times per day. To dive a bit deeper into this, let’s examine how ProxySQL determines a node’s health at a high level.

  • PING
    • mysql-monitor_ping_timeout
      • Issued on open connection.
  • SELECT
    • mysql-monitor_groupreplication_healthcheck_timeout
      • Gets the number of transactions a node is behind and identifies which node is the writer.
  • CONNECT
    • mysql-monitor_connect_timeout
      • Will try to open new connections to the host and measure timing.
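Each of the three checks above leaves a trail that can be inspected from the ProxySQL admin interface. The following is a minimal sketch, not part of the original setup: the helper `recent_checks_query` is hypothetical, and the table names are the monitor log tables as I understand them to exist in the ProxySQL `monitor` schema, so verify them against your version before relying on this.

```python
# Hypothetical helper: maps each health check to the ProxySQL monitor log
# table assumed to record its results, and builds a query for recent entries.
MONITOR_LOG_TABLES = {
    "ping": "monitor.mysql_server_ping_log",
    "connect": "monitor.mysql_server_connect_log",
    "group_replication": "monitor.mysql_server_group_replication_log",
}


def recent_checks_query(check: str, limit: int = 10) -> str:
    """Return SQL listing the most recent results for one check type."""
    if check not in MONITOR_LOG_TABLES:
        raise ValueError(f"unknown check type: {check!r}")
    if limit < 1:
        raise ValueError("limit must be at least 1")
    table = MONITOR_LOG_TABLES[check]
    # Assumption: every log table has a time_start_us column to order by.
    return f"SELECT * FROM {table} ORDER BY time_start_us DESC LIMIT {int(limit)}"
```

Running the generated statement for `"group_replication"` in the admin interface during a latency spike should show whether the health check itself was timing out or whether the node was genuinely behind.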

In a perfect environment, these checks work as intended: if a node is unreachable or has fallen too far behind, ProxySQL detects that and removes the node from the cluster. This is known as a hard_offline in ProxySQL (the OFFLINE_HARD server status), and it means the node is removed from the routing table and all traffic to that node stops. If that node is the writer node, ProxySQL will then tee up one of the passive nodes as the active writer, and the failover is complete.

Many of the ProxySQL health checks have multiple variables to control the timeout behavior. For instance, mysql-monitor_ping_timeout sets the maximum time a MySQL node may take to respond to a ping, and mysql-monitor_ping_max_failures sets how many times a MySQL node has to fail a ping check before ProxySQL decides to mark it hard_offline and pull the node out of the cluster.
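To make the interaction between the two variables concrete, here is a simplified model of a timeout combined with a failure limit. It is a sketch of the idea only, not ProxySQL's implementation: the class `PingFailureTracker` is hypothetical, the numbers are illustrative, and it assumes that failures must be consecutive and that a successful ping resets the streak.

```python
from dataclasses import dataclass


@dataclass
class PingFailureTracker:
    """Simplified model of a timeout plus max-failures health check."""

    timeout_ms: int    # plays the role of mysql-monitor_ping_timeout
    max_failures: int  # plays the role of mysql-monitor_ping_max_failures
    consecutive_failures: int = 0

    def record(self, latency_ms: float) -> bool:
        """Record one ping; return True once the node should be pulled."""
        if latency_ms > self.timeout_ms:
            self.consecutive_failures += 1
        else:
            # Assumption: a successful ping resets the failure streak.
            self.consecutive_failures = 0
        return self.consecutive_failures >= self.max_failures
```

With `timeout_ms=1000` and `max_failures=3`, two slow pings followed by a fast one leave the node in place; only three slow pings in a row trip the check.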

This wasn’t the case for the Group Replication specific health checks, however. Prior to version 2.0.7, the options for Group Replication checks were more limited: there was no equivalent of the max_failures setting available for standalone MySQL, only the timeout:

  • mysql-monitor_groupreplication_healthcheck_timeout

Added in version 2.0.7 was a new variable, giving us the ability to retry multiple times before marking a GR node hard_offline:

  • mysql-monitor_groupreplication_healthcheck_max_timeout_count

Setting this variable lets the Group Replication health check time out a configurable number of times before ProxySQL pulls the node out of the cluster. This is certainly more of a Band-Aid than an actual resolution, but it keeps a ProxySQL + GR environment up and running while the root cause of the latency is tracked down, and it prevents unnecessary flapping between active and passive nodes during short latency spikes and I/O stalls.
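One way to apply such a setting is through the ProxySQL admin interface. The sketch below only builds the statements; the helper `admin_statements` is hypothetical and the value 3 used in the usage note is illustrative, not a recommendation. It assumes the usual admin workflow of updating `global_variables`, loading the change to runtime, and saving it to disk.

```python
from typing import List

GR_TIMEOUT_COUNT_VAR = "mysql-monitor_groupreplication_healthcheck_max_timeout_count"


def admin_statements(variable: str, value: int) -> List[str]:
    """Build the admin statements that set one variable and persist it."""
    if value < 1:
        raise ValueError("value must be at least 1")
    return [
        # Change the configured value in the admin interface's memory layer.
        f"UPDATE global_variables SET variable_value='{int(value)}' "
        f"WHERE variable_name='{variable}'",
        # Apply it to the running proxy, then persist it across restarts.
        "LOAD MYSQL VARIABLES TO RUNTIME",
        "SAVE MYSQL VARIABLES TO DISK",
    ]
```

Calling `admin_statements(GR_TIMEOUT_COUNT_VAR, 3)` returns three statements to run in order from an admin session. Pick the count from how long your latency spikes actually last relative to the health check interval.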

Another similar option is currently being implemented in ProxySQL 2.0.9 for the transactions_behind check. See below:

  • mysql-monitor_groupreplication_max_transactions_behind

Currently, if a node exceeds the max_transactions_behind threshold even once, it is evicted from the cluster. The upcoming 2.0.9 release adds a variable that defines a count for this check, so that a node has to exceed max_transactions_behind a configurable number of times before it is evicted:

  • mysql-monitor_groupreplication_max_transactions_behind_count
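The effect of adding a count can be illustrated with a small simulation. This is a sketch under stated assumptions rather than ProxySQL's logic: the function `first_eviction` and the sample data are hypothetical, and it assumes the over-threshold checks must be consecutive, which the release may or may not require.

```python
from typing import Optional, Sequence


def first_eviction(
    samples: Sequence[int], max_behind: int, required_count: int
) -> Optional[int]:
    """Return the index of the check that triggers eviction, or None."""
    streak = 0
    for i, behind in enumerate(samples):
        # Assumption: only consecutive over-threshold checks are counted.
        streak = streak + 1 if behind > max_behind else 0
        if streak >= required_count:
            return i
    return None


# Transactions behind at each check: a short spike, then a sustained one.
samples = [0, 5, 250, 300, 10, 0, 400, 450, 500, 20]

print(first_eviction(samples, max_behind=100, required_count=1))
print(first_eviction(samples, max_behind=100, required_count=3))
```

With a count of 1 the node is evicted at the first spike (index 2). With a count of 3 the two-check spike is tolerated, and eviction happens only once the lag persists for three checks (index 8).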

In Summary

To be clear, the above settings will not fix any latency issues present in your environment. However, since latency can often be a hardware or network issue, and in many cases can take time to track down, these options may stabilize the environment by allowing you to relax ProxySQL’s health checks while the root cause investigation for the latency is underway.

