内容简介:In one of our recent releases of theWe call this new Citus feature the “adaptive executor” and we thought it would be useful to walk through what the Citus adaptive executor means for Postgres and how it works.In a distributed database like Citus, there ar
In one of our recent releases of the open source Citus extension , we overhauled the way Citus executes distributed SQL queries—with the net effect being some huge improvements in terms of performance, user experience, Postgres compatibility, and resource management. The Citus executor is now able to dynamically adapt to the type of distributed SQL query, ensuring fast response times both for quick index lookups and big analytical queries.
We call this new Citus feature the “adaptive executor” and we thought it would be useful to walk through what the Citus adaptive executor means for Postgres and how it works.
Why we needed to improve upon the original Citus query executors
In a distributed database like Citus, there are two ways of handling a Postgres query. Either:
- the query contains enough information to determine in which shard (i.e. in which Postgres table on the worker) the data is stored—and to route the query to the right node with minimal overhead, or
- the query is parallelized across all shards to handle large data volumes
Citus supports both models for handling SQL queries in Postgres, which has allowed Citus to scale bothmulti-tenant SaaS applications as well asreal-time analytics applications, including time series dashboards .
To handle the requirements of these different types of PostgreSQL workloads, Citus traditionally had multiple executor code paths. Previously:
- the Citus router executor was optimized for routing queries directly to the right node, with minimal overhead, and
- the Citus real-time executor was optimized for large parallel computations by opening a connection per shard, to query many shards using all available cores
Because the original Citus real-time executor was not designed with distributed transactions in mind, it did not have the ability to reuse connections for multiple shards on the same worker.
On the other hand, because the original Citus router executor was designed for minimal overhead, it explicitly reused connections for all shards.
Using different executors in the same transaction block used to complicate migrations from PostgreSQL to Citus. Moreover, there is a large class of queries that do span across shards, but do not benefit from multi-core parallelism yet still paid the overhead of establishing many connections per Citus worker node.
In one of our recent Citus open source releases, we delivered a single executor that can handle different workloads and solves all the shortcomings of the real-time and router executor in a single unified code path: The Citus adaptive executor.
Introducing the Citus Adaptive Executor
The Citus adaptive executor uses a dynamic pool of connections to each worker node to execute Postgres queries on the shards (SQL tasks).
What this means is: the Citus adaptive executor can execute multiple tasks over a single connection per Citus worker node, to minimize overhead or to parallelize queries across connections per worker node to use multiple cores—giving you the ability to use hundreds of cores and combine the memory of many servers.
The Citus executor parallelizes not only SELECT
queries, but also DML (e.g. UPDATE
), DDL (e.g. CREATE INDEX
) and other utility statements (e.g. VACUUM
) across multiple worker nodes and multiple cores. Moreover, these parallelized statements can be part of bigger transaction blocks or stored procedures that are executed as one distributed Postgres transaction.
To ensure queries can always see the result of preceding statements in a transaction block, the adaptive executor first checks whether SQL tasks need to be assigned to a particular connection. If there were preceding writes on the shard(s) that the task accesses, then the executor assigns the SQL task to the connection that did the write—and otherwise the executor assigns the task to the pool to be executed as soon as a connection becomes available.
By dynamically adjusting the pool size to the SQL query, the Citus adaptive executor can handle:
- queries that are routed to a single worker node,
- queries that are parallelized across all worker node over a single connection (=core) per node, as well as
- queries that are parallelized across many cores per worker.
The adaptive executor also enables users to limit the number of connections per worker that a query can open, to handle distributed PostgreSQL queries with high concurrency.
The not-so-secret sauce behind the Adaptive Executor
Part of what makes the executor adaptive is a technique we call “slow start”, which is inspired by the famous TCP Slow Start algorithm that allows TCP to adapt to the available bandwidth and avoid congestion.
Every 10 milliseconds, the number of new connections that the adaptive executor can open to a worker node will double, if there are tasks remaining for that worker node. That way, if a distributed query does a quick ~1ms index lookup on each shard, the executor will never open more than a single connection per node. On the other hand, if we’re dealing with an analytical query that takes several seconds, the executor will quickly parallelize it across all available cores by running tasks in parallel over multiple connections per node.
So the Citus adaptive executor does not need to guess how long a SQL query will take, but rather will adapt to each query’s runtime.
In the upcoming Citus 9.3 open source release, the adaptive executor will also factor in the total number of connections it makes to each worker node. When it approaches the connection limit on a worker ( max_connections
), the executor will avoid opening additional connections for parallelism, such that the total number of connections to each worker never exceeds the total number of connections that the client is making to the coordinator. That ensures you can run many parallel queries concurrently without any issues.
Finally, while this might not seem important to most users, as a developer I really like the Citus adaptive executor’s well-documented code and especially its elegant state machines . The extensive code comments and state machines help ensure we keep one of the c
以上就是本文的全部内容,希望对大家的学习有所帮助,也希望大家多多支持 码农网
猜你喜欢:本站部分资源来源于网络,本站转载出于传递更多信息之目的,版权归原作者或者来源机构所有,如转载稿涉及版权问题,请联系我们。
神经网络与机器学习(原书第3版)
[加] Simon Haykin / 申富饶、徐烨、郑俊、晁静 / 机械工业出版社 / 2011-3 / 79.00元
神经网络是计算智能和机器学习的重要分支,在诸多领域都取得了很大的成功。在众多神经网络著作中,影响最为广泛的是Simon Haykin的《神经网络原理》(第3版更名为《神经网络与机器学习》)。在本书中,作者结合近年来神经网络和机器学习的最新进展,从理论和实际应用出发,全面、系统地介绍了神经网络的基本模型、方法和技术,并将神经网络和机器学习有机地结合在一起。 本书不但注重对数学分析方法和理论的探......一起来看看 《神经网络与机器学习(原书第3版)》 这本书的介绍吧!
在线进制转换器
各进制数互转换器
RGB CMYK 转换工具
RGB CMYK 互转工具