In the Foundation’s engineering roadmap for 2020, we overviewed our plans for Zebra, our Rust implementation of Zcash. Announced last summer at Zcon1, Zebra aims to support the core strength of Zcash – its best-in-class cryptography – by placing it on a solid foundation, providing a modern, modular implementation that can be broken into components and used in many different contexts. In that post, we briefly described the new network stack we designed and implemented for Zebra. As a fork of Bitcoin, Zcash inherited Bitcoin’s network protocol; in this post, we’ll do a deep dive on Zebra’s network stack.
Our new stack is designed around the asynchronous RPC abstractions provided by Tower, which is used in Buoyant’s linkerd and was in turn inspired by Twitter’s Finagle. (Many thanks go to Eliza Weisman for pointing us in the right direction and giving design feedback.) It isolates the Bitcoin state machine for each peer connection, exposing only a clean request/response API, and then encapsulates all the peer connections behind a connection pool that can load-balance outbound requests over all available peers. The connection pool is dynamically sized in response to backpressure, automatically crawling the network to find new peers when outbound demand (requests from the node to the network) is high, and closing existing connections to shed load when inbound demand (requests from the network to the node) is high.
Bitcoin’s Legacy Network Protocol
Zcash was originally a fork of Bitcoin, adding fully private transactions implemented using zero-knowledge proofs. As the first ever production-scale deployment of zk-SNARKs , it’s understandable that its original development was focused on bringing zk-SNARKs to production, rather than redesigning the Bitcoin blockchain. But this meant that Zcash inherited its network protocol from Bitcoin, which in turn inherited it from a poorly-specified C++ codebase written in 2009 by Satoshi before their disappearance.
The Bitcoin network protocol does not specify any concept of requests or responses. Instead, nodes send each other messages, which are processed one at a time and might or might not cause the recipient to generate other messages. Often, those messages can also be sent unsolicited. For instance, node A might send a getblocks message to node B, and node B might “respond” with an inv message advertising inventory to node A, but B’s inv message is not connected in any way to A’s getblocks message. Since B can also send A unsolicited inv messages as part of the gossip protocol, both nodes need to maintain complex connection state to understand each other.
In zcashd, all messages are processed one at a time in this 900-line function in main.cpp, and in bitcoind, which has been refactored since zcashd was forked, this is performed by this 1400-line C++ function. Not only is the required connection state enormous, making it very difficult to exhaustively understand and test, it’s also shared between different peer connections.
When thinking about what we wanted our network layer to look like, we knew this was what we didn’t want. An enormous, complex state machine shared between connections is a sure sign of future trouble for maintainability, security, and performance. So what would be the appropriate foundation?
A tower-ing Interlude
Tower is a Rust library providing “modular and reusable components for building robust networking clients and servers”, drawing on the ideas of Your Server as a Function. The core building block of Tower is the tower::Service trait, which defines an abstract RPC interface:
```rust
pub trait Service<Request> {
    type Response;
    type Error;
    type Future: Future<Output = Result<Self::Response, Self::Error>>;

    fn poll_ready(&mut self, cx: &mut Context) -> Poll<Result<(), Self::Error>>;

    fn call(&mut self, req: Request) -> Self::Future;
}
```
If you’re already familiar with async Rust, this might make sense. But if you’re not, it’s worth unpacking this code a little bit. First, it defines a trait, Rust’s concept of an “interface”. The Service trait models an asynchronous function from a Request to a Response, and because Rust is statically typed, these type variables are declared as part of the trait definition. A particular implementation of the Service trait declares the concrete types for these variables as part of the implementation.
Next, let’s look at the call method. This takes a mutable reference to the service, &mut self, and a Request, and returns a future whose eventual output will be a Result<Response, Error>, i.e., either a successful response or an error. Because the call method takes a &mut self, a service is allowed to mutate internal state while processing a request.
The key component of this trait, however, is the poll_ready method, used to propagate backpressure. Backpressure describes mechanisms that let different parts of a software system perform flow control by communicating about throughput. This blog post has a more detailed explanation, but as a simple example, consider copying data from a fast SSD to a slower HDD. Because data can only be written to the HDD at a slower rate, reading data from the SSD at a faster rate only means larger buffers and more resource consumption.
The Service contract requires that callers MUST call poll_ready to determine readiness before calling call. This mechanism allows correct propagation of backpressure information through an entire tree of Service calls.
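To make the trait concrete, here is a minimal toy implementation. The Doubler service and its u64 request type are invented for illustration; a real service would do I/O and report genuine readiness in poll_ready.

```rust
use std::future::Future;
use std::pin::Pin;
use std::task::{Context, Poll};

use tower::Service;

/// A toy service that doubles a number (illustrative only).
struct Doubler;

impl Service<u64> for Doubler {
    type Response = u64;
    type Error = std::convert::Infallible;
    type Future = Pin<Box<dyn Future<Output = Result<Self::Response, Self::Error>> + Send>>;

    fn poll_ready(&mut self, _cx: &mut Context<'_>) -> Poll<Result<(), Self::Error>> {
        // Always ready; a real service would check buffers, connections, etc.
        Poll::Ready(Ok(()))
    }

    fn call(&mut self, req: u64) -> Self::Future {
        // The "work" here is trivial, but it is still exposed as a future.
        Box::pin(async move { Ok(req * 2) })
    }
}
```

A caller first awaits readiness (for example via tower::ServiceExt::ready) and only then invokes call; that contract is what lets backpressure propagate through a stack of services.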
There is another important benefit of this design. By formalizing an asynchronous function from a Request to a Response as a trait, tower allows writing middleware that transforms Services in a generic way. For instance, tower::buffer provides a generic wrapper for adding a bounded buffer to any Service; tower::timeout provides generic request timeouts; tower::retry provides generic retry behaviour with configurable policies; tower-load provides generic load-measurement instrumentation; and so on. And because these are implemented using zero-cost abstractions, the generic middleware adds no runtime cost beyond the cost required to implement its behaviour.
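As a rough sketch of how this composes in practice: the inner service below is a trivial stand-in, and the example assumes the buffer, timeout, and util features of the tower crate plus a Tokio runtime.

```rust
use std::time::Duration;
use tower::{service_fn, ServiceBuilder, ServiceExt};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error + Send + Sync>> {
    // A trivial inner service standing in for some real RPC backend.
    let inner = service_fn(|req: u64| async move {
        Ok::<u64, std::convert::Infallible>(req * 2)
    });

    // Stack generic middleware around it: a bounded buffer and a timeout.
    let svc = ServiceBuilder::new()
        .buffer(32)
        .timeout(Duration::from_secs(4))
        .service(inner);

    // `oneshot` drives poll_ready and then call for a single request.
    let resp = svc.oneshot(21).await?;
    assert_eq!(resp, 42);
    Ok(())
}
```

The middleware knows nothing about the inner service beyond its Request and Response types, which is exactly what makes it reusable.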
A Request/Response Protocol for Zcash
To make use of these tools and to simplify our internal architecture, we opted to design an internal request/response protocol for communicating with the Zcash network. We model this communication as a duplexed connection, with inbound requests from remote peers to our node on the one hand and outbound requests from various parts of our node to the Zcash network on the other.
This choice simplifies our internal architecture, because each part of the
code that needs to obtain data from the network can create an appropriate
request and obtain a response future, without having to keep track of any
other state. In contrast, zcashd
processes all messages in a single
function, making it complicated to isolate state. And because our
request/response protocol is internally defined, we can evolve it to cleanly
factor responsibility between different parts of the code without
compatibility issues.
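For a sense of the shape of such a protocol, here is a hypothetical sketch; the variant and type names below are illustrative placeholders, not Zebra’s actual definitions.

```rust
use std::net::SocketAddr;

// Placeholder types standing in for real chain data structures.
pub struct BlockHash(pub [u8; 32]);
pub struct Block(pub Vec<u8>);
pub struct Transaction(pub Vec<u8>);

/// A request in the internal protocol, whether inbound or outbound.
pub enum Request {
    /// Ask for more peer addresses.
    GetPeers,
    /// Ask for the blocks with the given hashes.
    BlocksByHash(Vec<BlockHash>),
    /// Advertise a transaction.
    PushTransaction(Transaction),
}

/// The response paired with each request.
pub enum Response {
    /// A generic acknowledgement carrying no data.
    Ok,
    /// Peer addresses, answering `GetPeers`.
    Peers(Vec<SocketAddr>),
    /// Blocks, answering `BlocksByHash`.
    Blocks(Vec<Block>),
}
```

With request and response types like these, “talking to the network” becomes a tower::Service<Request> whose output is a Response, and every consumer in the node can use it through that one interface.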
However, Zebra needs to talk to zcashd nodes that communicate using the legacy Bitcoin wire protocol, which is not request/response based. To handle this, we perform per-peer protocol translation between the legacy Bitcoin wire protocol and our internal protocol. At a high level, when we connect to a remote node, we create a Connection object with a per-connection event loop, together with a Client object (implementing the Service trait) that can make requests to that Connection.
The Connection maintains a reference to a Service that handles inbound requests, as well as state determining whether there is currently an outbound request awaiting a response. If there is, it waits for incoming messages from the remote peer, checks if they should be interpreted as a response to the pending request, and if not, interprets them as a request from the remote peer to our node. Otherwise, it waits for either an incoming message from the remote peer or for a request from the corresponding Client.
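A heavily simplified sketch of that event loop, with Tokio channels standing in for the Client handle and the framed peer connection; all names and message types here are illustrative, not Zebra’s actual internals.

```rust
use tokio::sync::{mpsc, oneshot};

// Illustrative stand-ins for the internal request/response and wire types.
pub enum Request { Ping(u64) }
pub enum Response { Pong(u64) }
pub enum Message { Ping(u64), Pong(u64) }

/// Sketch of a per-connection event loop: `client_rx` carries requests from
/// the Client handle; `peer_rx`/`peer_tx` stand in for the framed wire stream.
pub async fn connection_loop(
    mut client_rx: mpsc::Receiver<(Request, oneshot::Sender<Response>)>,
    mut peer_rx: mpsc::Receiver<Message>,
    peer_tx: mpsc::Sender<Message>,
) {
    // If `Some`, an outbound request is awaiting a response from the peer.
    let mut pending: Option<oneshot::Sender<Response>> = None;

    loop {
        if let Some(reply) = pending.take() {
            // A request is pending: only listen to the peer, and check whether
            // the next message answers it.
            match peer_rx.recv().await {
                Some(Message::Pong(n)) => { let _ = reply.send(Response::Pong(n)); }
                Some(_other) => {
                    // Not a response: treat it as an inbound request from the
                    // peer (handling elided) and keep waiting for the response.
                    pending = Some(reply);
                }
                None => return, // connection closed
            }
        } else {
            // No request pending: wait for either a client request or an
            // unsolicited message from the remote peer.
            tokio::select! {
                Some((Request::Ping(n), reply)) = client_rx.recv() => {
                    let _ = peer_tx.send(Message::Ping(n)).await;
                    pending = Some(reply);
                }
                Some(_msg) = peer_rx.recv() => {
                    // Interpret as an inbound request to our node (elided).
                }
                else => return,
            }
        }
    }
}
```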
Each Bitcoin message type corresponds to a variant of an internal Message enum, and we use a Tokio codec to transform a generic AsyncRead + AsyncWrite byte source into a Stream and Sink of Messages. The incoming stream of peer messages is instrumented to fire a timestamp event into a channel that allows non-blocking collection of per-peer last-seen information.
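To illustrate the mechanism (not the actual Bitcoin framing, which involves magic bytes, command names, and checksums), here is a toy codec built on tokio_util::codec; it assumes the tokio, tokio-util (with the codec feature), and bytes crates. Framed then turns any AsyncRead + AsyncWrite transport into a Stream + Sink of messages.

```rust
use bytes::{Buf, BufMut, BytesMut};
use tokio::net::TcpStream;
use tokio_util::codec::{Decoder, Encoder, Framed};

// A toy wire message; the real protocol has many more variants.
#[derive(Debug)]
pub enum Message {
    Ping(u64),
    Pong(u64),
}

pub struct Codec;

impl Encoder<Message> for Codec {
    type Error = std::io::Error;

    fn encode(&mut self, item: Message, dst: &mut BytesMut) -> Result<(), Self::Error> {
        // One tag byte followed by an 8-byte payload.
        match item {
            Message::Ping(n) => { dst.put_u8(0); dst.put_u64(n); }
            Message::Pong(n) => { dst.put_u8(1); dst.put_u64(n); }
        }
        Ok(())
    }
}

impl Decoder for Codec {
    type Item = Message;
    type Error = std::io::Error;

    fn decode(&mut self, src: &mut BytesMut) -> Result<Option<Self::Item>, Self::Error> {
        if src.len() < 9 {
            return Ok(None); // not enough bytes yet; wait for more
        }
        let tag = src.get_u8();
        let n = src.get_u64();
        Ok(Some(if tag == 0 { Message::Ping(n) } else { Message::Pong(n) }))
    }
}

/// Wrap a TCP connection into a `Stream + Sink` of `Message`s.
pub fn frame(stream: TcpStream) -> Framed<TcpStream, Codec> {
    Framed::new(stream, Codec)
}
```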
This design ensures that each connection’s message handling state is completely isolated from each other connection’s state. For instance, it is invulnerable to the ping attack, since each connection’s heartbeat messages are fired from independent timers.
It also supports backpressure. The service handling inbound requests from the network to our node handles backpressure by load shedding, causing the connection to close to reduce inbound request load. Outbound requests to the remote peer are sent by its Client handle, which reports backpressure based on whether the Connection has a pending request (plus a small buffer).
Finally, although we have no immediate plans in this direction, on a much longer time horizon, this provides a pathway to gradually sunset the Bitcoin wire protocol in Zcash by stabilizing some part of Zebra’s internal protocol, defining a wire format (e.g., ProtoBufs) for it, and then pushing that protocol across the wire to compatible nodes.
Building a Connection Pool
One downside of the design above is that we lose some throughput communicating with individual peers, because we make requests one at a time and cannot do message pipelining. To compensate, we connect to many peers simultaneously, and provide a connection pool that tracks readiness information of all connections and load-balances outbound requests over ready peers. This connection pool allows us to model “the network” rather than individual peer connections. Our implementation is adapted with Zcash-specific behaviour from a fork of tower-balance, which provides a generic connection pool. As described in that crate’s documentation, it
Distributes requests across inner services using the Power of Two Choices. As described in the Finagle Guide:
The algorithm randomly picks two services from the set of ready endpoints and selects the least loaded of the two. By repeatedly using this strategy, we can expect a manageable upper bound on the maximum load of any server.
The maximum load variance between any two servers is bound by ln(ln(n)), where n is the number of servers in the cluster.
Note that there are two levels of endpoint selection here: first, whether an endpoint (in our case, a peer connection) is ready, and second, a notion of loading. We use tower-load to wrap every client handle with a load-measurement instrument. Currently, we measure each peer’s peak exponentially-weighted moving average of response latency to prioritize faster peers.
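A small sketch of that selection step, assuming the rand crate and some per-peer load estimate (such as the latency EWMA above); the function name and shape are ours, not the actual tower-balance internals.

```rust
use rand::seq::IteratorRandom;

/// Pick a peer using the power of two choices: sample two ready peers at
/// random and keep the less loaded one. Returns `None` if no peer is ready.
pub fn pick_peer<'a, P>(ready: &'a [P], load: impl Fn(&P) -> f64) -> Option<&'a P> {
    let mut rng = rand::thread_rng();
    // Sample up to two distinct candidates from the ready set...
    let mut candidates = ready.iter().choose_multiple(&mut rng, 2);
    // ...and keep the one with the lower measured load.
    candidates.sort_by(|a, b| load(*a).total_cmp(&load(*b)));
    candidates.into_iter().next()
}
```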
This design works well for basic requests that are independent of peer selection, but many requests required for Zcash (e.g., a request for some particular inventory item) can only be made to a subset of connected peers (e.g., the ones that have recently advertised that inventory hash), and other requests require specialized logic (e.g., transaction diffusion). This specialized routing logic can be implemented inside the connection pool, so that it continues to abstract away “the rest of the network” into one endpoint.
The connection pool reports backpressure based on whether it has ready peer connections to service a request. If it does not, in addition to signaling unreadiness to its caller, it also fires a demand signal to a peer crawler task to request additional peers, described in more detail below.
Crawling the Network
The connection pool begins with a set of initial seed peers specified in a
config file, and when remote peers handshake with our node, the resulting Client
handles are passed through a channel to the connection pool. To find
additional peers to connect to, we spawn a task that automatically crawls the
network and assembles a set of candidate peers. Crawling begins as soon as
the network comes online, and continues periodically while it is live. The
candidate set is filtered according to last-seen timestamps and heartbeat
intervals to avoid duplicate connections to existing peers without the need
for synchronized state.
Unlike zcashd, which maintains a fixed number of outbound connections, we attempt to connect to as many peers as possible, subject to resource limits and backpressure. Increasing the number of peer connections is a double-edged sword: it increases the capacity for outbound requests from our node to the network, but it also increases the demand of inbound requests from the network to our node. We handle backpressure from inbound requests by shedding load, closing the corresponding peer connections and immediately reducing the volume of inbound requests.
Outbound connections are initiated in response to either a demand signal from the connection pool, or a long-running timer. This grows the connection pool gradually over time, or more rapidly in response to spikes in outbound demand.
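A rough sketch of that trigger logic, using Tokio primitives; the Demand type, interval, and dial_candidate helper are placeholders for illustration, not Zebra’s actual internals.

```rust
use std::time::Duration;
use tokio::sync::mpsc;

/// A demand signal from the connection pool: "I had no ready peers".
pub struct Demand;

/// Sketch of the crawler's outer loop: dial a new candidate peer when either
/// the pool signals demand or a long-running timer fires.
pub async fn crawl_loop(mut demand_rx: mpsc::Receiver<Demand>) {
    let mut timer = tokio::time::interval(Duration::from_secs(60));
    loop {
        tokio::select! {
            // The connection pool could not find a ready peer for a request.
            Some(Demand) = demand_rx.recv() => dial_candidate().await,
            // Gradual, demand-independent growth of the pool.
            _ = timer.tick() => dial_candidate().await,
        }
    }
}

async fn dial_candidate() {
    // Choose an address from the candidate set and attempt a handshake (elided).
}
```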
All of this machinery is completely encapsulated inside of the connection pool, so the entire networking API used by the rest of the node fits on one page. The relationships of the internal components are summarized in this diagram.
Summing Up & Next Steps
This post described our networking stack. Since finishing up its basic architecture at the end of November, we’ve been busy working on a bunch of other fronts, to be described in future blog posts:
- implementing structured generation to allow integrating property-based testing throughout the entire codebase, led by Deirdre Connolly;
- producing a standalone implementation of RedJubjub signatures, into which we can integrate support for FROST, a round-optimized threshold Schnorr multisignature scheme designed by the Foundation’s own Chelsea Komlo;
- working through the Zcash specification, identifying mismatches between the spec and the implementation;
- writing careful definitions of core data structures using algebraic datatypes, so that structurally invalid data cannot be represented internally.
We’re excited to share more details on these and other projects on our way to shipping Zebra in time for NU3 activation!