Ballista Distributed Compute: One Year Later

I have been talking about distributed computing in Rust for a long time now. It has been more than two and a half years since my “Rust is for Big Data” blog post, where I first talked about the prototype I was working on at the time (which eventually became DataFusion and is now part of Apache Arrow).

One year ago, over the July 4th weekend, I started again with a new project named “Ballista”. The idea was to build on the foundations provided by Apache Arrow and DataFusion and to demonstrate the potential of a distributed compute platform implemented in Rust. I put together a neat little demo and it got a lot of interest, but it was just a demo, and the project stagnated for a long time.

However, six months ago, I took a step back and started rebuilding Ballista from scratch with a new architecture based on the following foundational technologies:

I am happy to announce that I have finally released a version of Ballista that truly supports distributed queries. I don’t want to oversell the capabilities of the current release. It should still be viewed as a proof of concept, since it supports only a small number of operators and expressions, but it is sufficient to run something very close to TPC-H query 1, as shown in this brief demonstration of a distributed query running against a Ballista cluster deployed to Minikube.
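TPC-H query 1 is essentially a GROUP BY over the `lineitem` table (keyed on return flag and line status) with SUM, AVG, and COUNT aggregates. A common way for distributed engines to run this kind of query is two-phase aggregation: each executor aggregates its own partition, then the partial results are merged by group key. The Rust sketch below illustrates that general technique using only the standard library, with threads standing in for remote executors. It is not Ballista's actual code or API; the `LineItem`, `Partial`, and `run_query` names are hypothetical, and the rows are a simplified subset of the real `lineitem` columns.

```rust
use std::collections::HashMap;
use std::thread;

/// Hypothetical, simplified row with a few columns resembling TPC-H `lineitem`.
#[derive(Clone, Debug)]
struct LineItem {
    return_flag: char,
    line_status: char,
    quantity: i64,
    extended_price_cents: i64,
}

/// Partial aggregate state for one group. SUM and COUNT are mergeable,
/// so AVG is derived only after all partials have been combined.
#[derive(Clone, Copy, Debug, Default, PartialEq)]
struct Partial {
    sum_qty: i64,
    sum_price_cents: i64,
    count: i64,
}

impl Partial {
    fn merge(&mut self, other: &Partial) {
        self.sum_qty += other.sum_qty;
        self.sum_price_cents += other.sum_price_cents;
        self.count += other.count;
    }
}

/// Group key: (return_flag, line_status), as in TPC-H query 1.
type Key = (char, char);

/// Phase 1: each "executor" aggregates only the rows in its own partition.
fn partial_aggregate(rows: &[LineItem]) -> HashMap<Key, Partial> {
    let mut groups: HashMap<Key, Partial> = HashMap::new();
    for row in rows {
        let p = groups.entry((row.return_flag, row.line_status)).or_default();
        p.sum_qty += row.quantity;
        p.sum_price_cents += row.extended_price_cents;
        p.count += 1;
    }
    groups
}

/// Phase 2: merge the partial results from every partition by group key.
fn final_aggregate(partials: Vec<HashMap<Key, Partial>>) -> HashMap<Key, Partial> {
    let mut merged: HashMap<Key, Partial> = HashMap::new();
    for map in partials {
        for (key, p) in map {
            merged.entry(key).or_default().merge(&p);
        }
    }
    merged
}

/// Runs phase 1 on one thread per partition (a stand-in for remote executors),
/// then runs phase 2 on the calling thread.
fn run_query(partitions: Vec<Vec<LineItem>>) -> HashMap<Key, Partial> {
    let handles: Vec<_> = partitions
        .into_iter()
        .map(|part| thread::spawn(move || partial_aggregate(&part)))
        .collect();
    let partials: Vec<HashMap<Key, Partial>> =
        handles.into_iter().map(|h| h.join().unwrap()).collect();
    final_aggregate(partials)
}

fn main() {
    let row = |f: char, s: char, q: i64, p: i64| LineItem {
        return_flag: f,
        line_status: s,
        quantity: q,
        extended_price_cents: p,
    };
    // Two partitions; group ('A', 'F') has rows in both of them.
    let partitions = vec![
        vec![row('A', 'F', 10, 1_000), row('N', 'O', 5, 500)],
        vec![row('A', 'F', 20, 2_500), row('R', 'F', 1, 100)],
    ];
    let result = run_query(partitions);

    let mut keys: Vec<Key> = result.keys().copied().collect();
    keys.sort();
    for key in keys {
        let p = result[&key];
        let avg_qty = p.sum_qty as f64 / p.count as f64;
        println!(
            "{:?} sum_qty={} sum_price_cents={} count={} avg_qty={:.2}",
            key, p.sum_qty, p.sum_price_cents, p.count, avg_qty
        );
    }
}
```

The design point is that only mergeable state (sums and counts) crosses the partition boundary. Averaging per-partition averages would give the wrong answer when partitions hold different numbers of rows, so the average is computed once, after the merge.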

The source code for this demo can be found here.

I’m excited about this release because it means anyone can now easily deploy a Ballista cluster into Kubernetes (or run a local test cluster using docker-compose) and try running some queries against their data. The project has also reached a point where it is easier for contributors to add functionality, such as additional operators and expressions.

The performance of distributed queries is not yet optimized, and that will be one of the main areas to improve before the full 0.3.0 release becomes available in August 2020.

If you would like to try Ballista out, please visit the GitHub repository for more information.

Want to learn more about query engines? Check out my book "How Query Engines Work".