Materialize: A Streaming Data Warehouse

Category: IT Technology · Published: 4 years ago

Databases, and data infrastructure generally, have made substantial progress over the years.

We now have access to cloud-native infrastructure that allows just about anyone to set up, maintain, and query databases at substantial scale. This is a serious departure from the monolithic software of years past, where getting access to a database involved multiple people and several companies.

However, the data still doesn’t move as fast as it should.

We believe that all information across an enterprise should be up-to-date, immediately. When a storefront accepts an order from a customer, this information should be visible everywhere: from portals used by customer service agents to back-office inventory management and logistics, and from the mobile apps consumers use to track their orders to the business analysts optimizing their organization. There is little gained, and a great deal lost, by slowing down the movement of data. No data user wants to wait overnight for “jobs” to complete. Often, even minutes are too long. Demand milliseconds.

This shouldn’t come at the cost of the gains made by data infrastructure over the years: analysts still want to use declarative query languages rather than directly programming applications. Interoperability is paramount: existing dashboards, visualization, and tooling use standards and protocols that cannot simply be jettisoned. Cloud-native deployment is non-negotiable. A viable solution should look and feel like much of existing infrastructure, except instantaneous.

We also cannot regress on delivering strong consistency. In the interval between a change to your data and an analyst observing its effects, users should never be presented with incorrect information. All results should reflect correct answers at some point in time (which ideally moves forward as briskly as possible).
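
As a rough illustration of what “correct answers at some point in time” can mean, here is a minimal Python sketch. It assumes a stream of timestamped (key, delta) updates plus an explicit signal that a timestamp is complete; the class ConsistentView and its methods are hypothetical and are not Materialize’s API or implementation. Changes are buffered until their timestamp closes, so a reader never observes half of a transaction, such as one leg of a transfer between two accounts.

```python
from collections import defaultdict


class ConsistentView:
    """Hypothetical sketch: a per-key SUM that only exposes results
    for timestamps known to be complete."""

    def __init__(self):
        self.pending = defaultdict(list)  # timestamp -> buffered (key, delta) changes
        self.totals = defaultdict(int)    # state as of the last closed timestamp
        self.as_of = None                 # last timestamp reflected in totals

    def update(self, ts, key, delta):
        # Buffer the change; it stays invisible until its timestamp closes.
        self.pending[ts].append((key, delta))

    def close(self, ts):
        # Assumption: no more updates at times <= ts will arrive.
        for t in sorted(t for t in self.pending if t <= ts):
            for key, delta in self.pending.pop(t):
                self.totals[key] += delta
        self.as_of = ts

    def read(self):
        # Every read is a snapshot that is correct as of self.as_of.
        return self.as_of, dict(self.totals)


view = ConsistentView()
view.update(1, "alice", 100)
view.update(1, "bob", 50)
view.close(1)

view.update(2, "alice", -10)  # first leg of a transfer
print(view.read())            # (1, {'alice': 100, 'bob': 50})
view.update(2, "bob", 10)     # second leg
view.close(2)
print(view.read())            # (2, {'alice': 90, 'bob': 60})
```

The reader briefly sees slightly stale data (as of timestamp 1), but never an inconsistent mix in which money has left one account without arriving in the other.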

Given these requirements, how do we get there? Traditional data processing infrastructure, but faster, isn’t the answer: it’s designed to repeatedly ask about the current state of the world, rather than to react to changes as they occur. We need fundamentally new infrastructure based on reactive models of computation, which move new information through established dataflows as quickly as possible.

Streaming without Compromises

We believe that streaming architectures are the only ones that can produce this ideal data infrastructure. Streaming is more than a different programming model: it pivots data processing from a query-based “polling” design, with staleness built in, to a reactive model that responds to changes the moment they happen. It also avoids repeating work on unchanged data, which allows it to scale to substantially larger volumes of work.
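
The difference between polling and reacting can be sketched in a few lines of Python. This is a simplified illustration, not Materialize’s implementation: it assumes changes arrive as (row, diff) pairs, where diff is +1 for an insert and -1 for a delete, and the names recompute and IncrementalCount are hypothetical. The polling version rescans every row on each query, while the reactive version touches only the rows that changed.

```python
from collections import Counter


def recompute(orders):
    """Polling: rescan every (order_id, status) row on each query."""
    return Counter(status for _, status in orders)


class IncrementalCount:
    """Reactive: fold each (row, diff) change into a maintained result."""

    def __init__(self):
        self.counts = Counter()

    def apply(self, row, diff):
        _, status = row
        self.counts[status] += diff
        if self.counts[status] == 0:
            del self.counts[status]  # drop empty groups, as GROUP BY would


orders = {(1, "pending"), (2, "pending"), (3, "shipped")}
view = IncrementalCount()
for row in orders:
    view.apply(row, +1)

# Order 2 ships: one retraction plus one insertion, with no table rescan.
for row, diff in [((2, "pending"), -1), ((2, "shipped"), +1)]:
    view.apply(row, diff)
    if diff > 0:
        orders.add(row)
    else:
        orders.discard(row)

assert view.counts == recompute(orders)
print(view.counts["pending"], view.counts["shipped"])  # 1 2
```

Modeling an update as a retraction of the old row plus an insertion of the new one keeps every change expressible as an additive diff, so the maintained result stays equal to a full recomputation while doing work proportional only to the size of the change.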

To fully leverage streaming’s potential, we need to rebuild the data warehouse from the inside out, so that users do not have to rebuild their data infrastructure themselves. Many people hoped that event streaming itself would be the revolution. Cobbled together from free software, streaming is indeed an exciting development, but today it requires huge sacrifices in interoperability, flexibility, and ease of use. By catering to data platform experts, it leaves behind millions of users who would benefit from real-time analytics. We believe the real solution looks a lot more like the familiar data warehouse that organizations have used for decades, modernized for the always-up-to-date, real-time world of 2020, with industry-standard SQL as the interface.

Today we’d like to introduce Materialize: the first Streaming Data Warehouse. It connects directly to your existing event-streaming infrastructure, and to clients it walks and quacks like Postgres, so familiar tooling can plug and play with it exactly as if it were talking to an analytics-capable read replica of an OLTP database. Materialize draws on years of award-winning research and open-source development. Built on the Timely Dataflow research project, it gives users the power of cutting-edge streaming computation with the declarative ease of PostgreSQL.

We’re excited to take the wrapping off Materialize today. Download it to play around on your laptop, check out the source on GitHub, or sign up for regular updates to this blog!

