Materialize: A Streaming Data Warehouse

栏目: IT技术 · 发布时间: 4年前

内容简介:Databases, and data infrastructure generally, have made substantial progress over the years.We now have access to cloud-native infrastructure that allows just about anyone to set up, maintain, and query databases at substantial scale. This is a serious dep

Databases, and data infrastructure generally, have made substantial progress over the years.

We now have access to cloud-native infrastructure that allows just about anyone to set up, maintain, and query databases at substantial scale. This is a serious departure from the monolithic software of years past, where getting access to a database involved multiple people and several companies.

However, the data still doesn’t move as fast as it should.

We believe that all information across an enterprise should be up-to-date, immediately. When a storefront accepts an order from a customer, this information should be visible everywhere: from portals used by customer service agents, to back-office inventory management and logistics, from mobile apps that consumers use to track their order, to business analysts optimizing their organization. There is little gained, and a great deal lost, by slowing down the movement of data. No data user wants to wait overnight for “jobs” to complete. Often even minutes can be too long. Demand milliseconds .

This shouldn’t come at the cost of the gains made by data infrastructure over the years: analysts still want to use declarative query languages rather than directly programming applications. Interoperability is paramount: existing dashboards, visualization, and tooling use standards and protocols that cannot simply be jettisoned. Cloud-native deployment is non-negotiable. A viable solution should look and feel like much of existing infrastructure, except instantaneous.

We also cannot regress on delivering strong consistency . When there are moments between changes to your data and analysts observing the results, users should never be presented with incorrect information. All results should reflect correct answers at some point in time (which ideally moves forward as briskly as possible).

Given these requirements, how do we get there? Traditional data processing infrastructure, but faster, isn’t the answer: it’s designed to repeatedly ask about the current state of the world, rather than to react to those changes that occur, as they occur. We need fundamentally new infrastructure based on reactive models of computation, that move new information through established dataflows as quickly as possible.

Streaming without Compromises

We believe that streaming architectures are the only ones that can produce this ideal data infrastructure. Streaming is more than a different programming model, pivoting data processing from a query-based “polling” design – with staleness built in – to a reactive model that responds to changes the moment they happen. It also bypasses repeated work on unchanged data, which allows it to scale to substantially larger volumes of work.

To fully leverage streaming’s potential, we need to rebuild the data warehouse from the inside out, so that users do not have to rebuild their data infrastructure themselves. Many people hoped that event-streaming itself would be the revolution. Cobbled together with free software, streaming is indeed an exciting development, but today requires huge sacrifices in interoperability, flexibility, and ease of use. Catering to data platform experts, it leaves millions of users who would benefit from real-time analytics behind. We believe the real solution looks a lot more like the familiar data warehouse that organizations have been used to for decades, modernized for the always-up-to-date real-time world of 2020, with industry-standard SQL as the interface.

Today we’d like to introduce Materialize: the first Streaming Data Warehouse. It connects directly to your existing event-streaming infrastructure, and to the client, it walks and quacks like Postgres, so that familiar tooling can plug-and-play with it exactly as if they’re talking to an analytics-capable read-replica of an OLTP database. Materialize builds on top of years of award-winning research and open-source development. Built on top of the Timely Dataflow research project, it gives users the power of cutting-edge streaming computation with the declarative ease of PostgreSQL.

We’re excited to take the wrapping off of Materialize today.Download it to play around on your laptop, check out the source on GitHub , or sign up for regular updates to this blog!


以上就是本文的全部内容,希望对大家的学习有所帮助,也希望大家多多支持 码农网

查看所有标签

猜你喜欢:

本站部分资源来源于网络,本站转载出于传递更多信息之目的,版权归原作者或者来源机构所有,如转载稿涉及版权问题,请联系我们

马云现象的经济学分析:互联网经济的八个关键命题

马云现象的经济学分析:互联网经济的八个关键命题

胡晓鹏 / 上海社会科学院出版社 / 2016-11-1 / CNY 68.00

互联网经济的产生、发展与扩张,在冲击传统经济理论观点的同时,也彰显了自身理论体系的独特内核,并与那种立足于工业经济时代的经典理论发生显著分野。今天看来,“马云”们的成功是中国经济长期“重制造、轻服务,重产能、轻消费,重国有、轻民营”发展逻辑的结果。但互联网经济的发展却不应仅仅止步于商业技巧的翻新,还需要在理论上进行一番审慎的思考。对此,我们不禁要问:互联网经济驱动交易发生的机理是什么?用户基数和诚......一起来看看 《马云现象的经济学分析:互联网经济的八个关键命题》 这本书的介绍吧!

JSON 在线解析
JSON 在线解析

在线 JSON 格式化工具

MD5 加密
MD5 加密

MD5 加密工具

HEX HSV 转换工具
HEX HSV 转换工具

HEX HSV 互换工具