An Intro to the Lambda Architecture

Category: IT Technology · Published: 5 years ago

The Lambda Architecture is a software design pattern that aims to unify data processing. Its design enables it to process substantial quantities of data by applying both batch and stream processing. The two methods are combined so that the architecture can address typical obstacles such as latency, throughput and fault tolerance.

It is used for high-availability online applications in which data must remain valid despite processing delays. Batch processing generates precise and complete views, while views of the most recent online data are provided at the same time.

Functionality

The Lambda Architecture has three main components, which together are responsible for two main tasks: ingesting and processing newly incoming data, and answering queries on the existing data. Incoming data is handed off to both the batch layer and the speed layer for further processing.

Batch Layer

The batch layer is responsible for managing the master data set. The master data set is an append-only, immutable set that contains only raw data. It is managed with a distributed processing system that can handle massive amounts of data at once.
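The two properties named above, append-only and immutable, can be illustrated with a minimal sketch. It is not a distributed system: `Event` and `MasterDataset` are hypothetical names, and an in-memory list stands in for the real store.

```python
from dataclasses import dataclass
from typing import Iterator, List


@dataclass(frozen=True)  # frozen: a stored record cannot be modified afterwards
class Event:
    user: str
    page: str
    timestamp: int


class MasterDataset:
    """Hypothetical in-memory stand-in for a distributed, append-only store."""

    def __init__(self) -> None:
        self._events: List[Event] = []

    def append(self, event: Event) -> None:
        # The only write operation: there is deliberately no update or delete.
        self._events.append(event)

    def __iter__(self) -> Iterator[Event]:
        # Iterate over a snapshot so later appends do not affect a running read.
        return iter(tuple(self._events))

    def __len__(self) -> int:
        return len(self._events)


master = MasterDataset()
master.append(Event("alice", "/home", 1))
master.append(Event("bob", "/home", 2))
master.append(Event("alice", "/about", 3))
```

Because records are never changed in place, a faulty computation can always be rerun against the same raw data.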

The batch layer gains its accuracy by being able to process all available data when generating views. Because views are precomputed from the complete data set, errors in earlier views can be corrected by recomputing them. The output is typically generated using map-reduce.

Map-reduce is a technique that takes a large data set and divides it into subsets. A specific function is then applied to each subset, and the results for the subsets are combined to form the output.
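The three steps (divide, apply a function to each subset, combine) can be sketched on a single machine as follows. This is only an illustration under assumed names (`split`, `map_chunk`, `combine`, `page_view_counts`); a real batch layer would run the map step in parallel on a distributed framework.

```python
from collections import Counter
from functools import reduce
from typing import Iterable, List


def split(records: List[str], chunk_size: int) -> List[List[str]]:
    """Divide the data set into subsets of at most chunk_size records."""
    return [records[i:i + chunk_size] for i in range(0, len(records), chunk_size)]


def map_chunk(chunk: Iterable[str]) -> Counter:
    """Map step: count page views within one subset."""
    return Counter(chunk)


def combine(left: Counter, right: Counter) -> Counter:
    """Reduce step: merge two partial counts into one."""
    return left + right


def page_view_counts(records: List[str], chunk_size: int = 2) -> Counter:
    partials = map(map_chunk, split(records, chunk_size))
    return reduce(combine, partials, Counter())


views = ["/home", "/about", "/home", "/home", "/about"]
print(page_view_counts(views))  # Counter({'/home': 3, '/about': 2})
```

The result does not depend on how the data is chunked, which is what allows the subsets to be processed independently.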

This output is usually stored in a read-only database, where each update completely replaces the existing precomputed views. The batch layer also allows older data sets to be reprocessed. By analysing them, it is possible to optimize the processing function used in the map-reduce step.
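A minimal sketch of the "read-only, replaced as a whole" behaviour, assuming a hypothetical `BatchViewStore` class and an in-memory dictionary instead of a real database:

```python
from types import MappingProxyType
from typing import Dict, Mapping


class BatchViewStore:
    """Hypothetical store: readers get a read-only view, writers replace it whole."""

    def __init__(self) -> None:
        self._view: Mapping[str, int] = MappingProxyType({})

    def replace(self, new_view: Dict[str, int]) -> None:
        # Swap in a freshly recomputed view; the old one is discarded, never patched.
        self._view = MappingProxyType(dict(new_view))

    def get(self, key: str) -> int:
        return self._view.get(key, 0)

    def snapshot(self) -> Mapping[str, int]:
        return self._view


store = BatchViewStore()
store.replace({"/home": 2, "/about": 1})
store.replace({"/home": 3})  # a later batch run replaces the whole view
```

After the second `replace`, the entry for `/about` is gone because nothing from the earlier view is carried over.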

Speed Layer

The speed layer processes data streams in real time. In exchange, it guarantees neither that its data is accurate nor that corrupt data has been fixed. It attempts to minimize latency while providing real-time views of the most recent data. Its main purpose is therefore to fill the gaps caused by the batch layer's lag in providing views based on the most recent data. The output of the speed layer may be thrown away once the batch layer has finished its calculations.
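The life cycle described above can be sketched as follows: recent events are counted immediately and dropped once the batch layer has covered them. `SpeedLayer`, `on_event` and `discard_through` are hypothetical names, and integer timestamps are an assumption made for the example.

```python
from collections import Counter
from typing import List, Tuple


class SpeedLayer:
    """Hypothetical real-time layer that keeps only events the batch layer lacks."""

    def __init__(self) -> None:
        self._recent: List[Tuple[int, str]] = []  # (timestamp, page)

    def on_event(self, timestamp: int, page: str) -> None:
        # Accept the event as-is: no validation or correction at this stage.
        self._recent.append((timestamp, page))

    def view(self) -> Counter:
        # Real-time view over the events not yet covered by a batch run.
        return Counter(page for _, page in self._recent)

    def discard_through(self, batch_timestamp: int) -> None:
        # Drop everything the batch layer has already absorbed into its views.
        self._recent = [(t, p) for t, p in self._recent if t > batch_timestamp]


speed = SpeedLayer()
speed.on_event(1, "/home")
speed.on_event(2, "/home")
speed.on_event(3, "/about")
```

Calling `speed.discard_through(2)` after a batch run that covered timestamps up to 2 leaves only the `/about` event in the real-time view.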

Serving Layer

The serving layer combines the output of the batch layer and the speed layer. As the entry point for queries, it receives them and responds to them. It can answer from the complete data set because it either uses the precomputed views or builds new views from the processed data.
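For a simple counting view, combining the two outputs can be as small as an addition. The sketch below assumes that the batch view and the real-time view cover disjoint time ranges, so that no event is counted twice; `query` and both dictionaries are hypothetical.

```python
from typing import Mapping


def query(page: str,
          batch_view: Mapping[str, int],
          realtime_view: Mapping[str, int]) -> int:
    """Answer a page-view query by merging the batch and real-time views."""
    # Assumption: realtime_view only holds events newer than the last batch run.
    return batch_view.get(page, 0) + realtime_view.get(page, 0)


batch_view = {"/home": 100, "/about": 40}  # precomputed by the batch layer
realtime_view = {"/home": 3, "/new": 1}    # events since the last batch run

print(query("/home", batch_view, realtime_view))  # 103
```

Views that are not additive, such as distinct-user counts, would need a different merge function.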

