内容简介:The Lambda Architecture itself is a software design pattern, aiming to unify data processing. Its design enables it to process substantial quantities of data by applying both methods of batch and stream processing. A combination of these methods is used as
The Lambda Architecture itself is a software design pattern, aiming to unify data processing. Its design enables it to process substantial quantities of data by applying both methods of batch and stream processing. A combination of these methods is used as the patterns architecture approaches typical obstacles like latency, throughput and fault-tolerance.
It is used for high availability online applications, where, due to time delays, data validity is required. Generating precise and complete views by using batch processing and providing views of online data is done simultaneously.
Functionality
The Lambda Architecture has three main components, which are responsible for two main tasks. To interact and process newly incoming data and to react to queries on the existing data source. The incoming data sets will be handed off to the batch and the speed layer for further processing.
Batch Layer
The batch layer is responsible for taking care of the master data set. The master data set consists of an append-only, immutable set which only contains raw data. This is done by using a distributed processing system, which may handle massive amounts of data at once.
It gains its accuracy by being able to process all available data whilst generating views. By precomputing views based on the complete data set it is able to eliminate any error in the raw data. The output is typically generated by using map-reduce.
Map-reduce is a technique which takes a large data set and divides it into subsets. A specific function is then performed on each subset. These subsets are combined to form the output.
This output is usually stored in a read-only database, where updates fully delete the existing precomputed views. The batch layer allows the processing of older data sets. By analysing these it is possible to optimize the processing function used in the map-reduce action.
Speed Layer
The speed layer processes data streams in real-time. Therefore it neither guarantees its data to accurate nor to have fixed corrupt data. It attempts to minimize latency whilst granting real-time views into the most recent data. Thus its main purpose is to fill any gaps in the data caused by the batch layer’s lag in providing views based on the most recent data. The output of the speed layer may be thrown away after the calculations of the batch layers are finished.
Serving Layer
The serving layer combines the output from both batch and speed layer. As the initial entry point, it receives queries and responds to them. The complete data set is already available as it can use precomputed views or build them based on the processed data.
以上就是本文的全部内容,希望本文的内容对大家的学习或者工作能带来一定的帮助,也希望大家多多支持 码农网
猜你喜欢:本站部分资源来源于网络,本站转载出于传递更多信息之目的,版权归原作者或者来源机构所有,如转载稿涉及版权问题,请联系我们。