What changed in the Big data landscape from 2013 to 2019
Author: Abbass Marouni
Translator: helight
Original article: https://blog.marouni.fr/bidata-trends-analysis/
I’ve been a loyal follower of the Data Eng Weekly newsletter (formerly Hadoop Weekly) for the past 6 years. The newsletter is a great source for everything related to Big data and data engineering in general, with a wide selection of technical articles along with product announcements and industry news.
For this year’s holiday side project I decided to dig into Data Eng’s archives, which go back to January 2013, to analyze Big data trends and changes over the past 6 years.
So I crawled and cleaned over 290 weekly issues (well, Python did !), keeping article snippets from the technical, news and releases sections only. Next, I ran some basic natural language processing followed by some basic filtering to produce the keyword mentions and all of the plots that follow.
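The post does not publish its NLP code, so the following is only a minimal sketch of what counting keyword mentions in one snippet could look like. The keyword list, the `count_mentions` name and the sample snippet are all assumptions made for illustration, and simple regex tokenization stands in for whatever NLP steps the author actually used.

```python
import re
from collections import Counter

# Hypothetical keyword list; the original post does not list its keywords.
KEYWORDS = ["hadoop", "spark", "kafka", "kubernetes", "flink"]


def count_mentions(snippet, keywords=KEYWORDS):
    """Count whole-word, case-insensitive keyword mentions in one snippet."""
    # Lowercase the text and split it into alphanumeric tokens.
    tokens = re.findall(r"[a-z0-9]+", snippet.lower())
    counts = Counter(tokens)
    # Keep only the tracked keywords that actually appear.
    return {kw: counts[kw] for kw in keywords if counts[kw] > 0}


# Made-up snippet, not taken from the newsletter archives.
snippet = "Spark 1.0 is out. Spark Streaming now reads from Kafka; Hadoop YARN is supported."
print(count_mentions(snippet))
```

Summing these per-snippet counts per issue would give the kind of time series that the plots in the rest of the post are built on.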
Major trends over the last seven years
Let’s start with the major trends over the last seven years. Here I’m computing the monthly rolling mean of the number of mentions of specific keywords and plotting the resulting curves together on the same graph. The following plots show roughly when each technology became more popular relative to the others (as a result of more reporting about it).
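As a rough illustration of this smoothing step, here is a small sketch of a trailing rolling mean. The post does not say how the window was defined, so this assumes one mention count per weekly issue and a window of four issues (about a month). The `rolling_mean` name and the numbers are made up for the example.

```python
from collections import deque


def rolling_mean(values, window=4):
    """Trailing rolling mean; early points average over the values seen so far."""
    buf = deque(maxlen=window)  # drops the oldest value automatically
    out = []
    for v in values:
        buf.append(v)
        out.append(sum(buf) / len(buf))
    return out


# Made-up weekly mention counts for one keyword (not data from the post).
weekly_mentions = [8, 6, 7, 3, 4, 2]
print(rolling_mean(weekly_mentions))
```

Smoothing like this hides week-to-week noise, which makes it easier to see where one keyword's curve crosses another's.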
Hadoop vs. Spark
Observations : We see the steady decline of Hadoop since 2013 and the moment Spark overtook Hadoop (especially MapReduce).
Hadoop vs. Kafka
Observations : The rise of Kafka as the main building block in all Big data stacks.
Hadoop vs. Kubernetes
Observations : An interesting observation is the rise of Kubernetes. Even though Data Eng Weekly is not a DevOps newsletter, it bears witness to the overall hype around Kubernetes in all domains starting from the beginning of 2017.
Yearly top keywords
Here I’m simply plotting the top 10 keywords by total number of mentions in a given year.
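The yearly ranking can be sketched as a small aggregation. This is not the author's code: the `(year, keyword, count)` record shape, the `top_keywords_by_year` name and the sample numbers are assumptions made for illustration.

```python
from collections import Counter, defaultdict


def top_keywords_by_year(mentions, n=10):
    """mentions: iterable of (year, keyword, count) records.

    Returns {year: [(keyword, total_mentions), ...]} with the n most
    mentioned keywords per year, most mentioned first.
    """
    totals = defaultdict(Counter)
    for year, keyword, count in mentions:
        totals[year][keyword] += count
    return {year: counter.most_common(n) for year, counter in sorted(totals.items())}


# Made-up records, not data from the post.
sample = [
    (2013, "hadoop", 5),
    (2013, "hdfs", 2),
    (2013, "hadoop", 3),
    (2014, "spark", 4),
    (2014, "hadoop", 6),
]
print(top_keywords_by_year(sample, n=2))
```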
2013 : Hadoop’s golden year !
Observations : All of the original Hadoop projects are here : HDFS, YARN, MR, PIG, … With the 2 major distributions CDH & HDP and nothing else !
2014 : The rise of Spark !
Observations : Hadoop in general continued its dominance, but Spark made its debut, and its first version, released this year, was the hottest topic of 2014. We also got the first glimpse of Kafka !
2015 : Here comes Kafka !
Observations : Spark takes over the first spot from Hadoop, and Kafka makes it to the top 3. Most of the old regime projects (HDFS, YARN, MR, PIG, …) didn’t make it to the top 10.
2016 : Streaming is on fire !
Observations : 2016 was the year of streaming. Kafka took second place from Hadoop, while Spark (streaming) continued its dominance.
2017 : Stream everything !
Observations : The same lineup as 2016 with some Flink thrown in.
2018 : Back to basics !
Observations : Kubernetes makes its debut and we’re back to basics, trying to figure out how to manage (K8S), schedule (airflow) and run (Spark, Kafka, Storage, …) our streams.
2019 : …
Observations : It’s still too early to draw any conclusions about 2019, but it looks like the year where K8s & co. go mainstream in production !
Code and dataset
I’m working on cleaning up the code so that you can generate the dataset by yourself. I’ll also be posting the NLP python snippets along with Bokeh & Seaborn plot generating snippets, so stay tuned.