DeepMind Surges on, Releasing Acme and Reverb RL Libraries

Category: IT Technology · Published: 5 years ago

The Alphabet subsidiary continues to pump out useful software libraries for the machine learning research community.

The Gist

DeepMind, now a wholly owned Alphabet subsidiary, is an innovative software company focused on artificial intelligence. You likely know of them from their accomplishments in training AlphaGo and then AlphaGo Zero. The AlphaGo Zero reinforcement learning agent learned largely from scratch, through self-play, to play Go better than any human champion.

Despite this and other radical successes, the company's academic work has been met with frustration due to a lack of reproducibility. The ability to independently reproduce academic work is the lifeblood of validation and further collaboration. Providing reproducible work is therefore crucial, especially in a world with a growing divide between the compute-rich and everyone else.

DeepMind understands this. In recent years they have settled into a groove, consistently releasing modular software libraries to aid fellow researchers. These libraries have served numerous purposes, including the following:

- Reproducibility
- Simplicity
- Modularity
- Parallelization
- Efficiency

With the release of their Acme and Reverb libraries, this trend continues nicely. In fact, the authors of the library explicitly call out the high-level goals of Acme on their website:

1. To enable the reproducibility of our methods and results — this will help clarify what makes an RL problem hard or easy, something that is seldom apparent.

2. To simplify the way we (and the community at large) design new algorithms — we want that next RL agent to be easier for everyone to write!

3. To enhance the readability of RL agents — there should be no hidden surprises when transitioning from a paper to code.

DeepMind Acme Authors

Ok, But How?

One of the ways in which they achieve these goals is through appropriate levels of abstraction. The field of reinforcement learning is like an onion, in that it is best understood layer by layer. At face value, you have an agent that learns from data. Peel back the data layer and you see that this data is either a stored dataset or a live sequence of experiences. Peel back the agent and you see that it plans and takes actions, each of which produces a measurable response from its environment. Peel back further still and you reach policies, experience, replay, and so on. The illustration below shows this nicely.

A hierarchical display of the reinforcement learning problem
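To make the layers concrete, here is a minimal, self-contained Python sketch of the loop described above: an environment responds to actions, an actor picks actions and records experience, and a learner consumes stored experience. Every name here (`ChainEnvironment`, `Actor`, `Learner`, `run_episode`) is hypothetical and chosen for illustration only; this is not Acme's API, just one way the same separation of concerns could look.

```python
import random
from collections import deque
from dataclasses import dataclass


@dataclass
class Transition:
    """One experience tuple: (observation, action, reward, next observation, done)."""
    obs: int
    action: int
    reward: float
    next_obs: int
    done: bool


class ChainEnvironment:
    """Hypothetical toy environment: walk right along a chain to reach a reward."""

    def __init__(self, length: int = 5):
        self.length = length
        self.state = 0

    def reset(self) -> int:
        self.state = 0
        return self.state

    def step(self, action: int):
        # Action 1 moves right, action 0 moves left (clamped at state 0).
        self.state = max(0, self.state + (1 if action == 1 else -1))
        done = self.state == self.length - 1
        reward = 1.0 if done else 0.0
        return self.state, reward, done


class Actor:
    """Agent layer that maps observations to actions and writes experience."""

    def __init__(self, policy, replay: deque):
        self.policy = policy
        self.replay = replay

    def select_action(self, obs: int) -> int:
        return self.policy(obs)

    def observe(self, transition: Transition) -> None:
        self.replay.append(transition)


class Learner:
    """Agent layer that reads stored experience; here it only sums sampled rewards."""

    def __init__(self, replay: deque):
        self.replay = replay
        self.steps = 0

    def step(self, batch_size: int = 4) -> float:
        batch = random.sample(list(self.replay), min(batch_size, len(self.replay)))
        self.steps += 1
        return sum(t.reward for t in batch)


def run_episode(env: ChainEnvironment, actor: Actor, max_steps: int = 50) -> float:
    """Environment loop: act, receive the environment's response, store the experience."""
    obs = env.reset()
    total = 0.0
    for _ in range(max_steps):
        action = actor.select_action(obs)
        next_obs, reward, done = env.step(action)
        actor.observe(Transition(obs, action, reward, next_obs, done))
        total += reward
        obs = next_obs
        if done:
            break
    return total


if __name__ == "__main__":
    replay: deque = deque(maxlen=1_000)
    actor = Actor(policy=lambda obs: 1, replay=replay)  # hypothetical policy: always move right
    learner = Learner(replay)
    print(run_episode(ChainEnvironment(length=5), actor))  # 1.0
    print(len(replay), learner.step(batch_size=4))         # 4 1.0
```

Because the actor and learner only share the `replay` buffer, either side can be swapped out without touching the other, which is the kind of modularity the layered view encourages.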

Another way in which Acme achieves its goals is via a scalable data storage mechanism, implemented as the companion Reverb library. To motivate this, consider the typical experience replay buffer for an agent. How big does that buffer get? Usually it holds at least tens to hundreds of thousands of experience tuples, and that is per agent. In a simulation involving thousands to millions of agents, that adds up to... a lot.
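A minimal sketch of a per-agent replay buffer, plus the back-of-the-envelope arithmetic behind "a lot". The class and function names are hypothetical, and the figures of 100,000 tuples per agent and 1,000 agents are illustrative values picked from the ranges quoted above, not measurements.

```python
import random
from collections import deque
from typing import Any, List


class ReplayBuffer:
    """Hypothetical fixed-capacity FIFO replay buffer: once full, the oldest tuple is evicted."""

    def __init__(self, capacity: int, seed: int = 0):
        self._items: deque = deque(maxlen=capacity)
        self._rng = random.Random(seed)

    def add(self, transition: Any) -> None:
        # deque(maxlen=...) drops the oldest item automatically when full.
        self._items.append(transition)

    def sample(self, batch_size: int) -> List[Any]:
        # Uniform sampling without replacement; batch_size must not exceed len(self).
        return self._rng.sample(list(self._items), batch_size)

    def __len__(self) -> int:
        return len(self._items)


def total_tuples(per_agent_capacity: int, num_agents: int) -> int:
    """Number of stored tuples if every agent keeps its own full buffer."""
    return per_agent_capacity * num_agents


if __name__ == "__main__":
    buf = ReplayBuffer(capacity=3)
    for t in range(5):
        buf.add(t)
    print(len(buf))                      # 3 (items 0 and 1 were evicted)
    print(sorted(buf.sample(3)))         # [2, 3, 4]
    print(total_tuples(100_000, 1_000))  # 100000000
```

Even with these modest illustrative numbers, the total reaches 100 million tuples, which is why keeping storage inside each agent process stops being practical at scale.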

By decoupling data producers (agents) from data consumers (learners), an efficient data storage mechanism can sit independently between the two. This is exactly what Reverb accomplishes. With over 70% of its code in C++, a neat Python interface on top, and company backing, this is a library I'm really excited to dive deep into.
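The producer/consumer split can be illustrated with a small in-process sketch: several actor threads write into a shared, lock-protected table, and a learner samples from it without knowing which actor wrote what. `SharedTable`, `actor_worker`, and `learner_step` are hypothetical names; this is not Reverb's API and it leaves out everything that makes a real storage system scale. It only shows the decoupling idea.

```python
import random
import threading
from collections import deque
from typing import Any, List


class SharedTable:
    """Hypothetical thread-safe store sitting between producers and consumers."""

    def __init__(self, max_size: int, seed: int = 0):
        self._items: deque = deque(maxlen=max_size)  # FIFO eviction once full
        self._lock = threading.Lock()
        self._rng = random.Random(seed)
        self.inserted = 0  # total inserts, including evicted items

    def insert(self, item: Any) -> None:
        with self._lock:
            self._items.append(item)
            self.inserted += 1

    def sample(self, n: int) -> List[Any]:
        # Uniform sample of up to n items, taken under the lock for a consistent view.
        with self._lock:
            snapshot = list(self._items)
            return self._rng.sample(snapshot, min(n, len(snapshot)))

    def __len__(self) -> int:
        with self._lock:
            return len(self._items)


def actor_worker(table: SharedTable, actor_id: int, num_steps: int) -> None:
    """Producer: writes experience without knowing who will consume it."""
    for step in range(num_steps):
        table.insert((actor_id, step))


def learner_step(table: SharedTable, batch_size: int) -> List[Any]:
    """Consumer: reads a batch without knowing which actor produced it."""
    return table.sample(batch_size)


if __name__ == "__main__":
    table = SharedTable(max_size=1_000)
    threads = [threading.Thread(target=actor_worker, args=(table, i, 200)) for i in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(table.inserted, len(table))    # 800 800
    print(len(learner_step(table, 32)))  # 32
```

Since actors and learner only agree on the table's `insert` and `sample` operations, you could add actors or change the learner's batch size independently.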

https://www.youtube.com/watch?v=3hnlDfJYWcI&feature=youtu.be

An R2D2 RL agent playing the arcade game Breakout

Conclusion

By continuously releasing fantastic open source libraries like these, DeepMind helps to lower the barrier for entry and level the playing field for research in ML and AI. Pair this with low-cost cloud computing solutions, and anyone can jump right in! Send in any cool projects you make with these libraries. I can’t wait to see them.

Jump In

Stay Up To Date

Things move quickly in academia and industry! Keep yourself updated with the general LifeWithData blog as well as the ML UTD newsletter.

If you're not a fan of newsletters but still want to stay in the loop, consider adding lifewithdata.org/blog and lifewithdata.org/tag/ml-utd to a Feedly aggregation setup.

