Introducing TayPO, a Unifying Framework for Reinforcement Learning

A team of researchers from Columbia University and DeepMind has proposed a Taylor Expansion Policy Optimization (TayPO) framework that combines two leading algorithmic improvement methods in policy optimization: trust-region policy search and off-policy corrections.

Policy optimization is a major framework in model-free reinforcement learning (RL), and it provides insights that can drive significant algorithmic performance gains. Two of the most prominent such improvements are trust-region policy search and off-policy corrections, and these two lines of work are usually evaluated separately. In the paper Taylor Expansion Policy Optimization, the researchers partially unify them in a single framework. They show that Taylor expansions, which approximate a function around a reference point with a truncated power series (the Taylor series), share high-level similarities with both trust-region policy search and off-policy corrections. The paper was presented this week at ICML 2020.
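The following generic example makes the Taylor series idea concrete. It has nothing specific to do with RL. It approximates exp(x) around 0 by keeping terms up to a chosen order. The function name `taylor_exp` is hypothetical. We picked exp only because its series is simple.

```python
import math


def taylor_exp(x: float, order: int) -> float:
    """Truncated Taylor series of exp(x) around 0, keeping terms up to x**order."""
    return sum(x ** k / math.factorial(k) for k in range(order + 1))


# Near the expansion point, each extra order shrinks the error.
for order in range(4):
    print(order, abs(math.exp(0.3) - taylor_exp(0.3, order)))

# Far from the expansion point, a low-order approximation is poor.
print(abs(math.exp(3.0) - taylor_exp(3.0, 2)))
```

Each added order helps only near the expansion point. At x = 3.0, the second-order version is already far off. This locality is what gives Taylor expansions their trust-region flavour.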

Most previous research on trust-region policy search constrains the size of policy updates. This limits the deviation between consecutive policies and makes it possible to lower-bound the performance of the new policy. Off-policy corrections, meanwhile, must account for discrepancies between the target policy and the behaviour policy that generated the data. The researchers argue that a trust-region constraint is inherent to Taylor expansions as well as to trust-region policy search, because a truncated expansion is only accurate close to the point it is expanded around. They also argue that Taylor expansions satisfy the requirements of off-policy evaluation.
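The sketch below illustrates the trust-region constraint under a simplifying assumption: a single state with a softmax policy over three actions. A proposed update direction is scaled back until the KL divergence between the old and new policies falls below a threshold.

The names `trust_region_step` and `max_kl`, and the halving backtracking scheme, are our own assumptions. Full methods such as TRPO also check the surrogate objective during the line search and average the KL over visited states. So this only illustrates the constraint, not any specific algorithm.

```python
import math


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl(p, q):
    """KL(p || q) for discrete distributions where q has full support."""
    return sum(p_i * math.log(p_i / q_i) for p_i, q_i in zip(p, q) if p_i > 0)


def trust_region_step(logits, direction, max_kl=0.01, step=1.0, shrink=0.5, max_tries=20):
    """Hypothetical helper: shrink the step along `direction` until KL(old || new) <= max_kl."""
    old = softmax(logits)
    for _ in range(max_tries):
        candidate = [z + step * d for z, d in zip(logits, direction)]
        if kl(old, softmax(candidate)) <= max_kl:
            return candidate
        step *= shrink
    return list(logits)  # no acceptable step found: keep the old policy


logits = [0.0, 0.0, 0.0]
new_logits = trust_region_step(logits, [1.0, 0.0, -1.0], max_kl=0.01)
print(new_logits, kl(softmax(logits), softmax(new_logits)))
```

Here the full step would move the policy too far. The backtracking accepts a much smaller step that keeps the new policy inside the KL ball.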

The paper shows how Taylor expansions construct approximations to full importance sampling (IS) corrections. IS corrections are at the core of most off-policy evaluation techniques, so the expansions are closely related to established off-policy evaluation methods. Prior work has focused on applying off-policy corrections directly to policy gradient estimators rather than to the surrogate objectives that generate those gradients. The researchers note that standard policy optimization objectives involve IS weights, yet their link with IS is never made explicit. Taylor expansions make this implicit link explicit.
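The following toy example shows how a product of IS ratios can be expanded order by order. Write each per-step ratio pi/mu as 1 + delta_t. The full trajectory weight, the product of (1 + delta_t), then expands into a sum over subsets of time steps. Keeping only subsets of size at most K gives the zero-, first- and second-order approximations.

This is our own illustration of the algebra, not the paper's estimator. The function names and the ratio values are hypothetical.

```python
from itertools import combinations
from math import prod  # Python 3.8+


def full_is_weight(ratios):
    """Full trajectory importance weight: product of per-step ratios pi/mu."""
    return prod(ratios)


def taylor_is_weight(ratios, order):
    """Order-K approximation of prod(1 + d_t), where d_t = ratio_t - 1.

    Expanding the product gives a sum over subsets of time steps; keeping
    only subsets of size <= order truncates the expansion.
    """
    deltas = [r - 1.0 for r in ratios]
    total = 0.0
    for k in range(order + 1):
        # The empty subset (k = 0) contributes prod(()) == 1.
        total += sum(prod(c) for c in combinations(deltas, k))
    return total


ratios = [1.1, 1.05, 0.95]  # hypothetical per-step ratios pi(a_t|s_t) / mu(a_t|s_t)
for order in range(len(ratios) + 1):
    print(order, taylor_is_weight(ratios, order), full_is_weight(ratios))
```

When the ratios stay close to 1, each extra order shrinks the gap to the full weight. Keeping every order recovers the full product exactly. Note that the first-order term involves only single per-step ratios, never products of them.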

The researchers evaluated the benefits of applying Taylor expansions across a diverse set of scenarios. In their experiments, the second-order correction performed marginally better than both the first-order correction and Retrace, and significantly better than the zero-order correction. In general, unbiased (or only slightly biased) off-policy corrections do not yet perform as well as radically biased off-policy variants. Overall, the researchers report that the new formulation can bring significant gains to state-of-the-art deep RL agents.
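The next sketch shows what "zero-, first- and second-order" mean for a value function. It uses a tiny tabular MDP and a standard Neumann-series identity. Let A = I - GAMMA * P^mu and B = GAMMA * (P^pi - P^mu), where P^pi is the state-action transition matrix under policy pi. Then Q^pi = sum over k of (A^{-1} B)^k Q^mu. Truncating after k = K gives an order-K expansion of the target policy's value around the behaviour policy's value.

This is our own illustration, not code from the paper. The MDP, the policies, GAMMA and all function names are hypothetical. The sketch measures approximation error only; it says nothing about the learning performance the paper reports.

```python
GAMMA = 0.9  # hypothetical discount factor


def sa_transition(P, policy):
    """State-action transition matrix with entries P(s'|s,a) * policy(a'|s')."""
    n_s, n_a = len(policy), len(policy[0])
    return [[P[s][a][s2] * policy[s2][a2] for s2 in range(n_s) for a2 in range(n_a)]
            for s in range(n_s) for a in range(n_a)]


def matvec(M, x):
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]


def evaluate(M, x, iters=1000):
    """Return (I - GAMMA*M)^{-1} x via the fixed-point iteration y <- x + GAMMA*M y."""
    y = list(x)
    for _ in range(iters):
        y = [xi + GAMMA * v for xi, v in zip(x, matvec(M, y))]
    return y


def taylor_q(P, r, pi, mu, order):
    """Order-K expansion of Q^pi around Q^mu: sum over k <= K of (A^{-1} B)^k Q^mu."""
    P_mu, P_pi = sa_transition(P, mu), sa_transition(P, pi)
    B = [[GAMMA * (x - y) for x, y in zip(row_pi, row_mu)] for row_pi, row_mu in zip(P_pi, P_mu)]
    term = evaluate(P_mu, r)  # k = 0 term: Q^mu itself
    total = list(term)
    for _ in range(order):
        term = evaluate(P_mu, matvec(B, term))  # multiply by A^{-1} B once more
        total = [t + u for t, u in zip(total, term)]
    return total


# Hypothetical 2-state, 2-action MDP: action a moves the agent to state a.
P = [[[1.0 if s2 == a else 0.0 for s2 in range(2)] for a in range(2)] for s in range(2)]
r = [1.0, 0.0, 0.0, 0.0]           # reward only for (state 0, action 0)
mu = [[0.5, 0.5], [0.5, 0.5]]      # behaviour policy
pi = [[0.55, 0.45], [0.45, 0.55]]  # nearby target policy

q_exact = evaluate(sa_transition(P, pi), r)
for order in range(4):
    approx = taylor_q(P, r, pi, mu, order)
    print(order, max(abs(e - q) for e, q in zip(q_exact, approx)))
```

The error of the order-K truncation equals (A^{-1} B)^(K+1) Q^pi. It therefore shrinks with each extra order only while pi stays close enough to mu for the series to converge, which mirrors the trust-region intuition.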

The paper Taylor Expansion Policy Optimization is on arXiv.

Author: Grace Duan | Editor: Michael Sarazen & Fangyu Cai
