NeurIPS 2018 Reinforcement Learning Papers Worth Reading

The papers in this list are mainly about deep reinforcement learning and RL/AI more broadly; I hope you find it useful. The reinforcement learning papers from NeurIPS 2018 are listed below, ordered alphabetically by the first author's last name.

  1. Brandon Amos, Ivan Jimenez, Jacob Sacks, Byron Boots, and J. Zico Kolter.

    Differentiable MPC for end-to-end planning and control.

  2. Yusuf Aytar, Tobias Pfaff, David Budden, Thomas Paine, Ziyu Wang, and Nando de Freitas.

    Playing hard exploration games by watching YouTube.

  3. Jacob Buckman, Danijar Hafner, George Tucker, Eugene Brevdo, and Honglak Lee.

    Sample-efficient reinforcement learning with stochastic ensemble value expansion.

  4. Kurtland Chua, Roberto Calandra, Rowan McAllister, and Sergey Levine.

    Data-efficient model-based reinforcement learning with deep probabilistic dynamics models.

  5. Filipe de Avila Belbute-Peres, Kevin Smith, Kelsey Allen, Josh Tenenbaum, and J. Zico Kolter.

    End-to-end differentiable physics for learning and control.

  6. Amir-massoud Farahmand.

    Iterative value-aware model learning.

  7. Justin Fu, Sergey Levine, Dibya Ghosh, Larry Yang, and Avi Singh.

    An event-based framework for task specification and control.

  8. Vikash Goel, Jameson Weng, and Pascal Poupart.

    Unsupervised video object segmentation for deep reinforcement learning.

  9. Abhishek Gupta, Russell Mendonca, YuXuan Liu, Pieter Abbeel, and Sergey Levine.

    Meta-reinforcement learning of structured exploration strategies.

  10. David Ha and Jürgen Schmidhuber.

    Recurrent world models facilitate policy evolution.

  11. Nick Haber, Damian Mrowca, Stephanie Wang, Li Fei-Fei, and Daniel Yamins.

    Learning to play with intrinsically-motivated, self-aware agents.

  12. Rein Houthooft, Yuhua Chen, Phillip Isola, Bradly Stadie, Filip Wolski, Jonathan Ho, and Pieter Abbeel.

    Evolved policy gradients.

  13. Zhiting Hu, Zichao Yang, Ruslan Salakhutdinov, Lianhui Qin, Xiaodan Liang, Haoye Dong, and Eric Xing.

    Deep generative models with learnable knowledge constraints.

  14. Jiexi Huang, Fa Wu, Doina Precup, and Yang Cai.

    Learning safe policies with expert guidance.

  15. Kwang-Sung Jun, Lihong Li, Yuzhe Ma, and Xiaojin Zhu.

    Adversarial attacks on stochastic bandits.

  16. Raksha Kumaraswamy, Matthew Schlegel, Adam White, and Martha White.

    Context-dependent upper-confidence bounds for directed exploration.

  17. Isaac Lage, Andrew Ross, Samuel J Gershman, Been Kim, and Finale Doshi-Velez.

    Human-in-the-loop interpretability prior.

  18. Marc Lanctot, Sriram Srinivasan, Vinicius Zambaldi, Julien Perolat, Karl Tuyls, Remi Munos, and Michael Bowling.

    Actor-critic policy optimization in partially observable multiagent environments.

  19. Nevena Lazic, Craig Boutilier, Tyler Lu, Eehern Wong, Binz Roy, MK Ryu, and Greg Imwalle.

    Data center cooling using model-predictive control.

  20. Jan Leike, Borja Ibarz, Dario Amodei, Geoffrey Irving, and Shane Legg.

    Reward learning from human preferences and demonstrations in Atari.

  21. Shuang Li, Shuai Xiao, Shixiang Zhu, Nan Du, Yao Xie, and Le Song.

    Learning temporal point processes via reinforcement learning.

  22. Yuan Li, Xiaodan Liang, Zhiting Hu, and Eric Xing.

    Hybrid retrieval-generation reinforced agent for medical image report generation.

  23. Chen Liang, Mohammad Norouzi, Jonathan Berant, Quoc V Le, and Ni Lao.

    Memory augmented policy optimization for program synthesis with generalization.

  24. Qiang Liu, Lihong Li, Ziyang Tang, and Denny Zhou.

    Breaking the curse of horizon: Infinite-horizon off-policy estimation.

  25. Yao Liu, Omer Gottesman, Aniruddh Raghu, Matthieu Komorowski, Aldo A Faisal, Finale Doshi-Velez, and Emma Brunskill.

    Representation balancing MDPs for off-policy policy evaluation.

  26. Tyler Lu, Craig Boutilier, and Dale Schuurmans.

    Non-delusional Q-learning and value-iteration.

  27. Mario Lucic, Karol Kurach, Marcin Michalski, Sylvain Gelly, and Olivier Bousquet.

    Are GANs created equal? A large-scale study.

  28. David Alvarez Melis and Tommi Jaakkola.

    Towards robust interpretability with self-explaining neural networks.

  29. Robin Manhaeve, Sebastijan Dumancic, Angelika Kimmig, Thomas Demeester, and Luc De Raedt.

    DeepProbLog: Neural probabilistic logic programming.

  30. Horia Mania, Aurelia Guy, and Benjamin Recht.

    Simple random search of static linear policies is competitive for reinforcement learning.

  31. Damian Mrowca, Chengxu Zhuang, Elias Wang, Nick Haber, Li Fei-Fei, Josh Tenenbaum, and Daniel Yamins.

    A flexible neural representation for physics prediction.

  32. Ofir Nachum, Shixiang Gu, Honglak Lee, and Sergey Levine.

    Data-efficient hierarchical reinforcement learning.

  33. Ashvin Nair, Vitchyr Pong, Shikhar Bahl, Sergey Levine, Steven Lin, and Murtaza Dalal.

    Visual goal-conditioned reinforcement learning by representation learning.

  34. Matthew O’Kelly, Aman Sinha, Hongseok Namkoong, Russ Tedrake, and John C Duchi.

    Scalable end-to-end autonomous vehicle testing via rare-event simulation.

  35. Ian Osband, John S Aslanides, and Albin Cassirer.

    Randomized prior functions for deep reinforcement learning.

  36. Matthew Riemer, Miao Liu, and Gerald Tesauro.

    Learning abstract options.

  37. Adam Santoro, Ryan Faulkner, David Raposo, Jack Rae, Mike Chrzanowski, Theophane Weber, Daan Wierstra, Oriol Vinyals, Razvan Pascanu, and Tim Lillicrap.

    Relational recurrent neural networks.

  38. Shibani Santurkar, Dimitris Tsipras, Andrew Ilyas, and Aleksander Madry.

    How does batch normalization help optimization? (no, it is not about internal covariate shift).

  39. Ozan Sener and Vladlen Koltun.

    Multi-task learning as multi-objective optimization.

  40. Jiaming Song, Hongyu Ren, Dorsa Sadigh, and Stefano Ermon.

    Multi-agent generative adversarial imitation learning.

  41. Wen Sun, Geoffrey Gordon, Byron Boots, and J. Bagnell.

    Dual policy iteration.

  42. Aviv Tamar, Pieter Abbeel, Ge Yang, Thanard Kurutach, and Stuart Russell.

    Learning plannable representations with causal InfoGAN.

  43. Andrew Trask, Felix Hill, Scott Reed, Jack Rae, Chris Dyer, and Phil Blunsom.

    Neural arithmetic logic units.

  44. Tongzhou Wang, Yi Wu, David Moore, and Stuart Russell.

    Meta-learning MCMC proposals.

  45. Catherine Wong, Neil Houlsby, Yifeng Lu, and Andrea Gesmundo.

    Transfer learning with neural AutoML.

  46. Kelvin Xu, Chelsea Finn, and Sergey Levine.

    Uncertainty-aware few-shot learning with probabilistic model-agnostic meta-learning.

  47. Zhongwen Xu, Hado van Hasselt, and David Silver.

    Meta-gradient reinforcement learning.

  48. Kexin Yi, Jiajun Wu, Chuang Gan, Antonio Torralba, Pushmeet Kohli, and Josh Tenenbaum.

    Neural-Symbolic VQA: Disentangling reasoning from vision and language understanding.

  49. Lisa Zhang, Gregory Rosenblatt, Ethan Fetaya, Renjie Liao, William Byrd, Matthew Might, Raquel Urtasun, and Richard Zemel.

    Neural guided constraint logic programming for program synthesis.

  50. Yu Zhang, Ying Wei, and Qiang Yang.

    Learning to multitask.

  51. Zeyu Zheng, Junhyuk Oh, and Satinder Singh.

    On learning intrinsic rewards for policy gradient methods.

Source: https://medium.com/@yuxili/nips-2018-rl-papers-to-read-5bc1edb85a28

