A Reading List of Reinforcement Learning Papers from NeurIPS 2018



The papers in this list mainly concern deep reinforcement learning and RL/AI; I hope it is useful to you. The reinforcement learning papers from NeurIPS 2018 are listed below, ordered alphabetically by the first author's surname.

  1. Brandon Amos, Ivan Jimenez, Jacob Sacks, Byron Boots, and J. Zico Kolter.

    Differentiable MPC for end-to-end planning and control.

  2. Yusuf Aytar, Tobias Pfaff, David Budden, Thomas Paine, Ziyu Wang, and Nando de Freitas.

    Playing hard exploration games by watching YouTube.

  3. Jacob Buckman, Danijar Hafner, George Tucker, Eugene Brevdo, and Honglak Lee.

    Sample-efficient reinforcement learning with stochastic ensemble value expansion.

  4. Kurtland Chua, Roberto Calandra, Rowan McAllister, and Sergey Levine.

    Data-efficient model-based reinforcement learning with deep probabilistic dynamics models.

  5. Filipe de Avila Belbute-Peres, Kevin Smith, Kelsey Allen, Josh Tenenbaum, and J. Zico Kolter.

    End-to-end differentiable physics for learning and control.

  6. Amir-massoud Farahmand.

    Iterative value-aware model learning.

  7. Justin Fu, Sergey Levine, Dibya Ghosh, Larry Yang, and Avi Singh.

    An event-based framework for task specification and control.

  8. Vikash Goel, Jameson Weng, and Pascal Poupart.

    Unsupervised video object segmentation for deep reinforcement learning.

  9. Abhishek Gupta, Russell Mendonca, YuXuan Liu, Pieter Abbeel, and Sergey Levine.

    Meta-reinforcement learning of structured exploration strategies.

  10. David Ha and Jürgen Schmidhuber.

    Recurrent world models facilitate policy evolution.

  11. Nick Haber, Damian Mrowca, Stephanie Wang, Li Fei-Fei, and Daniel Yamins.

    Learning to play with intrinsically-motivated, self-aware agents.

  12. Rein Houthooft, Yuhua Chen, Phillip Isola, Bradly Stadie, Filip Wolski, Jonathan Ho, and Pieter Abbeel.

    Evolved policy gradients.

  13. Zhiting Hu, Zichao Yang, Ruslan Salakhutdinov, Lianhui Qin, Xiaodan Liang, Haoye Dong, and Eric Xing.

    Deep generative models with learnable knowledge constraints.

  14. Jiexi Huang, Fa Wu, Doina Precup, and Yang Cai.

    Learning safe policies with expert guidance.

  15. Kwang-Sung Jun, Lihong Li, Yuzhe Ma, and Xiaojin Zhu.

    Adversarial attacks on stochastic bandits.

  16. Raksha Kumaraswamy, Matthew Schlegel, Adam White, and Martha White.

    Context-dependent upper-confidence bounds for directed exploration.

  17. Isaac Lage, Andrew Ross, Samuel J Gershman, Been Kim, and Finale Doshi-Velez.

    Human-in-the-loop interpretability prior.

  18. Marc Lanctot, Sriram Srinivasan, Vinicius Zambaldi, Julien Perolat, Karl Tuyls, Remi Munos, and Michael Bowling.

    Actor-critic policy optimization in partially observable multiagent environments.

  19. Nevena Lazic, Craig Boutilier, Tyler Lu, Eehern Wong, Binz Roy, MK Ryu, and Greg Imwalle.

    Data center cooling using model-predictive control.

  20. Jan Leike, Borja Ibarz, Dario Amodei, Geoffrey Irving, and Shane Legg.

    Reward learning from human preferences and demonstrations in Atari.

  21. Shuang Li, Shuai Xiao, Shixiang Zhu, Nan Du, Yao Xie, and Le Song.

    Learning temporal point processes via reinforcement learning.

  22. Yuan Li, Xiaodan Liang, Zhiting Hu, and Eric Xing.

    Hybrid retrieval-generation reinforced agent for medical image report generation.

  23. Chen Liang, Mohammad Norouzi, Jonathan Berant, Quoc V Le, and Ni Lao.

    Memory augmented policy optimization for program synthesis with generalization.

  24. Qiang Liu, Lihong Li, Ziyang Tang, and Denny Zhou.

    Breaking the curse of horizon: Infinite-horizon off-policy estimation.

  25. Yao Liu, Omer Gottesman, Aniruddh Raghu, Matthieu Komorowski, Aldo A Faisal, Finale Doshi-Velez, and Emma Brunskill.

    Representation balancing MDPs for off-policy policy evaluation.

  26. Tyler Lu, Craig Boutilier, and Dale Schuurmans.

    Non-delusional Q-learning and value-iteration.

  27. Mario Lucic, Karol Kurach, Marcin Michalski, Sylvain Gelly, and Olivier Bousquet.

    Are GANs created equal? A large-scale study.

  28. David Alvarez Melis and Tommi Jaakkola.

    Towards robust interpretability with self-explaining neural networks.

  29. Robin Manhaeve, Sebastijan Dumancic, Angelika Kimmig, Thomas Demeester, and Luc De Raedt.

    DeepProbLog: Neural probabilistic logic programming.

  30. Horia Mania, Aurelia Guy, and Benjamin Recht.

    Simple random search of static linear policies is competitive for reinforcement learning.

  31. Damian Mrowca, Chengxu Zhuang, Elias Wang, Nick Haber, Li Fei-Fei, Josh Tenenbaum, and Daniel Yamins.

    A flexible neural representation for physics prediction.

  32. Ofir Nachum, Shixiang Gu, Honglak Lee, and Sergey Levine.

    Data-efficient hierarchical reinforcement learning.

  33. Ashvin Nair, Vitchyr Pong, Shikhar Bahl, Sergey Levine, Steven Lin, and Murtaza Dalal.

    Visual goal-conditioned reinforcement learning by representation learning.

  34. Matthew O’Kelly, Aman Sinha, Hongseok Namkoong, Russ Tedrake, and John C Duchi.

    Scalable end-to-end autonomous vehicle testing via rare-event simulation.

  35. Ian Osband, John S Aslanides, and Albin Cassirer.

    Randomized prior functions for deep reinforcement learning.

  36. Matthew Riemer, Miao Liu, and Gerald Tesauro.

    Learning abstract options.

  37. Adam Santoro, Ryan Faulkner, David Raposo, Jack Rae, Mike Chrzanowski, Theophane Weber, Daan Wierstra, Oriol Vinyals, Razvan Pascanu, and Tim Lillicrap.

    Relational recurrent neural networks.

  38. Shibani Santurkar, Dimitris Tsipras, Andrew Ilyas, and Aleksander Madry.

    How does batch normalization help optimization? (no, it is not about internal covariate shift).

  39. Ozan Sener and Vladlen Koltun.

    Multi-task learning as multi-objective optimization.

  40. Jiaming Song, Hongyu Ren, Dorsa Sadigh, and Stefano Ermon.

    Multi-agent generative adversarial imitation learning.

  41. Wen Sun, Geoffrey Gordon, Byron Boots, and J. Bagnell.

    Dual policy iteration.

  42. Aviv Tamar, Pieter Abbeel, Ge Yang, Thanard Kurutach, and Stuart Russell.

    Learning plannable representations with causal InfoGAN.

  43. Andrew Trask, Felix Hill, Scott Reed, Jack Rae, Chris Dyer, and Phil Blunsom.

    Neural arithmetic logic units.

  44. Tongzhou Wang, Yi Wu, David Moore, and Stuart Russell.

    Meta-learning MCMC proposals.

  45. Catherine Wong, Neil Houlsby, Yifeng Lu, and Andrea Gesmundo.

    Transfer learning with neural AutoML.

  46. Kelvin Xu, Chelsea Finn, and Sergey Levine.

    Uncertainty-aware few-shot learning with probabilistic model-agnostic meta-learning.

  47. Zhongwen Xu, Hado van Hasselt, and David Silver.

    Meta-gradient reinforcement learning.

  48. Kexin Yi, Jiajun Wu, Chuang Gan, Antonio Torralba, Pushmeet Kohli, and Josh Tenenbaum.

    Neural-Symbolic VQA: Disentangling reasoning from vision and language understanding.

  49. Lisa Zhang, Gregory Rosenblatt, Ethan Fetaya, Renjie Liao, William Byrd, Matthew Might, Raquel Urtasun, and Richard Zemel.

    Neural guided constraint logic programming for program synthesis.

  50. Yu Zhang, Ying Wei, and Qiang Yang.

    Learning to multitask.

  51. Zeyu Zheng, Junhyuk Oh, and Satinder Singh.

    On learning intrinsic rewards for policy gradient methods.

Source: https://medium.com/@yuxili/nips-2018-rl-papers-to-read-5bc1edb85a28

