7 Things to Think About When Developing Reinforcement Learners

栏目: IT技术 · 发布时间: 4年前

内容简介:Albeit we have made very good progress in reinforcement learning research, a unified framework to compare the algorithms is missing. Furthermore, reported metrics in research papers do not give enough information. Here, we discuss potential things to look

Albeit we have made very good progress in reinforcement learning research, a unified framework to compare the algorithms is missing. Furthermore, reported metrics in research papers do not give enough information. Here, we discuss potential things to look at to make the analysis more rigorous.

We all know how reinforcement learning paper mostly works. Researcher A publishes an algorithm B, algorithm B outperforms a subset of other “state-of-the-art” algorithms, on a strategically chosen subset of environments which coincidentally work well for the algorithm. In addition, the authors may or may not optimize the hyperparameters of the baselines, but in turn, report the best runs for algorithm B.

Not to get into detail about what has caused this research trend, we can do something to improve it. Proper evaluation metrics of algorithms (in addition to proper benchmarks) are essential in order to have a valid comparison. Mostly what researchers use in order to evaluate the performance of algorithms is the mean performance across runs, if you get lucky, then they would even report the median which is perhaps a bit more informative. Although sounding a bit cynical regarding RL research, I have to say that RL is bread and butter compared to the things that happen in other subfields of machine learning, such as not even having multiple runs of the algorithm (vision people, I am talking to you :) ).

7 Things to Think About When Developing Reinforcement Learners

https://uscresl.github.io/humanoid-gail/

Hence, this post is about what do we have to look at to compare one reinforcement learning algorithm to another, a great source of inspiration is [1], where the authors suggest a concrete way of calculating various metrics for RL algorithms, but we’ll look at it more from a top-level perspective since the intricacies are just technical details of the greater goal.

Most intuitively, when one develops an algorithm, you should look at how sensitive it is to various factors during the training procedure, such as the random seed and hyperparameters. Less variability means that the algorithm is more stable, robust, reliable etc. Except for general variability, we want to look at the worst case of different things, i.e. when having the metric, what is it in the lower tail of the distribution. No wonder that the authors from [1] took inspiration from finance in order to define concrete metrics since it turns out that we care about risk and variability in RL also. All in all, the different “reliability” categories can be separated as follows:

7 Things to Think About When Developing Reinforcement Learners

http://www.iri.upc.edu/files/scidoc/2168-Learning-cloth-manipulation-with-demonstrations.pdf

1. Variability during Training within Rollouts

Ideally, we would like to have continuous, monotonous improvement. Meaning that the average performance should increase with each rollout and within rollouts and that the performance shouldn’t get (significantly) worse from rollout to rollout. This is unfortunately mostly the case, that RL algorithms tend to be unstable. The source of the variability though can be the environment, so what you want to do is to account for the stochasticity in the environment also and adjust for it in the metric. Ideally, you would

want to obtain the best performance at the end of the training run, not somewhere in the middle.

2. Variability across Different Training Runs

The initial conditions of the training shouldn’t influence the algorithm’s performance significantly, this is why it is important to look at different random seeds in different training runs (vision people, I am looking at you!). Sensitivity to hyperparameters should also be accounted for.

3. Variability across Rollouts in Evaluation

We would like the algorithm to produce similar performance and behavior in evaluation. This shows how the algorithm deals with the stochasticity of the environment and different initialization conditions. One also must take into account though that the maximum achievable performance within the rollout can depend on the initial state, should also be taken into account.

4. Short-term Risk within Training Rollouts

The algorithm should exhibit some guarantees in worst-case performance. This is especially important in situations with safety considerations during training. In the short-term case, we want the algorithm’s performance not to wiggle too much, locally. Looking at the risk effectively means looking at the expected value of the lowest tail of the (local) distribution, below a certain percentile (let's say 5%).

5. Long-term Risk within Training Rollouts

Looking at the whole rollout, we want to close the gap between the worst and best performance within the rollout. In comparison to the short-term case, here we would fit the distribution based on the whole rollout. The expected value of the performance in the worst performance that we rarely obtain within the rollout, but it is possible. Obviously, again, this can come from the instability of the algorithm, but also the characteristics of the environment.

6. Risk across Training Runs

In contrast to the 1. point where we look at the variability after discarding outliers, here we want to see what happens with low probability, that we get a really bad seed or with a really bad set of hyperparameters.

7. Risk across Rollouts at Evaluation

In contrast to the 3. point, we look at the worst-case performance across many rollouts in evaluation. Again, the source of the variability can be the algorithm but also the environment.


以上就是本文的全部内容,希望对大家的学习有所帮助,也希望大家多多支持 码农网

查看所有标签

猜你喜欢:

本站部分资源来源于网络,本站转载出于传递更多信息之目的,版权归原作者或者来源机构所有,如转载稿涉及版权问题,请联系我们

深入浅出Struts 2

深入浅出Struts 2

Budi Kuniawan / 杨涛、王建桥、杨晓云 / 人民邮电出版社 / 2009-04 / 59.00元

本书是广受赞誉的Struts 2优秀教程,它全面而深入地阐述了Struts 2的各个特性,并指导开发人员如何根据遇到的问题对症下药,选择使用最合适的特性。作者处处从实战出发,在丰富的示例中直观地探讨了许多实用的技术,如数据类型转换、文件上传和下载、提高Struts 2应用的安全性、调试与性能分析、FreeMarker、Velocity、Ajax,等等。跟随作者一道深入Struts 2,聆听大量来之......一起来看看 《深入浅出Struts 2》 这本书的介绍吧!

HTML 编码/解码
HTML 编码/解码

HTML 编码/解码

UNIX 时间戳转换
UNIX 时间戳转换

UNIX 时间戳转换

HEX HSV 转换工具
HEX HSV 转换工具

HEX HSV 互换工具