Although we have made good progress in reinforcement learning research, a unified framework for comparing algorithms is still missing. Furthermore, the metrics reported in research papers do not give enough information. Here, we discuss what to look at to make the analysis more rigorous.
We all know how reinforcement learning papers mostly work. Researcher A publishes algorithm B; algorithm B outperforms a subset of other “state-of-the-art” algorithms on a strategically chosen subset of environments that coincidentally work well for the algorithm. In addition, the authors may or may not optimize the hyperparameters of the baselines, but in turn report the best runs for algorithm B.
Without getting into what has caused this research trend, we can do something to improve it. Proper evaluation metrics (in addition to proper benchmarks) are essential for a valid comparison. What researchers mostly use to evaluate the performance of an algorithm is the mean performance across runs; if you get lucky, they even report the median, which is a bit more informative. Although this sounds a bit cynical about RL research, I have to say that RL is relatively well-behaved compared to what happens in other subfields of machine learning, such as not even having multiple runs of the algorithm (vision people, I am talking to you :) ).
Hence, this post is about what we have to look at to compare one reinforcement learning algorithm to another. A great source of inspiration is [1], where the authors suggest a concrete way of calculating various reliability metrics for RL algorithms, but we will look at the topic from a top-level perspective, since the intricacies are just technical details of the greater goal.
Most intuitively, when you develop an algorithm, you should look at how sensitive it is to various factors during the training procedure, such as the random seed and the hyperparameters. Less variability means that the algorithm is more stable, robust, and reliable. Beyond general variability, we also want to look at the worst case, i.e., the lower tail of the metric’s distribution. No wonder the authors of [1] took inspiration from finance to define concrete metrics, since it turns out that we care about risk and variability in RL as well. All in all, the different “reliability” categories can be separated as follows:
1. Variability during Training within Rollouts
Ideally, we would like continuous, monotonic improvement: the average performance should increase within and across rollouts, and the performance shouldn’t get (significantly) worse from rollout to rollout. Unfortunately, this is mostly not the case; RL algorithms tend to be unstable. The source of the variability can also be the environment, so you want to account for the stochasticity of the environment and adjust the metric for it. Ideally, you would want to obtain the best performance at the end of the training run, not somewhere in the middle.
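To make this concrete, here is a minimal sketch (my own illustration, not the exact procedure from [1]) of how within-run variability could be measured: detrend the training curve by taking step-to-step differences, then average the interquartile range of those differences over sliding windows. The array `training_curve` is a hypothetical log of evaluation scores recorded during a single training run.

```python
import numpy as np

def dispersion_within_run(training_curve, window=10):
    """Within-run variability: detrend the curve by differencing, then
    average the interquartile range (IQR) of the changes over windows."""
    diffs = np.diff(np.asarray(training_curve, dtype=float))  # removes the overall trend
    windows = [diffs[i:i + window] for i in range(0, len(diffs) - window + 1, window)]
    iqrs = [np.percentile(w, 75) - np.percentile(w, 25) for w in windows]
    return float(np.mean(iqrs))

# Toy example: a noisy but improving training curve.
rng = np.random.default_rng(0)
curve = np.cumsum(rng.normal(loc=0.5, scale=2.0, size=200))
print(dispersion_within_run(curve))
```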
2. Variability across Different Training Runs
The initial conditions of training shouldn’t influence the algorithm’s performance significantly, which is why it is important to look at different random seeds across training runs (vision people, I am looking at you!). Sensitivity to hyperparameters should also be accounted for.
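As a hedged sketch, the point is simply to report a robust spread statistic over runs that differ only in the random seed (and, separately, over different hyperparameter settings), not just the mean. The scores below are made up for illustration.

```python
import numpy as np

# Hypothetical final scores of the same algorithm trained with 10 different seeds.
final_scores = np.array([310., 295., 410., 120., 305., 330., 290., 415., 300., 135.])

mean = final_scores.mean()
median = np.median(final_scores)
iqr = np.percentile(final_scores, 75) - np.percentile(final_scores, 25)  # robust spread

print(f"mean={mean:.1f}  median={median:.1f}  IQR across seeds={iqr:.1f}")
```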
3. Variability across Rollouts in Evaluation
We would like the algorithm to produce similar performance and behavior in evaluation. This shows how the algorithm deals with the stochasticity of the environment and with different initialization conditions. One must also take into account that the maximum achievable performance within a rollout can depend on the initial state.
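One hedged way to account for that state-dependent maximum is to normalise each evaluation return by the best return observed from the same initial state before measuring the spread. The grouping by `initial_state_ids` below is a simplifying assumption of mine; it only works when initial states repeat, e.g. with a fixed set of evaluation seeds.

```python
import numpy as np

def eval_dispersion(returns, initial_state_ids):
    """Spread of evaluation returns across rollouts, after normalising each
    return by the best return seen from the same initial state."""
    returns = np.asarray(returns, dtype=float)
    ids = np.asarray(initial_state_ids)
    normalised = np.empty_like(returns)
    for s in np.unique(ids):
        mask = ids == s
        normalised[mask] = returns[mask] / returns[mask].max()
    return np.percentile(normalised, 75) - np.percentile(normalised, 25)

# Toy data: three rollouts from each of two initial states.
print(eval_dispersion([9.0, 7.5, 8.8, 4.0, 3.1, 3.9], [0, 0, 0, 1, 1, 1]))
```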
4. Short-term Risk within Training Rollouts
The algorithm should exhibit some guarantees on worst-case performance. This is especially important in situations with safety considerations during training. In the short-term case, we want the algorithm’s performance not to wiggle too much locally. Looking at the risk effectively means looking at the expected value of the lowest tail of the (local) distribution, below a certain percentile (let’s say 5%).
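This “expected value below a percentile” is the conditional value at risk (CVaR) borrowed from finance. Here is a minimal sketch, applied to the step-to-step changes of a hypothetical training curve, so that a very negative value flags sudden local drops in performance.

```python
import numpy as np

def cvar(values, alpha=0.05):
    """Conditional value at risk: the mean of the worst alpha-fraction of `values`."""
    values = np.asarray(values, dtype=float)
    cutoff = np.quantile(values, alpha)
    return values[values <= cutoff].mean()

def short_term_risk(training_curve, alpha=0.05):
    """Short-term risk within a run: CVaR of the step-to-step changes."""
    return cvar(np.diff(np.asarray(training_curve, dtype=float)), alpha)

rng = np.random.default_rng(1)
curve = np.cumsum(rng.normal(loc=0.5, scale=2.0, size=200))
print(short_term_risk(curve))
```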
5. Long-term Risk within Training Rollouts
Looking at the whole rollout, we want to close the gap between the worst and the best performance within the rollout. In comparison to the short-term case, here we fit the distribution over the whole rollout and look at the expected value of the worst performance that we rarely, but possibly, obtain within it. Obviously, this can again come from the instability of the algorithm, but also from the characteristics of the environment.
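One hedged way to capture this gap is the drawdown from the best performance reached so far in the run, followed by the expected value of its worst tail (here the largest 5% of drops); this is an illustration in the spirit of [1], not their exact formula.

```python
import numpy as np

def long_term_risk(training_curve, alpha=0.05):
    """Long-term risk within a run: expected value of the largest drops
    below the best performance reached so far (drawdown CVaR)."""
    curve = np.asarray(training_curve, dtype=float)
    drawdown = np.maximum.accumulate(curve) - curve   # 0 whenever we are at a new best
    cutoff = np.quantile(drawdown, 1.0 - alpha)       # the biggest drops sit in the upper tail
    return drawdown[drawdown >= cutoff].mean()

rng = np.random.default_rng(2)
curve = np.cumsum(rng.normal(loc=0.3, scale=3.0, size=500))
print(long_term_risk(curve))
```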
6. Risk across Training Runs
In contrast to point 2, where we look at the variability after discarding outliers, here we want to see what happens with low probability: we get a really bad seed or a really bad set of hyperparameters.
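Concretely, this could be the expected value of the worst fraction of final performances over many runs; the scores below are randomly generated stand-ins for runs with different seeds and hyperparameter settings.

```python
import numpy as np

def risk_across_runs(final_scores, alpha=0.1):
    """Risk across training runs: the mean of the worst alpha-fraction
    of final performances (left-tail CVaR over runs)."""
    scores = np.asarray(final_scores, dtype=float)
    cutoff = np.quantile(scores, alpha)
    return scores[scores <= cutoff].mean()

# Hypothetical final scores of 20 runs with different seeds / hyperparameters.
rng = np.random.default_rng(3)
scores = rng.normal(loc=300.0, scale=60.0, size=20)
print(risk_across_runs(scores))
```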
7. Risk across Rollouts at Evaluation
In contrast to point 3, here we look at the worst-case performance across many rollouts in evaluation. Again, the source of the variability can be the algorithm, but also the environment.
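To tie this back to why the mean alone is not enough, here is a made-up comparison of two algorithms over evaluation rollouts: algorithm B has the higher mean return but occasionally fails catastrophically, which only the tail metric reveals.

```python
import numpy as np

def left_tail_cvar(returns, alpha=0.05):
    """Mean of the worst alpha-fraction of evaluation returns."""
    returns = np.asarray(returns, dtype=float)
    cutoff = np.quantile(returns, alpha)
    return returns[returns <= cutoff].mean()

# Hypothetical evaluation returns over 1000 rollouts per algorithm:
# B is better on average but has rare catastrophic rollouts.
rng = np.random.default_rng(4)
algo_a = rng.normal(loc=100.0, scale=10.0, size=1000)
algo_b = np.where(rng.random(1000) < 0.05,
                  rng.normal(loc=0.0, scale=10.0, size=1000),
                  rng.normal(loc=110.0, scale=10.0, size=1000))

for name, returns in [("A", algo_a), ("B", algo_b)]:
    print(f"{name}: mean={returns.mean():.1f}  CVaR(5%)={left_tail_cvar(returns):.1f}")
```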