Predictive Early Stopping — A Meta Learning Approach


Author: Dhruv Nair, Data Scientist, Comet.ml

Introduction

Predictive Early Stopping is a state-of-the-art approach for speeding up model training and hyperparameter optimization. Our benchmarking studies have shown that Predictive Early Stopping can speed up model training by up to 30%, independent of the underlying infrastructure.

We build on insights gathered from projects such as Learning Curve Extrapolation, Hyperband, and Median Stopping, in order to create a predictive model that can estimate the convergence value of a loss curve.
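Comet has not published the details of its predictive model, but the core idea of learning curve extrapolation can be sketched with a simple parametric fit: given the first few epochs of a loss curve, fit an exponential decay and read off its asymptote as the predicted convergence value. The functional form and the SciPy-based fit below are our illustrative assumptions, not Comet's actual model.

import numpy as np
from scipy.optimize import curve_fit

def exp_decay(t, a, b, c):
    # Simple loss-curve model: starts near a + c and decays toward the
    # asymptote c at rate b.
    return a * np.exp(-b * t) + c

def predict_convergence(losses):
    # Fit the decay model to a partial loss curve and return the estimated
    # asymptotic (converged) loss value.
    t = np.arange(len(losses), dtype=float)
    p0 = [losses[0] - losses[-1], 0.1, losses[-1]]  # rough initial guess
    (a, b, c), _ = curve_fit(exp_decay, t, losses, p0=p0, maxfev=10000)
    return c

# Example: 20 observed epochs of a slowly converging run.
observed = 2.0 * np.exp(-0.05 * np.arange(20)) + 0.35
print(predict_convergence(observed))  # roughly 0.35, long before convergence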

Comet is able to leverage model data, such as hyperparameters and loss curves, from over two million models in the public section of its platform to create a model whose predictions generalize across hyperparameters and model architectures.

In some cases we are able to provide an estimate of convergence hundreds of epochs before it actually occurs. In addition to predicting the convergence value, our Predictive Early Stopping product provides an estimate of the probability that the current model will outperform the best result seen in the current training sweep.

These predictions allow us to terminate the training of underperforming models, so that search time is spent evaluating only the most promising candidates.

Benchmarking:

We tested our Predictive Early Stopping method in three different settings:

  1. A hyperparameter search that optimizes the parameters of a function that acts as a surrogate for a neural network.
  2. A hyperparameter search to optimize a 6-layer CNN on CIFAR10 using the SMAC optimizer, with and without Predictive Early Stopping.
  3. A hyperparameter search to optimize the same 6-layer CNN using Random Search with Hyperband vs. Random Search with Predictive Early Stopping.

Results for the Surrogate Function:

In our first test, we set up an exponentially decaying function as our surrogate for a neural network. We ran this surrogate model for 20 steps and determined the optimal values for the function's parameters using Comet's Bayesian Optimizer and Predictive Early Stopping. During the hyperparameter search, we observed that suboptimal models were not allowed to train for the full 20 steps. Figure 1 illustrates our stopping mechanism.

[Figure 1: early stopping on the surrogate function; suboptimal runs are terminated before 20 steps]
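
The post does not specify the exact surrogate; as an illustration, an exponentially decaying stand-in for a training curve might look like the sketch below, where the search tunes the decay parameters and each step plays the role of an epoch (the parameter names are our assumptions):

import numpy as np

def surrogate_loss_curve(init_loss, decay_rate, floor, steps=20):
    # Exponentially decaying stand-in for a neural network's loss curve;
    # the hyperparameter search tunes (init_loss, decay_rate, floor).
    t = np.arange(steps)
    return (init_loss - floor) * np.exp(-decay_rate * t) + floor

# One "training run" of the surrogate: 20 steps, as in the benchmark.
curve = surrogate_loss_curve(init_loss=2.5, decay_rate=0.3, floor=0.4)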

Results for the CNN model with SMAC:

Our benchmarking test for the CNN model was set up in the following way:

We used the SMAC optimizer to tune the following hyperparameters in a 6-layer CNN model. The model hyperparameters and architecture were based on AlexNet.

{
    "learning_rate":{
        "type":"loguniform",
        "value":[0.0000001, 0.01]
    },
    "learning_rate_decay":{
        "type": "uniform",
        "value":[0.000001, 0.001]
    },
    "weight_decay": {
        "type": "loguniform",
        "value": [0.0000005, 0.005]
    }
}
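
For readers unfamiliar with the spec above: "loguniform" means sampling uniformly in log space, which spreads trials evenly across orders of magnitude rather than clustering them near the upper bound. A minimal sketch of sampling from such a spec (the helper is ours, not Comet's API):

import numpy as np

rng = np.random.default_rng(0)

def sample_param(spec):
    lo, hi = spec["value"]
    if spec["type"] == "loguniform":
        # Uniform in log space: equal probability per order of magnitude.
        return float(np.exp(rng.uniform(np.log(lo), np.log(hi))))
    return float(rng.uniform(lo, hi))  # plain uniform

space = {
    "learning_rate": {"type": "loguniform", "value": [0.0000001, 0.01]},
    "learning_rate_decay": {"type": "uniform", "value": [0.000001, 0.001]},
    "weight_decay": {"type": "loguniform", "value": [0.0000005, 0.005]},
}
config = {name: sample_param(spec) for name, spec in space.items()}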

We ran 8 trials of the optimizer, with and without Predictive Early Stopping. Each optimizer trial was given a 6-hour budget to evaluate as many configurations as possible. Each hyperparameter configuration was allowed to train for a maximum of 100 epochs, with the validation set evaluated at the end of every epoch. The validation loss was used by our Predictive Early Stopping model to decide whether or not to terminate a configuration.

At the end of all the trials, we computed the mean test loss across trials as a function of the total number of epochs the hyperparameter sweep needed to reach that loss value.
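
As a rough sketch of that bookkeeping, assuming each configuration's record holds the epochs it consumed and its final test loss:

import numpy as np

def best_loss_vs_budget(runs):
    # runs: (epochs_used, test_loss) per configuration, in evaluation order.
    # Returns cumulative epochs and the best loss achieved within that budget.
    cum_epochs, best = [], []
    total, best_so_far = 0, float("inf")
    for epochs, loss in runs:
        total += epochs
        best_so_far = min(best_so_far, loss)
        cum_epochs.append(total)
        best.append(best_so_far)
    return np.array(cum_epochs), np.array(best)

Averaging these per-trial curves over the 8 optimizer trials yields the comparison shown in Figure 2.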

We can see in Figure 2 that using Predictive Early Stopping allows SMAC to reach a comparable loss value almost 300 epochs sooner, a 25% decrease in the time spent on hyperparameter optimization.

[Figure 2: mean test loss vs. total epochs, SMAC with and without Predictive Early Stopping]

We also divided the hyperparameter configurations into quantiles based on the final validation loss, and calculated the average number of epochs that the optimizer spent in each quantile across all trials.
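
That analysis can be reproduced in a few lines of NumPy; the quartile bucketing below is a sketch assuming one (final loss, epochs trained) record per configuration:

import numpy as np

def epochs_per_quartile(final_losses, epochs_trained):
    # Bucket configurations into quartiles by final validation loss
    # (lower is better) and average the epochs spent in each bucket.
    final_losses = np.asarray(final_losses)
    epochs_trained = np.asarray(epochs_trained)
    edges = np.quantile(final_losses, [0.25, 0.5, 0.75])
    buckets = np.digitize(final_losses, edges)  # 0 = best 25%, 3 = worst 25%
    return {f"quartile_{q + 1}": float(epochs_trained[buckets == q].mean())
            for q in range(4)}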

In Figure 3, we see that both SMAC and Predictive-Early-Stopping-SMAC spend roughly the same amount of time evaluating the top 25% and top 50% of configurations. However, Predictive Early Stopping spends 30 fewer epochs training models in the bottom 50% of results, and 20 fewer epochs in the bottom 25%.

[Figure 3: average epochs spent per validation-loss quantile]

In Figure 4, we see sample loss curves from a hyperparameter sweep with Predictive Early Stopping. Suboptimal configurations are stopped well before the total number of allowed training steps.

[Figure 4: sample loss curves from a sweep with Predictive Early Stopping]

Results for the CNN model with Hyperband:

We set up the test for Hyperband in a similar manner to SMAC, specifically using the Asynchronous Successive Halving Pruner implemented in Optuna, which can be thought of as Hyperband with a single bracket. We selected 120 hyperparameter configurations at random. Each configuration was allotted a minimum resource of 10 epochs and allowed to train for a maximum of 100 epochs, giving a maximum training budget of 12,000 epochs.

At every evaluation point, the number of configurations is reduced by a factor of N, pruning those with the worst validation losses. In our experiments, we evaluated Hyperband with N values of 2, 4, and 8.
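
The Optuna side of this setup can be reproduced roughly as follows. The pruner, sampler, and study calls are Optuna's actual API; the objective body, including train_one_epoch, is a placeholder for the real CNN training loop:

import optuna

def objective(trial):
    # Placeholder objective for the 6-layer CNN; train_one_epoch is a
    # hypothetical stand-in for the real training step.
    lr = trial.suggest_float("learning_rate", 1e-7, 1e-2, log=True)
    val_loss = None
    for epoch in range(100):                   # 100-epoch cap per config
        val_loss = train_one_epoch(lr, epoch)  # hypothetical training step
        trial.report(val_loss, step=epoch)     # expose the curve to the pruner
        if trial.should_prune():               # ASHA cuts the worst runs
            raise optuna.TrialPruned()
    return val_loss

pruner = optuna.pruners.SuccessiveHalvingPruner(
    min_resource=10,      # every config gets at least 10 epochs
    reduction_factor=2,   # keep the best half at each rung (N = 2)
)
study = optuna.create_study(
    direction="minimize",
    sampler=optuna.samplers.RandomSampler(),  # random search, as in the post
    pruner=pruner,
)
study.optimize(objective, n_trials=120)       # 120 random configurations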

For Predictive Early Stopping, we tested each configuration with different values of an interval parameter: configurations were evaluated every 10, 15, or 20 epochs. At every evaluation, we estimated the probability that the current configuration would outperform the best configuration seen so far. If this probability fell below a threshold, in our case 90%, we terminated the configuration. Threshold and interval are both configurable hyperparameters for Predictive Early Stopping.
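
A minimal sketch of that decision rule, using the interval and threshold from the benchmark. The function prob_outperforms_best below is our crude stand-in for Comet's proprietary predictive model, which outputs a calibrated probability:

def prob_outperforms_best(val_losses, best_final_loss):
    # Hypothetical stand-in for Comet's predictive model: linearly project
    # the recent trend 50 epochs ahead and return a crude 0/1 "probability".
    recent = val_losses[-5:]
    slope = (recent[-1] - recent[0]) / max(len(recent) - 1, 1)
    projected = recent[-1] + slope * 50
    return 1.0 if projected < best_final_loss else 0.0

def should_stop(val_losses, best_final_loss, interval=10, threshold=0.90):
    # Evaluate the run every `interval` epochs; terminate when the estimated
    # probability of beating the best run so far drops below `threshold`.
    epoch = len(val_losses)
    if epoch == 0 or epoch % interval != 0:
        return False
    return prob_outperforms_best(val_losses, best_final_loss) < threshold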

We then determined the best validation loss achieved in the hyperparameter sweep and the number of epochs remaining in the budget after the sweep.

In Figure 5a, we see that all approaches find the same best value for the validation loss. However, Predictive Early Stopping evaluates the configurations using just 15% of the total budget, compared to 25% for Hyperband, saving a further 10 percentage points of the training budget.

[Figure 5a: best validation loss achieved by each method]

[Figure 5b: share of the epoch budget remaining after the sweep]

Conclusion

Predictive Early Stopping has clear time, energy, and cost-saving benefits. Wasted compute cycles aren't good for the environment or a researcher's budget.

The Allen Institute for AI recently published a report on the rising computational costs of training machine learning models, and how these increasingly large energy requirements adversely affect the environment. The paper states that current advancements in state-of-the-art AI research have largely focused on metrics such as accuracy or error, at the cost of being environmentally unfriendly. They call this paradigm Red AI. To counter the prevalence of Red AI research, they propose a shift towards AI research that places an emphasis on computational efficiency: Green AI.

The paper recommends tracking the efficiency of an AI algorithm based on the total number of floating point operations (FPOs) required to generate a result. The total number of FPOs correlates directly with the number of hyperparameter configurations evaluated during tuning, as well as the number of training iterations spent on each configuration.

We hope our efforts with Predictive Early Stopping contribute towards increasing the computational efficiency of the hyperparameter search process. We see this tool as a way to lower some of the monetary barriers associated with AI research, and as a step towards adopting Green AI practices.

In our next post, we will describe applying Predictive Early Stopping to individual runs that do not belong to a larger hyperparameter search.

*Predictive Early Stopping is available as an add-on with the purchase of Comet Teams or Comet Enterprise. Patent protection is being sought on Predictive Early Stopping via one or more pending patent applications.

Learn more and sign up here.

