Hyperparameter Tuning with Keras Tuner
Getting the most out of your models
Great data scientists do not settle for "okay"; they go beyond to achieve the extraordinary.
In this article, we’ll review techniques data scientists use to create models that work great and win competitions. Let’s get the most out of our models by choosing the optimal hyperparameters for a learning algorithm. This task is known as hyperparameter optimization or hyperparameter tuning. It is especially strenuous in deep learning, as neural networks are full of hyperparameters. I’ll assume that you are already familiar with common data science concepts like regression and the mean squared error (MSE) metric, and have experience building models using TensorFlow and Keras.
To demonstrate hyperparameter tuning methods, we’ll use the Keras Tuner library to tune a regression model on the Boston housing price dataset. This dataset contains 13 attributes, with 404 training and 102 testing samples. We’ll use TensorFlow as the Keras backend, so make sure you have TensorFlow installed on your machine. I’m using TensorFlow version ‘2.1.0’ and Keras Tuner (kerastuner) version ‘1.0.1’. TensorFlow 2.x ships with Keras, so you don’t need to install Keras separately if you have version 2.0.x or later. You can check the versions you have using the code below:
import tensorflow as tf
import kerastuner as kt

print(tf.__version__)
print(kt.__version__)
Load the dataset
The Boston housing price regression dataset can be downloaded directly using Keras. Here’s a list of datasets that come with Keras. To load the dataset, run the following code.
from tensorflow.keras.datasets import boston_housing

(x_train, y_train), (x_test, y_test) = boston_housing.load_data()
Note that if this is the first time you are using this dataset within Keras, it will be downloaded from an external source.
This is the regression model I’ll use in this demo. The code below shows how the model was built without any tuning.
from sklearn.preprocessing import StandardScaler
from tensorflow.keras import models, layers

# set random seed
from numpy.random import seed
seed(42)
import tensorflow
tensorflow.random.set_seed(42)

# preprocessing - normalization
scaler = StandardScaler()
scaler.fit(x_train)
x_train_scaled = scaler.transform(x_train)
x_test_scaled = scaler.transform(x_test)

# model building
model = models.Sequential()
model.add(layers.Dense(8, activation='relu', input_shape=(x_train.shape[1],)))
model.add(layers.Dense(16, activation='relu'))
model.add(layers.Dropout(0.1))
model.add(layers.Dense(1))

# compile model using rmsprop
model.compile(optimizer='rmsprop', loss='mse', metrics=['mse'])

# model training
history = model.fit(x_train_scaled, y_train, validation_split=0.2, epochs=10)

# model evaluation
model.evaluate(x_test_scaled, y_test)
This model has an MSE of around 434. I set the random seed in NumPy and TensorFlow to 42 to get reproducible results. Despite doing so, I still get slightly different results every time I run the code. Let me know in the comments what else I missed to make this reproducible.
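One hedged guess: Python’s own random module, the interpreter’s hash seed, and op-level nondeterminism are not covered by the NumPy and TensorFlow seeds alone. A minimal sketch of a fuller seeding routine, assuming these are the culprits:

import os
import random
import numpy as np
import tensorflow as tf

os.environ['PYTHONHASHSEED'] = '42'  # only fully effective if set before Python starts
random.seed(42)        # Python's built-in RNG
np.random.seed(42)     # NumPy RNG
tf.random.set_seed(42) # TensorFlow graph-level seed

# Op-level determinism (assumes TensorFlow 2.8+, newer than the 2.1.0 used here):
# tf.config.experimental.enable_op_determinism()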
Tuning with Keras Tuner
To start tuning the model in Keras Tuner, let’s define a hypermodel first. HyperModel is a Keras Tuner class that lets you define a model with a searchable hyperparameter space and build it.
Create a class that inherits from kerastuner.HyperModel, like so:
from kerastuner import HyperModel
from tensorflow.keras import layers
from tensorflow.keras.models import Sequential

class RegressionHyperModel(HyperModel):
    def __init__(self, input_shape):
        self.input_shape = input_shape

    def build(self, hp):
        model = Sequential()
        model.add(
            layers.Dense(
                units=hp.Int('units', 8, 64, 4, default=8),
                activation=hp.Choice(
                    'dense_activation',
                    values=['relu', 'tanh', 'sigmoid'],
                    default='relu'),
                input_shape=self.input_shape
            )
        )
        model.add(
            layers.Dense(
                # distinct name so this layer's width is tuned independently;
                # reusing 'units' would share the first layer's value
                units=hp.Int('units_2', 16, 64, 4, default=16),
                # reusing 'dense_activation' means both layers share the
                # same activation choice
                activation=hp.Choice(
                    'dense_activation',
                    values=['relu', 'tanh', 'sigmoid'],
                    default='relu')
            )
        )
        model.add(
            layers.Dropout(
                hp.Float(
                    'dropout',
                    min_value=0.0,
                    max_value=0.1,
                    default=0.005,
                    step=0.01)
            )
        )
        model.add(layers.Dense(1))
        model.compile(
            optimizer='rmsprop', loss='mse', metrics=['mse']
        )
        return model
This is the same model we built earlier, except that for every hyperparameter, we defined a search space. You may have noticed hp.Int, hp.Float, and hp.Choice; these are used to define a search space for a hyperparameter that accepts an integer, a float, and a category, respectively. A complete list of hyperparameter methods can be found in the Keras Tuner documentation [2]. hp is the HyperParameters instance that Keras Tuner passes to build.
A hyperparameter such as the number of units in a dense layer accepts an integer; hence, hp.Int is used to define a range of integers to try. Similarly, the dropout rate accepts a float value, so hp.Float is used. Both hp.Int and hp.Float require a name, a minimum value, and a maximum value, while the step size and default value are optional.
The hp.Int search space below is named “units” and will take values from 8 to 64 in steps of 4, with a default value of 8. hp.Float is used similarly to hp.Int but accepts float values.
hp.Int('units', 8, 64, 4, default=8)
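For example, the dropout search space from the hypermodel above is defined with hp.Float:

hp.Float('dropout', min_value=0.0, max_value=0.1, default=0.005, step=0.01)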
hp.Choice is used to define a categorical hyperparameter such as the activation function. The search space below, named “dense_activation”, will choose between “relu”, “tanh”, and “sigmoid” functions, with a default value set to “relu”.
hp.Choice('dense_activation', values=['relu', 'tanh', 'sigmoid'], default='relu')
Instantiate HyperModel
Let’s instantiate a hypermodel object. The input shape varies with the dataset and the problem you are trying to solve.
input_shape = (x_train.shape[1],)
hypermodel = RegressionHyperModel(input_shape)
Let’s start tuning!
Random Search
As the name suggests, this hyperparameter tuning method randomly tries combinations of hyperparameters from a given search space. To use this method in Keras Tuner, let’s define a tuner using one of the available Tuners. Here’s a full list of Tuners.
from kerastuner.tuners import RandomSearch

tuner_rs = RandomSearch(
    hypermodel,
    objective='mse',
    seed=42,
    max_trials=10,
    executions_per_trial=2)
Run the random search tuner using the search method.
tuner_rs.search(x_train_scaled, y_train, epochs=10, validation_split=0.2, verbose=0)
Select the best model from the combinations of hyperparameters the tuner tried, and evaluate it.
best_model = tuner_rs.get_best_models(num_models=1)[0]
loss, mse = best_model.evaluate(x_test_scaled, y_test)
Random search’s MSE is 53.48, a very big improvement over not performing any tuning at all.
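To inspect which hyperparameter values won, Keras Tuner also provides a get_best_hyperparameters method; a minimal sketch:

best_hp = tuner_rs.get_best_hyperparameters(num_trials=1)[0]
print(best_hp.values)  # dict mapping each search-space name to its chosen value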
Hyperband
Hyperband is based on the algorithm by Li et al. [3]. It optimizes random search through adaptive resource allocation and early stopping. Hyperband first runs random hyperparameter configurations for an iteration or two, selects the configurations that perform well, and then continues tuning the best performers.
from kerastuner.tuners import Hyperband

tuner_hb = Hyperband(
    hypermodel,
    max_epochs=5,
    objective='mse',
    seed=42,
    executions_per_trial=2
)

tuner_hb.search(x_train_scaled, y_train, epochs=10, validation_split=0.2, verbose=0)

best_model = tuner_hb.get_best_models(num_models=1)[0]
best_model.evaluate(x_test_scaled, y_test)
The resulting MSE is 395.19, a lot worse than random search’s but a little better than not tuning at all.
Bayesian Optimization
Bayesian optimization builds a probabilistic model that maps hyperparameters to a probability of a score on the objective function. Unlike random search and Hyperband, Bayesian optimization keeps track of its past evaluation results and uses them to build this probability model.
from kerastuner.tuners import BayesianOptimization

tuner_bo = BayesianOptimization(
    hypermodel,
    objective='mse',
    max_trials=10,
    seed=42,
    executions_per_trial=2
)

tuner_bo.search(x_train_scaled, y_train, epochs=10, validation_split=0.2, verbose=0)

best_model = tuner_bo.get_best_models(num_models=1)[0]
best_model.evaluate(x_test_scaled, y_test)
The best model tuned with Bayesian optimization has an MSE of 46.47, better than the first two tuners we tried.
Conclusion
We were able to show that tuning indeed helps us get the most out of our models. Discussed here are just three of the many methods of hyperparameter tuning. When trying out the code above, you may get slightly different results: despite setting the NumPy, TensorFlow, and Keras Tuner random seeds, results still differ slightly from run to run.
Furthermore, tuners can also be tuned! Yes, you read that right: tuning the tuners. Tuners accept arguments such as max_trials and executions_per_trial, and these can therefore be tuned as well. Try changing these parameters and see if you get further improvements, as in the sketch below.
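As a starting point, here’s a hedged sketch that sweeps max_trials for the random search tuner. The directory and project_name arguments keep each run’s results in a separate folder; the specific names used here are made up for illustration.

from kerastuner.tuners import RandomSearch

# try the same search with different trial budgets
for trials in (5, 10, 20):
    tuner = RandomSearch(
        hypermodel,
        objective='mse',
        seed=42,
        max_trials=trials,
        executions_per_trial=2,
        directory='tuning_runs',            # illustrative folder name
        project_name='rs_%d_trials' % trials)
    tuner.search(x_train_scaled, y_train, epochs=10,
                 validation_split=0.2, verbose=0)
    loss, mse = tuner.get_best_models(num_models=1)[0].evaluate(
        x_test_scaled, y_test)
    print('max_trials=%d -> test MSE %.2f' % (trials, mse))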
References
[1] F. Chollet, Deep Learning with Python (2018), Manning Publications Inc.
[2] Keras Tuner Documentation, https://keras-team.github.io/keras-tuner/
[3] L. Li, K. Jamieson, G. DeSalvo, A. Rostamizadeh, A. Talwalkar, Hyperband: A Novel Bandit-Based Approach to Hyperparameter Optimization (2018), https://arxiv.org/abs/1603.06560