Checkpointing Deep Learning Models in Keras

Category: IT Technology · Published: 5 years ago

Learn how to save deep learning models using checkpoints and how to reload them

Checkpointing is one of several methods for saving and loading a deep learning model.

In this article, you will learn how to checkpoint a deep learning model built using Keras, and then either reinstate the model architecture and trained weights in a new model or resume training from where you left off.

Usage of Checkpoints

  • Allow us to use a pre-trained model for inference without having to retrain the model
  • Resume the training process from where we left off in case it was interrupted or for fine-tuning the model

It acts like an autosave for your model in case training is interrupted for any reason.

Steps for saving and loading model and weights using checkpoint

  • Create the model
  • Specify the path where we want to save the checkpoint files
  • Create the callback function to save the model
  • Apply the callback function during the training
  • Evaluate the model on test data
  • Load the pre-trained weights into a new model using load_weights(), or restore the weights from the latest checkpoint

Create the base model architecture with the loss function, metrics, and optimizer

We create a multi-class classification model for the Fashion MNIST dataset.

import tensorflow as tf

# Define the model architecture
def create_model():
    model = tf.keras.Sequential()
    # The first layer must declare the input shape: 28x28 images with one channel
    model.add(tf.keras.layers.Conv2D(filters=64, kernel_size=2, padding='same', activation='relu', input_shape=(28, 28, 1)))
    model.add(tf.keras.layers.MaxPooling2D(pool_size=2))
    model.add(tf.keras.layers.Dropout(0.3))
    model.add(tf.keras.layers.Conv2D(filters=32, kernel_size=2, padding='same', activation='relu'))
    model.add(tf.keras.layers.MaxPooling2D(pool_size=2))
    model.add(tf.keras.layers.Dropout(0.3))
    model.add(tf.keras.layers.Flatten())
    model.add(tf.keras.layers.Dense(256, activation='relu'))
    model.add(tf.keras.layers.Dropout(0.5))
    # One output per class, with softmax to produce class probabilities
    model.add(tf.keras.layers.Dense(10, activation='softmax'))

    # Compile the model; the sparse loss expects integer class labels
    model.compile(loss='sparse_categorical_crossentropy',
                  optimizer='adam',
                  metrics=['accuracy'])

    return model

# Create the model
model_ckpt = create_model()
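
The training code later in the article uses train_images, train_labels, test_images, and test_labels without showing how they are prepared. Here is a minimal sketch of one way to do it. The scaling to [0, 1] and the helper names prepare_images and load_fashion_mnist are assumptions for illustration, not the author's code. The reshape adds the channel axis that input_shape=(28,28,1) expects.

```python
import numpy as np

def prepare_images(images):
    """Scale uint8 pixels to [0, 1] and add a trailing channel axis."""
    images = images.astype("float32") / 255.0
    return images.reshape(-1, 28, 28, 1)

def load_fashion_mnist():
    """Load Fashion MNIST through tf.keras and prepare the image arrays."""
    # Imported here so prepare_images can be used without TensorFlow installed
    import tensorflow as tf
    (train_images, train_labels), (test_images, test_labels) = \
        tf.keras.datasets.fashion_mnist.load_data()
    return (prepare_images(train_images), train_labels,
            prepare_images(test_images), test_labels)

# Usage (downloads the dataset on the first call):
# train_images, train_labels, test_images, test_labels = load_fashion_mnist()
```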

Specify the path where the checkpoint files will be stored

checkpoint_path = "train_ckpt/cp.ckpt"

Create the callback function to save the model.

Callback functions are applied at different stages of training and give a view into the internal state of the training process.

We create a callback function to save the model weights using ModelCheckpoint.

If we set save_weights_only to True, then only the weights are saved. The model architecture, loss, and optimizer are not saved.

We can also specify whether to save the model at every epoch or at a different interval.

# Create a callback that saves only the best weights (monitor defaults to 'val_loss')
cp_callback = tf.keras.callbacks.ModelCheckpoint(
    filepath=checkpoint_path, save_best_only=True,
    save_weights_only=True, verbose=1)

The ModelCheckpoint callback class has the following arguments:

  • filepath : the path or filename where we want to save the model
  • monitor : the metric that we want to monitor, such as loss or accuracy (the default is val_loss)
  • verbose : 0 for silent mode and 1 to print a message each time the callback saves
  • save_weights_only : If set to True, then only the model weights are saved. Otherwise the full model is saved, including the model architecture, weights, loss function, and optimizer.
  • save_best_only : If set to True, then the model is saved only when the monitored quantity improves. If we are monitoring accuracy and save_best_only is set to True, then the model is saved every time the accuracy exceeds the best accuracy seen so far.
  • mode : It has three options: auto, min, or max. If we are monitoring accuracy, set it to max, and if we are monitoring loss, set it to min. If we set the mode to auto, the direction is inferred automatically from the name of the monitored quantity.
  • save_freq or period : save_freq is either 'epoch' or an integer. With 'epoch', the model is saved after each epoch. With an integer, the model is saved after that many batches, not epochs, so saving every five epochs means passing five times the number of batches per epoch, as in the code below. The older period argument counted epochs directly but is deprecated.

import math

# Create a callback that saves the model's weights every 5 epochs
batch_size = 64
steps_per_epoch = math.ceil(len(train_images) / batch_size)
cp_callback = tf.keras.callbacks.ModelCheckpoint(
    filepath=checkpoint_path,
    verbose=1,
    save_weights_only=True,
    save_freq=5 * steps_per_epoch)  # an integer save_freq counts batches
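
In current tf.keras releases an integer save_freq is counted in batches rather than epochs, so an interval expressed in epochs has to be converted first. Below is a small worked example with concrete numbers. The helper save_freq_for_epochs is hypothetical and not part of Keras.

```python
import math

def save_freq_for_epochs(num_samples, batch_size, every_n_epochs):
    """Return the batch count to pass as save_freq to save every N epochs."""
    steps_per_epoch = math.ceil(num_samples / batch_size)
    return every_n_epochs * steps_per_epoch

# Fashion MNIST has 60,000 training images; the article trains with batch_size=64,
# which gives 938 batches per epoch.
print(save_freq_for_epochs(60000, 64, 5))  # 4690
```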

Apply the callback during the training process

# Train the model, passing the checkpoint callback to fit()
model_ckpt.fit(train_images,
               train_labels,
               batch_size=64,
               epochs=10,
               validation_data=(test_images, test_labels),
               callbacks=[cp_callback])

With save_best_only=True, the training log shows that the weights are not saved when val_loss does not improve. Whenever val_loss drops below its previous best, the new weights are written to the checkpoint file.
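
The save-on-improvement rule can be sketched in plain Python. This is an illustration of the idea for a quantity where lower is better, not the Keras implementation. The function name epochs_that_save and the loss values are made up for the example.

```python
def epochs_that_save(val_losses):
    """Return the 1-based epochs at which a save-best-only rule would save."""
    best = float("inf")
    saved = []
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:  # strict improvement in 'min' mode
            best = loss
            saved.append(epoch)
    return saved

# Hypothetical validation losses for five epochs
print(epochs_that_save([0.45, 0.38, 0.40, 0.35, 0.36]))  # [1, 2, 4]
```

Epochs 3 and 5 are skipped because their loss is higher than the best value seen so far.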

Evaluating the model on test images

loss,acc = model_ckpt.evaluate(test_images, test_labels, verbose=2)

Checkpoint files

A checkpoint stores the trained weights in a collection of checkpoint-formatted files in a binary format.

The TensorFlow save() writes three kinds of files: a checkpoint file, an index file, and data files. It stores the graph structure separately from the variable values.

checkpoint file: contains the prefixes for both the index file and one or more data files

index file: indicates which weights are stored in which shard

data file: saves the values of all the variables, without the structure. There can be one or more data files. I trained the model on one machine, and we see cp.ckpt.data-00000-of-00002 and cp.ckpt.data-00001-of-00002
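
To see which of these files a training run produced, the contents of the checkpoint directory can be grouped by kind. The sketch below assumes the naming pattern described above (a file named checkpoint, a prefix.index file, and prefix.data-XXXXX-of-YYYYY shards). The helper classify_checkpoint_files is hypothetical, and the demo runs on empty placeholder files rather than real checkpoints.

```python
import os
import re
import tempfile

# Matches shard names such as cp.ckpt.data-00000-of-00002
DATA_SHARD = re.compile(r"^.+\.data-\d{5}-of-\d{5}$")

def classify_checkpoint_files(directory):
    """Group the file names in a directory by checkpoint file kind."""
    groups = {"checkpoint": [], "index": [], "data": [], "other": []}
    for name in sorted(os.listdir(directory)):
        if name == "checkpoint":
            groups["checkpoint"].append(name)
        elif name.endswith(".index"):
            groups["index"].append(name)
        elif DATA_SHARD.match(name):
            groups["data"].append(name)
        else:
            groups["other"].append(name)
    return groups

# Demo on empty placeholder files named like those in the article
with tempfile.TemporaryDirectory() as tmp:
    for name in ["checkpoint", "cp.ckpt.index",
                 "cp.ckpt.data-00000-of-00002", "cp.ckpt.data-00001-of-00002"]:
        open(os.path.join(tmp, name), "w").close()
    demo_groups = classify_checkpoint_files(tmp)
    print(demo_groups)
```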

Loading the pre-trained weights

Reasons for loading the pre-trained weights

  • Continue from where we left off or
  • Resume after an interruption or
  • Load the pre-trained weight for inference

We create a new model to load the pre-trained weights.

When loading a new model with the pre-trained weights, the new model should have the same architecture as the original model.

# Create a basic model instance
model_ckpt2 = create_model()

We load the pre-trained weights into our new model using load_weights().

model_ckpt2.load_weights(checkpoint_path)

We can make inferences using the new model on the test images

loss,acc = model_ckpt2.evaluate(test_images, test_labels, verbose=2)
print("Restored model, accuracy: {:5.2f}%".format(100*acc))

For comparison, an untrained model performs at chance level (about 10% accuracy across the ten classes).

To resume the training where we left off

model_ckpt2.fit(train_images,
                train_labels,
                batch_size=64,
                epochs=10,
                validation_data=(test_images, test_labels),
                callbacks=[cp_callback])

After the additional training, we see that the accuracy has changed:

loss,acc = model_ckpt2.evaluate(test_images, test_labels, verbose=2)
print("Restored model, accuracy: {:5.2f}%".format(100*acc))

Loading weights from the latest checkpoints

tf.train.latest_checkpoint() finds the filename of the latest saved checkpoint file.

import os

# Get the latest checkpoint file
checkpoint_dir = os.path.dirname(checkpoint_path)
latest = tf.train.latest_checkpoint(checkpoint_dir)
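
As far as I understand it, the lookup relies on the small text file named checkpoint in that directory, which records the most recent checkpoint prefix in a model_checkpoint_path entry. The sketch below reads that entry by hand to show the idea. It is not TensorFlow's implementation, it skips the validation the real function performs, and read_latest_prefix is a hypothetical helper.

```python
import os
import re
import tempfile

def read_latest_prefix(checkpoint_dir):
    """Return the prefix recorded in the 'checkpoint' state file, or None."""
    state_file = os.path.join(checkpoint_dir, "checkpoint")
    if not os.path.exists(state_file):
        return None
    with open(state_file) as f:
        for line in f:
            match = re.match(r'^model_checkpoint_path:\s*"(.*)"\s*$', line)
            if match:
                # Relative entries are resolved against the checkpoint directory
                return os.path.join(checkpoint_dir, match.group(1))
    return None

# Demo with a hand-written state file (assumed contents, for illustration only)
with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "checkpoint"), "w") as f:
        f.write('model_checkpoint_path: "cp.ckpt"\n')
        f.write('all_model_checkpoint_paths: "cp.ckpt"\n')
    demo_latest = read_latest_prefix(tmp)
    demo_expected = os.path.join(tmp, "cp.ckpt")
    print(demo_latest == demo_expected)
```

In practice, tf.train.latest_checkpoint() should be preferred, since it returns None when no usable checkpoint is found.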

We create a new model, load the weights from the latest checkpoint and make inferences

# Create a new model instance
model_latest_checkpoint = create_model()
# Load the previously saved weights
model_latest_checkpoint.load_weights(latest)
# Re-evaluate the model
loss, acc = model_latest_checkpoint.evaluate(test_images, test_labels, verbose=2)
print("Restored model, accuracy: {:5.2f}%".format(100*acc))

Including epoch number in the filename

# Include the epoch in the file name (uses `str.format`)
checkpoint_path = "training2/cp-{epoch:04d}.ckpt"
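
The {epoch:04d} placeholder is ordinary str.format syntax, and ModelCheckpoint fills it in with the epoch number each time it saves. A quick way to preview the resulting file names, independent of Keras:

```python
checkpoint_path = "training2/cp-{epoch:04d}.ckpt"

# Preview the names produced for a few epoch numbers
for epoch in (1, 5, 10):
    print(checkpoint_path.format(epoch=epoch))
# training2/cp-0001.ckpt
# training2/cp-0005.ckpt
# training2/cp-0010.ckpt
```

Zero-padding the epoch to four digits keeps the files in training order when they are sorted by name.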

Conclusion:

We now understand how to create a callback function using the ModelCheckpoint class, which checkpoint files get created, and how to restore the pre-trained weights.

References:

https://www.tensorflow.org/tutorials/keras/save_and_load

