Transfer learning for Deep Neural Networks using TensorFlow

Category: IT Technology · Published: 4 years ago

A practical, hands-on example of how to use transfer learning with TensorFlow.

Photo by Jopwell from Pexels

In this article, we will learn how to use transfer learning for a classification task.

One of the most powerful ideas in deep learning is that we can take the knowledge that a neural network has learned from one task and apply that knowledge to another task. This is called transfer learning.

Transfer learning makes sense when we have a lot of data for the problem we are transferring from and relatively little data for the problem we are transferring the knowledge to.

As a first step, let's import the required modules and load the cats_vs_dogs dataset, which is available through TensorFlow Datasets. We will use only 20% of the dataset, because we want to see how transfer learning behaves when training data is scarce.

Note: I prefer explaining the code using comments in the code snippets.
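A minimal sketch of this loading step might look as follows. It assumes the tensorflow_datasets package is installed and that the 20% is divided into 16% training, 2% validation, and 2% test data; the article does not state the exact split, and the helper names split_spec and load_cats_vs_dogs are hypothetical.

```python
def split_spec(train_pct=16, val_pct=2, test_pct=2):
    """Build TFDS slicing strings that together cover 20% of the data.

    cats_vs_dogs ships with a single 'train' split, so all three subsets
    are carved out of it. The 16/2/2 proportions are an assumption.
    """
    val_end = train_pct + val_pct
    test_end = val_end + test_pct
    return [
        f"train[:{train_pct}%]",
        f"train[{train_pct}%:{val_end}%]",
        f"train[{val_end}%:{test_end}%]",
    ]


def load_cats_vs_dogs():
    """Download (on first use) and return the three raw subsets plus metadata."""
    # Imported here so split_spec can be used without tensorflow_datasets.
    import tensorflow_datasets as tfds

    (raw_train, raw_val, raw_test), info = tfds.load(
        "cats_vs_dogs",
        split=split_spec(),
        as_supervised=True,  # yield (image, label) tuples
        with_info=True,      # also return dataset metadata
    )
    return raw_train, raw_val, raw_test, info
```

Calling load_cats_vs_dogs() triggers the dataset download the first time it runs, so it needs network access and some disk space.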

Sample images from the dataset

The images in the dataset come in different shapes, so let's resize them to a common shape and group them into batches for training. Please refer to the tf.image and tf.data.Dataset modules before moving to the next part.

formatting images to the required format
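One way to do this formatting is sketched below. The 160x160 target size is an assumption inferred from the (batch_size, 5, 5, 1280) feature-map shape the article reports for MobileNetV2, which downsamples by a factor of 32. The batch size, shuffle buffer, and scaling to [-1, 1] are likewise assumptions, and format_example and make_batches are hypothetical helper names.

```python
import tensorflow as tf

IMG_SIZE = 160         # assumed: 160 / 32 = 5, matching the 5x5 feature map
BATCH_SIZE = 32        # assumed
SHUFFLE_BUFFER = 1000  # assumed


def format_example(image, label):
    """Scale pixels from [0, 255] to [-1, 1] and resize to a fixed shape."""
    image = tf.cast(image, tf.float32)
    image = (image / 127.5) - 1.0
    image = tf.image.resize(image, (IMG_SIZE, IMG_SIZE))
    return image, label


def make_batches(dataset, training=False):
    """Apply the formatting, optionally shuffle, then batch and prefetch."""
    dataset = dataset.map(format_example, num_parallel_calls=tf.data.AUTOTUNE)
    if training:
        dataset = dataset.shuffle(SHUFFLE_BUFFER)
    return dataset.batch(BATCH_SIZE).prefetch(tf.data.AUTOTUNE)
```

Only the training set is shuffled here; validation and test batches keep their order so that evaluation is repeatable.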

Non-Pre-Trained Model:

Let us use the MobileNet V2 neural network for our example. We can import it directly from tf.keras.applications, which provides several built-in architectures. The “weights” parameter controls whether pre-trained weights are loaded along with the architecture.

First, we will measure the accuracy of the model on our small dataset without pre-trained weights, and later compare it with a pre-trained model. To do so, we set weights = None and replace the final classification layer, since our application has only two classes.

Creating basic non-pre-trained model
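As a rough sketch, the base network could be created like this. The name original_model follows the article; the input shape is an assumption inferred from the 5x5 feature map, and count_trainable_params is a hypothetical helper for checking the figure that summary() reports.

```python
import math

import tensorflow as tf

IMG_SHAPE = (160, 160, 3)  # assumed input size

# weights=None gives randomly initialised weights; include_top=False drops
# the 1000-class ImageNet classifier so we can attach our own output layer.
original_model = tf.keras.applications.MobileNetV2(
    input_shape=IMG_SHAPE, include_top=False, weights=None
)


def count_trainable_params(model):
    """Sum the number of scalars across all trainable weight tensors."""
    return sum(math.prod(w.shape) for w in model.trainable_weights)


# Push one blank image through the network to inspect the feature-map shape.
feature_batch = original_model(tf.zeros((1, *IMG_SHAPE)))
print(feature_batch.shape)
print(count_trainable_params(original_model))  # compare with original_model.summary()
```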

The output of original_model.summary() shows 2,223,872 trainable parameters. To turn the (batch_size, 5, 5, 1280) feature map into a prediction, we apply GlobalAveragePooling2D followed by Dense(1, activation = “sigmoid”). A single sigmoid unit is enough for the last layer because our dataset has only two classes.

Adding an output layer to the model
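To see what these two layers do to the shapes, here is a small standalone sketch. It feeds a random tensor in place of real MobileNet V2 features, and the batch size of 32 is an assumption.

```python
import tensorflow as tf

# Stand-in for a batch of base-model outputs: (batch_size, 5, 5, 1280).
features = tf.random.normal((32, 5, 5, 1280))

# Average over the 5x5 spatial grid, leaving one 1280-vector per image.
pooled = tf.keras.layers.GlobalAveragePooling2D()(features)

# One sigmoid unit outputs a single value in [0, 1] per image, read as the
# probability of one of the two classes.
head = tf.keras.layers.Dense(1, activation="sigmoid")
probabilities = head(pooled)

print(pooled.shape)
print(probabilities.shape)
print(head.count_params())  # 1280 weights + 1 bias
```

The classification head therefore adds only 1,281 parameters on top of the base network.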

Let us train the model for 10 epochs and look at the loss and accuracy on the training, validation, and test sets.

training and testing the model
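A training step along these lines could be written as below. The optimizer (Adam) and learning rate are assumptions, since the article does not state them, and compile_and_train is a hypothetical helper name. Because the last layer already applies a sigmoid, the sketch uses binary cross-entropy with its default from_logits=False.

```python
import tensorflow as tf


def compile_and_train(model, train_batches, val_batches,
                      epochs=10, learning_rate=1e-4):
    """Compile a binary classifier and train it, returning the Keras History."""
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=learning_rate),
        # The model outputs probabilities (sigmoid), not logits.
        loss=tf.keras.losses.BinaryCrossentropy(),
        metrics=["accuracy"],
    )
    return model.fit(train_batches, validation_data=val_batches, epochs=epochs)
```

After training, model.evaluate(test_batches) returns the test loss and accuracy, where test_batches stands for the batched test set.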

Non-pre-trained model: Epochs = 10

training loss: 0.5750, training accuracy: 0.8306

val_loss: 0.6958, val_accuracy: 0.4815

test_loss: 0.6991, test acc: 0.4952

In this article, we will use two ways to customize a pre-trained model:

Feature Extraction/Frozen Pre-Trained Model: We will use the representations learned by a previous network to extract meaningful features from new samples. We simply add a new classifier on top of the pre-trained model and train only that classifier from scratch, so that the feature maps learned previously are reused for our dataset.

Fine-Tuning/Unfrozen Pre-Trained Model: We will unfreeze a few of the top layers of a pre-trained model and jointly train both the newly added classifier layer and the unfrozen layers of the pre-trained model. This allows us to “fine-tune” the higher-order feature representations in the base model to make them more relevant for this particular task.

Frozen Pre-Trained Model:

In the next step, we work with the same architecture and the same datasets, but this time we load the model together with the weights it learned by training on the “imagenet” dataset. We can load these weights by setting the parameter weights = “imagenet”. As in the previous case, we define the classification layer of the model according to our application. In this section, we train only the classifier and freeze all layers of the pre-trained model. We set frozen_model.trainable = False to achieve this.

building, training, and testing the frozen pre-trained model
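A minimal sketch of the frozen setup is shown below. The name frozen_model follows the article and is taken here to mean the MobileNet V2 base; the input shape and the helper name build_frozen_classifier are assumptions. Loading weights="imagenet" downloads the weights on first use.

```python
import tensorflow as tf

IMG_SHAPE = (160, 160, 3)  # assumed input size


def build_frozen_classifier(weights="imagenet"):
    """Return (base, classifier) with every base layer frozen."""
    frozen_model = tf.keras.applications.MobileNetV2(
        input_shape=IMG_SHAPE, include_top=False, weights=weights
    )
    # Freeze the whole base: its weights are not updated during training.
    frozen_model.trainable = False

    classifier = tf.keras.Sequential([
        frozen_model,
        tf.keras.layers.GlobalAveragePooling2D(),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])
    return frozen_model, classifier
```

With the base frozen, the only trainable weights left are the kernel and bias of the final Dense layer.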

Frozen pre-trained model: Epoch 10/10

training loss: 0.5322, training accuracy: 0.9652

val_loss: 0.5226, val_accuracy: 0.9725

test_loss: 0.5315, test acc: 0.9664

Unfrozen Pre-Trained Model:

In this section, we will unfreeze some of the topmost layers of the pre-trained model, add a classification layer, and then fine-tune the model by training it on the small dataset we have available. We can unfreeze layers by setting layer.trainable = True for a certain number of layers. In this case, the model will have around 1,862,592 trainable parameters.

building, training, and testing the unfrozen pre-trained model
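The partial unfreezing could be sketched as follows. The cut-off index fine_tune_at=100 is an assumption (the article does not say how many layers were unfrozen), as are the input shape and the helper name build_fine_tune_classifier. The sketch marks the whole base as trainable first and then re-freezes the lower layers, which leaves only the top layers with layer.trainable = True.

```python
import tensorflow as tf

IMG_SHAPE = (160, 160, 3)  # assumed input size


def build_fine_tune_classifier(weights="imagenet", fine_tune_at=100):
    """Return (base, classifier) with only layers from fine_tune_at on trainable."""
    base = tf.keras.applications.MobileNetV2(
        input_shape=IMG_SHAPE, include_top=False, weights=weights
    )
    # Make the base trainable as a whole, then freeze the lower layers again.
    base.trainable = True
    for layer in base.layers[:fine_tune_at]:
        layer.trainable = False

    classifier = tf.keras.Sequential([
        base,
        tf.keras.layers.GlobalAveragePooling2D(),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])
    return base, classifier
```

The lower layers keep their generic ImageNet features, while the top layers and the new classifier adapt to the cats-and-dogs data.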

Unfrozen pre-trained model: Epoch 10/10

training loss: 0.5030, training accuracy: 0.9989

val_loss: 0.5060, val_accuracy: 0.9828

test_loss: 0.5123, test acc: 0.9810

Comparison:

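We can observe that both pre-trained models outperformed the model trained from scratch: test accuracy rose from 0.4952 for the non-pre-trained model to 0.9664 for the frozen model and 0.9810 for the unfrozen model.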

Comparison of above-discussed models

Conclusion:

Using a pre-trained model for feature extraction: When working with a small dataset, it is common practice to take advantage of features learned by a model trained on a larger dataset in the same domain. This is achieved by instantiating the pre-trained model and placing a fully connected classifier on top of it.

The pre-trained model is “frozen” and only the classifier weights are updated during training. In this case, the convolutional layers extract all the features associated with each image, and we train only a classifier that determines the image class given that set of extracted features.

Fine-tuning a pre-trained model: To further improve performance, the top-level layers of the pre-trained model can be adapted to the new dataset via fine-tuning.

In this case, we tuned the weights so that our model learned high-level features specific to the dataset. This technique is usually recommended when the training dataset is large enough and very similar to the original dataset on which the pre-trained model was trained.

The complete Jupyter notebook can be found on my GitHub.

Please provide feedback on the article if any areas of my writing can be improved. Thank you.

