Image Classification Baseline Model For 2020

How to use fastai to build a strong baseline model for image classification?

TL;DR — Code Snippet To Use

Have some basic idea and just here for a snippet of code to build your baseline model? Use this:

from fastai.vision import *
# Defining the data and the model
path = untar_data(URLs.CIFAR)
tfms = get_transforms(do_flip=False)
data = ImageDataBunch.from_folder(path, valid='test', ds_tfms=tfms, size=128)
learn = cnn_learner(data, models.resnet50, metrics=accuracy)
# Estimate learning rate
learn.lr_find()
learn.recorder.plot()
# Training
learn.fit_one_cycle(3, max_lr=1e-3)
# Finetuning
learn.unfreeze()
learn.fit_one_cycle(3, max_lr=slice(1e-6, 1e-4))
# Test Time Augmentation
preds,targs = learn.TTA()
accuracy(preds, targs).item()

Introduction

Computer vision has been around since the 1950s, but it is only in the last decade that the field has totally transformed itself (in fact, the turning point came in 2012, when AlexNet won the ImageNet challenge).

A lot of powerful frameworks have come up in the last few years. We are going to use fastai since, at the time of writing, it offers the easiest APIs and strongest defaults. It’s a high-level wrapper over PyTorch. Our aim is to build a general-purpose image classification baseline by 2020 standards.

[Image by Gerhard Gellinger from Pixabay]

Image Classification Task

Let’s use the popular CIFAR-10 dataset which contains 60,000 32x32 color images in 10 different classes. It’s divided into 50,000 training examples and 10,000 test examples. The CIFAR dataset is already provided as a sample dataset in the fastai library.

from fastai.vision import *
path = untar_data(URLs.CIFAR)

Let’s start with a minimal example straight from the docs and modify it as we go along.

tfms = get_transforms(do_flip=False)
data = ImageDataBunch.from_folder(path, valid='test', ds_tfms=tfms)
learn = cnn_learner(data, models.resnet50, metrics=accuracy)
learn.fit(3)

Let’s break down the 3 most important things happening here:

  1. Data Augmentation: Techniques such as cropping, padding, and horizontal flipping are commonly used while training large neural networks for image classification. They help the model generalize better and avoid overfitting without having to actually collect any new data. The simplest way to do this in fastai is get_transforms, which lets you choose from a standard set of transforms whose defaults are designed for photos.
  2. Splitting the dataset into train-validation-test sets: We should always split the dataset into training, validation, and test sets. If we use the validation set to tune any hyperparameter (say, the model architecture or the learning rate), we also need a separate test set to report the final metric (say, accuracy). In this example we are not tuning any hyperparameter using the validation set, so we do not use a test set. In practice, you could use, say, 50,000 examples for training, 5,000 for validation, and 5,000 for testing. The ImageDataBunch API in fastai provides an easy way to load and split your data if it is already stored in a standard format. You can also use the data block API if you want more control over how your data is selected, split, and labelled (see the sketch after this list).
  3. Transfer Learning: Transfer learning involves taking a model trained on one task and reusing it on a different task. Here we are using a ResNet50 architecture trained on ImageNet, which contains about 14 million images. You can also experiment with other architectures, including deeper ResNet models, once you have a baseline. The idea behind transfer learning is that the earlier layers of the network identify generic features, like edges, that are useful for classifying any image. When we train for the new task, we keep all the convolutional layers (called the body or the backbone of the model) with their weights pre-trained on ImageNet, but define a new head that is initialized randomly and adapted to the number of classes in the new classification task. By default, calling fit on the cnn_learner keeps the body frozen and trains only the head.
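
As a concrete illustration of points 1 and 2, here is a minimal sketch that loads the same data with fastai v1’s data block API and a slightly customized transform list. The specific arguments (the rotation limit, batch size, and normalization) are illustrative choices, not values used elsewhere in this post.

from fastai.vision import *

path = untar_data(URLs.CIFAR)
# Customize the standard photo transforms: no flips, small rotations only (illustrative values).
tfms = get_transforms(do_flip=False, max_rotate=5.0)

# Data block API: pick the files, split them, label them, transform them, and batch them.
data = (ImageList.from_folder(path)
        .split_by_folder(train='train', valid='test')   # use the folders shipped with the dataset
        .label_from_folder()                             # class = parent folder name
        .transform(tfms, size=128)                       # augmentation + resizing
        .databunch(bs=64)
        .normalize(imagenet_stats))
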
[Figure: Training for 3 epochs]

We are getting about 70% accuracy at this point after training for 3 epochs.

Modifying Input Image Size

ResNet was originally trained on 224x224 images, while our dataset contains 32x32 images. Feeding the model input images that differ too much from the size it was designed for is not optimal. Let’s see the effect on accuracy once we resize these images to 128x128. The ImageDataBunch API lets you pass in a size.

tfms = get_transforms(do_flip=False)
data = ImageDataBunch.from_folder(path, valid='test', ds_tfms=tfms, size=128)
learn = cnn_learner(data, models.resnet50, metrics=accuracy)
learn.fit(3)
[Figure: Training for 3 epochs after resizing the input]

The accuracy shoots up to 94%, which is roughly human-level performance on this task. We could have resized to 224x224 as well, but that would increase the training time, and since these are quite low-resolution images, scaling them up that much might introduce artifacts. You can experiment with this later, as needed, once you have a baseline.

Modifying Learning Rate

Learning rate is often the most important hyperparameter to tune when training neural networks. It controls how much the weights of the network are adjusted with each training batch. A very small learning rate leads to slow training; a very large one can cause the loss to fluctuate around the minimum or even diverge. Traditionally, it is tuned and set using grid search, optionally combined with a schedule that decreases the learning rate over training using a particular strategy (like time-based decay or step decay), as sketched below.
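
For reference, here is a rough sketch of two such classical schedules; the base rate and decay constants are arbitrary illustrative values, not ones used later in this post.

base_lr, decay, drop_every = 0.1, 0.01, 10

def time_based_decay(epoch):
    # Learning rate shrinks smoothly as 1 / (1 + decay * epoch).
    return base_lr / (1 + decay * epoch)

def step_decay(epoch):
    # Learning rate is halved every `drop_every` epochs.
    return base_lr * (0.5 ** (epoch // drop_every))

print([step_decay(e) for e in (0, 10, 20, 30)])  # [0.1, 0.05, 0.025, 0.0125]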

Leslie Smith, in 2015, came up with a new method for setting the learning rates called Cyclical Learning Rate (CLR). Instead of monotonically decreasing the learning rate, this method lets the learning rate cyclically vary between reasonable boundary values. This eliminates the need to find the best value for the learning rate. Fastai has an implementation of one-cycle CLR policy in which the learning rate starts at a low value, increases to a very large value, and then decreases to a value much lower than its initial one.
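
The rough shape of such a schedule can be sketched as below. This is only an illustration of the idea (linear warm-up, then annealing back down); fastai’s actual one-cycle implementation uses cosine annealing, momentum cycling, and its own default phase lengths.

def one_cycle_lr(step, total_steps, max_lr=1e-3, pct_start=0.3, div=10, final_div=100):
    # Warm up from max_lr/div to max_lr, then anneal down to max_lr/final_div.
    warmup = int(total_steps * pct_start)
    if step < warmup:
        return max_lr / div + (max_lr - max_lr / div) * step / warmup
    frac = (step - warmup) / (total_steps - warmup)
    return max_lr - (max_lr - max_lr / final_div) * frac

lrs = [one_cycle_lr(s, 1000) for s in range(1000)]
# Starts at 1e-4, peaks at 1e-3 around step 300, and ends near 1e-5.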

Fastai also comes with a handy method to estimate the maximum learning rate to be used in the one-cycle policy. It starts training the model while increasing the learning rate from a very low value to a very large one. Initially the loss goes down, but as the learning rate keeps growing, the loss starts to increase again. By the time the loss reaches its minimum, the learning rate is already too high, so a rule of thumb is to set the maximum learning rate an order of magnitude below the learning rate at which the loss is lowest.

learn = cnn_learner(data, models.resnet50, metrics=accuracy)
learn.lr_find()
learn.recorder.plot()

[Figure: Estimating the learning rate using lr_find]

Here you can try setting the learning rate between 1e-03 and 1e-02. For now, we’ll just use 1e-03.

learn = cnn_learner(data, models.resnet50, metrics=accuracy)
learn.fit_one_cycle(3, max_lr=1e-3)
[Figure: Training for 3 epochs using the one-cycle policy]

We get to about 95% accuracy on the validation set now in 3 epochs.

Fine-Tuning

We earlier trained only the new, randomly initialized head of our model, keeping the weights of the body frozen. We can now unfreeze the body and train the whole network. Before we do that, let’s use lr_find again to estimate the new learning rate.

learn.lr_find()
learn.recorder.plot()

[Figure: Estimating the learning rate again using lr_find]

The loss curve from lr_find seems to have shifted by an order of magnitude from the previous one, and 1e-4 looks like a reasonable learning rate. Unlike before, we may not want to use the same maximum learning rate for all the layers in the body. We may instead want to use discriminative learning rates, where earlier layers get much smaller learning rates. The idea, again, is that earlier layers learn more basic features like edges (which are useful for your new task too), while later layers learn more complex features (like, say, recognizing heads, which may not necessarily be useful for your new task). Fastai lets you pass a slice, where the first value is used as the learning rate of the earliest layers, the second value as the learning rate of the final layers, and the layers in between get values in between (see the sketch after the training results below).

learn.unfreeze()
learn.fit_one_cycle(3, max_lr=slice(1e-6, 1e-4))
[Figure: Fine-tuning for 3 more epochs]

We can see that the accuracy is still improving.
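
To make the slice behaviour concrete, the snippet below spreads learning rates geometrically from the lower to the upper bound, one per layer group. The group count of 3 is an assumption for illustration; fastai’s cnn_learner splits the pretrained body and the new head into layer groups internally.

import numpy as np

# One learning rate per layer group, spread geometrically across the slice.
lrs = np.geomspace(1e-6, 1e-4, num=3)
print(lrs)  # [1.e-06 1.e-05 1.e-04]: earliest layers get the smallest rate, the head the largest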

Test Time Augmentation

One final thing that you can do for your general-purpose baseline is Test Time Augmentation (TTA). TTA is applied to the test/validation set to improve prediction quality: we first create multiple versions of each image using data augmentation, pass them all through our trained model to get multiple predictions, and finally average those predictions to obtain the final prediction. TTA often gives better performance without any additional training, but it does increase prediction time.
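
Conceptually, TTA boils down to something like the sketch below, where augment and model_predict stand in for hypothetical augmentation and inference functions; fastai’s learn.TTA handles all of this for you (with its own augmentations and averaging scheme), so in practice it is just the single call shown next.

import numpy as np

def tta_predict(model_predict, augment, image, n_aug=4):
    # Predict on the original image plus several augmented copies,
    # then average the predicted class probabilities.
    preds = [model_predict(image)]
    preds += [model_predict(augment(image)) for _ in range(n_aug)]
    return np.mean(preds, axis=0)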

preds,targs = learn.TTA()
accuracy(preds, targs).item()
[Figure: Accuracy after TTA]

We reached over 96% accuracy. That’s great. Another simple way to improve accuracy further is to train and fine-tune for more epochs, but let’s stop here for now.

Where do we stand?

Check out the state-of-the-art benchmarks on CIFAR-10. Accuracy above 96% has only been achieved in the last couple of years, and most of the models there are trained for significantly longer, use more resources than a Colab notebook, and have more parameters. So we can be confident that 96% is really good for a general-purpose image classification baseline.

Conclusion

One of the most useful things you can do at the beginning of a new machine learning project is establishing a baseline. For image classification, fastai helps you to create a strong baseline quickly with almost no hyperparameter tuning.

You can find the complete code used for this post in this Colab notebook. Feel free to play around with it and use it in your projects. I’d love to discuss if you have any thoughts or questions.

