How to use fastai to build a strong baseline model for image classification?
Jun 27 · 8 min read
TL;DR — Code Snippet To Use
Have some basic idea and just here for a snippet of code to build your baseline model? Use this:
from fastai.vision import *

# Defining the data and the model
path = untar_data(URLs.CIFAR)
tfms = get_transforms(do_flip=False)
data = ImageDataBunch.from_folder(path, valid='test', ds_tfms=tfms, size=128)
learn = cnn_learner(data, models.resnet50, metrics=accuracy)

# Estimate learning rate
learn.lr_find()
learn.recorder.plot()

# Training
learn.fit_one_cycle(3, max_lr=1e-3)

# Finetuning
learn.unfreeze()
learn.fit_one_cycle(3, max_lr=slice(1e-6, 1e-4))

# Test Time Augmentation
preds, targs = learn.TTA()
accuracy(preds, targs).item()
Introduction
Computer vision has been around since the 1950s, but only in the last decade has the field completely transformed itself (the turning point came in 2012, when AlexNet won the ImageNet challenge).
Many powerful frameworks have emerged in the last few years. We are going to use fastai, a high-level wrapper over PyTorch, because at the time of writing it offers the simplest APIs and the strongest defaults. Our aim is to build a general-purpose image classification baseline by 2020 standards.
Image Classification Task
Let’s use the popular CIFAR-10 dataset, which contains 60,000 32x32 color images in 10 classes, divided into 50,000 training examples and 10,000 test examples. CIFAR-10 is already available as a sample dataset in the fastai library.
from fastai.vision import *
path = untar_data(URLs.CIFAR)
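CIFAR-10 only comes with a training split and a test split, so if you want a separate validation set for tuning, you have to carve one out yourself. Below is a minimal sketch in plain Python that does this bookkeeping on example indices instead of images; the helper split_indices, the 5,000/5,000 split of the test images, and the fixed seed are illustrative assumptions, not part of fastai.

```python
import random

def split_indices(n_train, n_test, n_valid, seed=42):
    """Hypothetical helper: keep every training index, and split the
    original test indices into a validation part and a held-out test part."""
    train = list(range(n_train))
    test_pool = list(range(n_train, n_train + n_test))
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    rng.shuffle(test_pool)
    valid, test = test_pool[:n_valid], test_pool[n_valid:]
    return train, valid, test

train_idx, valid_idx, test_idx = split_indices(50_000, 10_000, 5_000)
print(len(train_idx), len(valid_idx), len(test_idx))
```

The same idea applies if you prefer to shuffle and split file paths rather than integer indices.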
Let’s start with a minimal example straight from the docs and modify it as we go along:
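tfms = get_transforms(do_flip=False)  # standard augmentations, without random horizontal flips
data = ImageDataBunch.from_folder(path, valid='test', ds_tfms=tfms)  # use the 'test' folder as the validation set
learn = cnn_learner(data, models.resnet50, metrics=accuracy)  # ImageNet-pretrained ResNet50 with a new head
learn.fit(3)  # train for 3 epochs (body frozen, only the head is trained)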
Let’s break down the 3 most important things happening here:
- Data Augmentation: Techniques such as cropping, padding, and horizontal flipping are commonly used when training large neural networks for image classification. They help the model avoid overfitting and generalize better without collecting any new data. The simplest way to do this in fastai is get_transforms, which lets you choose from a standard set of transforms whose defaults are designed for photos.
- Splitting the dataset into train-validation-test sets: You should always split your data into training, validation, and test sets. If you use the validation set to tune any hyperparameter (say, the model architecture or the learning rate), you also need a separate test set to report the final metric (say, accuracy). In this example, we are not tuning any hyperparameter using the validation set, so we are not using a separate test set. In practice, you could keep the 50,000 training examples for training and split the 10,000 test examples into 5,000 for validation and 5,000 for testing. The ImageDataBunch API in fastai provides an easy way to load your data and split it if it is already stored in a standard format. You can also use the DataBlock API if you want more control over how your data is selected, split, and labeled.
- Transfer Learning: Transfer learning involves taking a model trained on one task and reusing it for a different task. Here we use a ResNet50 architecture pretrained on ImageNet (specifically the ILSVRC subset of about 1.2 million images in 1,000 classes; the full ImageNet dataset contains about 14 million images). Once you have a baseline, you can also experiment with other architectures, including deeper ResNet models. The idea behind transfer learning is that the earlier layers of the network identify generic features, such as edges, that are useful for classifying almost any image. When we train for the new task, we keep all the convolutional layers (called the body or the backbone of the model) with their ImageNet-pretrained weights, but define a new, randomly initialized head. This head is adapted to the number of classes needed for the new classification task. By default, calling fit on the cnn_learner keeps the body frozen and trains only the head.
We are getting about 70% accuracy at this point after training for 3 epochs.
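To make the data augmentation point more concrete, here is a minimal sketch of two classic augmentations (horizontal flipping, and padding followed by a random crop) on a toy grayscale image stored as a list of rows. It is not how fastai implements get_transforms: the helper names, the zero padding (libraries often pad by reflection instead), and the 4-pixel default are assumptions; the do_flip flag only mirrors the argument name used above.

```python
import random

def hflip(img):
    """Mirror an image (list of rows) left to right."""
    return [row[::-1] for row in img]

def pad_and_random_crop(img, pad, rng):
    """Zero-pad every side by `pad` pixels, then crop a random window
    of the original size."""
    h, w = len(img), len(img[0])
    padded = [[0] * (w + 2 * pad) for _ in range(pad)]
    padded += [[0] * pad + list(row) + [0] * pad for row in img]
    padded += [[0] * (w + 2 * pad) for _ in range(pad)]
    top = rng.randint(0, 2 * pad)   # randint includes both end points
    left = rng.randint(0, 2 * pad)
    return [row[left:left + w] for row in padded[top:top + h]]

def augment(img, rng, do_flip=True, pad=4):
    """Produce one randomly augmented copy of `img`."""
    out = pad_and_random_crop(img, pad, rng)
    if do_flip and rng.random() < 0.5:
        out = hflip(out)
    return out

rng = random.Random(0)
image = [[r * 4 + c for c in range(4)] for r in range(4)]  # toy 4x4 "image"
aug = augment(image, rng, pad=1)
print(hflip(image)[0], len(aug), len(aug[0]))
```

Each call to augment returns a slightly different copy of the same image, which is what lets the model see "new" examples without any new data being collected.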
Modifying Input Image Size
ResNet50 was originally trained on 224x224 images, whereas our dataset has 32x32 images. Feeding a pretrained model input images whose size is very different from the size it was trained on is usually not optimal. Let’s see the effect on accuracy when we resize the images to 128x128; the ImageDataBunch API lets you pass in a size.
tfms = get_transforms(do_flip=False)
data = ImageDataBunch.from_folder(path, valid='test', ds_tfms=tfms, size=128)
learn = cnn_learner(data, models.resnet50, metrics=accuracy)
learn.fit(3)
The accuracy shoots up to 94%. We have already reached about human-level performance on this task. We could also have resized to 224x224, but that would increase training time, and because these are very low-resolution images, scaling them up that much might introduce artifacts. You can experiment with this later, as needed, once you have a baseline.
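Upscaling 32x32 images to 128x128 means every source pixel has to cover a 4x4 block of output pixels. The sketch below uses nearest-neighbour resizing, which is cruder than the interpolation image libraries usually apply, purely to show where blocky upscaling artifacts come from; resize_nearest is a hypothetical helper, not a fastai function.

```python
def resize_nearest(img, new_h, new_w):
    """Nearest-neighbour resize of an image stored as a list of rows:
    each output pixel copies the source pixel it falls on."""
    h, w = len(img), len(img[0])
    return [
        [img[r * h // new_h][c * w // new_w] for c in range(new_w)]
        for r in range(new_h)
    ]

small = [[(r + c) % 2 for c in range(32)] for r in range(32)]  # 32x32 checkerboard
big = resize_nearest(small, 128, 128)
print(len(big), len(big[0]))
print(big[0][:8])
```

With a scale factor of 4, each pixel of the checkerboard is simply repeated as a 4x4 block, so the first row of the enlarged image starts with four 0s followed by four 1s.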
Modifying Learning Rate
The learning rate is often the most important hyperparameter to tune when training neural networks. It controls how much the weights of the network change with each training batch. A learning rate that is too small leads to slow training, while one that is too large can cause the loss to oscillate around the minimum or even diverge. Traditionally, it is tuned with grid search, optionally combined with a schedule that decreases the learning rate over time (such as time-based decay or step decay).
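For reference, the two classic decay schedules mentioned above fit in a few lines. This is a sketch with illustrative parameter names and values (lr0, decay, drop, epochs_per_drop); frameworks expose their own variants under different names.

```python
import math

def time_based_decay(lr0, decay, epoch):
    """The learning rate shrinks as lr0 / (1 + decay * epoch)."""
    return lr0 / (1 + decay * epoch)

def step_decay(lr0, drop, epochs_per_drop, epoch):
    """The learning rate is multiplied by `drop` every `epochs_per_drop` epochs."""
    return lr0 * drop ** math.floor(epoch / epochs_per_drop)

for epoch in range(0, 25, 5):
    print(epoch,
          round(time_based_decay(0.1, 0.1, epoch), 4),
          round(step_decay(0.1, 0.5, 10, epoch), 4))
```

Both schedules only ever decrease the learning rate, which is the behavior the cyclical approach below moves away from.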
In 2015, Leslie Smith proposed a new method for setting the learning rate called Cyclical Learning Rates (CLR). Instead of monotonically decreasing the learning rate, this method lets it vary cyclically between reasonable boundary values, which largely removes the need to search for a single best fixed value. fastai implements the one-cycle policy, a variant of CLR in which the learning rate starts at a low value, increases to a very large value, and then decreases to a value much lower than its initial one.
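Here is a sketch of that one-cycle shape as a function from training step to learning rate. The cosine interpolation, the 30% warm-up fraction, and the divisors (div=25, final_div=1e4) are assumptions modeled on what I understand fastai v1's defaults to be; fastai also cycles momentum in the opposite direction, which this sketch leaves out.

```python
import math

def cos_anneal(start, end, pct):
    """Cosine interpolation from `start` (pct=0) to `end` (pct=1)."""
    return end + (start - end) / 2 * (1 + math.cos(math.pi * pct))

def one_cycle_lr(step, total_steps, lr_max, pct_start=0.3,
                 div=25.0, final_div=1e4):
    """Learning rate at `step`: warm up from lr_max/div to lr_max,
    then anneal down to lr_max/(div * final_div)."""
    warmup = int(total_steps * pct_start)
    if step < warmup:
        return cos_anneal(lr_max / div, lr_max, step / warmup)
    pct = (step - warmup) / max(1, total_steps - warmup)
    return cos_anneal(lr_max, lr_max / (div * final_div), pct)

lrs = [one_cycle_lr(s, 100, 1e-3) for s in range(101)]
print(f"start {lrs[0]:.1e}, peak {max(lrs):.1e}, end {lrs[-1]:.1e}")
```

The final learning rate ends up far below the starting one, matching the "much lower than its initial one" description above.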
fastai also comes with a handy method to estimate the maximum learning rate to use in the one-cycle policy. It starts training the model while increasing the learning rate from a very low value to a very large one. Initially the loss goes down, but as the learning rate keeps growing, the loss starts to increase. By the time the loss reaches its minimum, the learning rate is already too high. A rule of thumb is to set the maximum learning rate an order of magnitude below the learning rate at which the loss is lowest.
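Here is a toy version of this learning rate range test, run on a one-parameter problem (minimizing (w - 3)^2 with plain SGD) so that it needs no data. The quadratic loss, the sweep from 1e-5 to 10, and the "stop once the loss exceeds 4x the best loss" rule are assumptions for illustration; fastai's lr_find works on real mini-batches and smooths the loss before plotting.

```python
import math

def lr_range_test(start_lr=1e-5, end_lr=10.0, num_steps=100):
    """Toy LR range test on loss(w) = (w - 3)^2 with plain SGD.
    The learning rate grows exponentially each step; we record the
    loss and stop early once it blows up."""
    w = 0.0
    mult = (end_lr / start_lr) ** (1 / (num_steps - 1))
    lrs, losses = [], []
    best = math.inf
    for i in range(num_steps):
        lr = start_lr * mult ** i
        w -= lr * 2 * (w - 3)        # one SGD step (gradient is 2(w - 3))
        loss = (w - 3) ** 2
        lrs.append(lr)
        losses.append(loss)
        best = min(best, loss)
        if loss > 4 * best:           # diverging: stop the sweep
            break
    return lrs, losses

lrs, losses = lr_range_test()
lr_at_min = lrs[losses.index(min(losses))]
suggested_max_lr = lr_at_min / 10     # rule of thumb: one order of magnitude lower
print(f"lowest loss near lr={lr_at_min:.3g}; try max_lr~{suggested_max_lr:.3g}")
```

On this toy problem the loss keeps falling until the learning rate is just below 1 and then blows up, so the rule of thumb suggests a maximum learning rate of roughly 0.1.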
learn = cnn_learner(data, models.resnet50, metrics=accuracy)
learn.lr_find()
learn.recorder.plot()
Here you could try setting the learning rate anywhere between 1e-3 and 1e-2. For now, we’ll just use 1e-3.
learn = cnn_learner(data, models.resnet50, metrics=accuracy)
learn.fit_one_cycle(3, max_lr=1e-3)
We now reach about 95% accuracy on the validation set in 3 epochs.
Fine-Tuning
Earlier, we trained only the new, randomly initialized head of our model, keeping the weights of the body frozen. We can now unfreeze the body and train the whole network. Before we do that, let’s use lr_find again to estimate a new learning rate.
learn.lr_find()
learn.recorder.plot()
The loss curve has shifted by about an order of magnitude compared with the previous one, and 1e-4 looks like a reasonable learning rate. Unlike before, we may not want to use the same maximum learning rate for all the layers in the body. Instead, we can use discriminative learning rates, where earlier layers get much smaller learning rates. The idea, again, is that earlier layers learn basic features such as edges (which are also useful for the new task), while later layers learn more complex features (for example, detecting heads, which may not be useful for the new task). fastai lets you pass a slice: the first value is used as the learning rate for the earliest layers, the second value is used for the final layers, and the layers in between get values in between.
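To see how a slice turns into one learning rate per layer group, the sketch below spaces the rates geometrically between the two ends of the slice, which is how I understand fastai v1 to do it. The helpers geometric_lrs and lr_per_group and the choice of three layer groups are hypothetical, for illustration only.

```python
def geometric_lrs(start, stop, n):
    """n learning rates spaced geometrically from `start` to `stop`."""
    if n == 1:
        return [stop]
    ratio = (stop / start) ** (1 / (n - 1))
    return [start * ratio ** i for i in range(n)]

def lr_per_group(lr, n_groups):
    """Expand a single learning rate or a slice(lo, hi) into one
    learning rate per layer group (earliest group first)."""
    if isinstance(lr, slice):
        if lr.start is None:
            raise ValueError("this sketch expects slice(lo, hi)")
        return geometric_lrs(lr.start, lr.stop, n_groups)
    return [lr] * n_groups

print(lr_per_group(slice(1e-6, 1e-4), 3))
```

With slice(1e-6, 1e-4) and three groups, this gives roughly 1e-6, 1e-5 and 1e-4, so each group's learning rate is ten times that of the group before it.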
learn.unfreeze()
learn.fit_one_cycle(3, max_lr=slice(1e-6, 1e-4))
We can see that the accuracy is still improving.
Test Time Augmentation
One final thing you can do for your general-purpose baseline is Test Time Augmentation (TTA). TTA is applied to the test or validation set to improve prediction quality: each image is first turned into several versions using data augmentation, each version is passed through the trained model to get multiple predictions, and those predictions are averaged to produce the final prediction. TTA often improves performance without any additional training, at the cost of slower prediction.
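The averaging step of TTA is simple enough to sketch directly. In the example below, the two-class toy_model, the image, and the choice of augmentations (identity plus a horizontal flip) are hypothetical; fastai's own learn.TTA() may combine augmented and un-augmented predictions differently, so treat this as the plain average described above.

```python
def hflip(img):
    """Mirror an image (list of rows) left to right."""
    return [row[::-1] for row in img]

def tta_predict(model, img, augmentations):
    """Average the model's class probabilities over augmented copies of img."""
    all_probs = [model(aug(img)) for aug in augmentations]
    n_classes = len(all_probs[0])
    return [sum(p[k] for p in all_probs) / len(all_probs) for k in range(n_classes)]

def toy_model(img):
    """Hypothetical 2-class "model": the probability of class 1 is the
    fraction of bright pixels in the left half of the image."""
    left = [v for row in img for v in row[: len(row) // 2]]
    p1 = sum(1 for v in left if v > 0) / len(left)
    return [1 - p1, p1]

image = [[1, 1, 0, 0],
         [1, 0, 0, 0]]
probs = tta_predict(toy_model, image, [lambda x: x, hflip])
predicted_class = max(range(len(probs)), key=probs.__getitem__)
print(probs, predicted_class)
```

Here the original image leans towards class 1 and the flipped copy towards class 0; the averaged probabilities are [0.625, 0.375], so the TTA prediction is class 0.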
preds, targs = learn.TTA()
accuracy(preds, targs).item()
We reach over 96% accuracy. That’s great. Another simple way to improve accuracy further is to train and fine-tune for more epochs, but let’s stop here for now.
Where do we stand?
Check out the state-of-the-art benchmarks achieved on CIFAR-10. Accuracy above 96% has only been reached in the last couple of years, and most of the models there are trained for significantly longer, with more parameters and more compute than a Colab notebook provides. So our 96% is a genuinely strong result for a general-purpose image classification baseline.
Conclusion
One of the most useful things you can do at the beginning of a new machine learning project is to establish a baseline. For image classification, fastai helps you create a strong baseline quickly, with almost no hyperparameter tuning.
You can find the complete code used for this post in this Colab notebook. Feel free to play around with it and use it in your projects. I’d love to discuss any thoughts or questions you have.