Have you ever wondered how Facebook takes care of the abusive and inappropriate images shared by some of its users? Or how Facebook’s tagging feature works? Or how Google Lens recognizes products through images?
All of the above are examples of image classification in different settings. Multiclass image classification is a common task in computer vision, where we categorize an image into three or more classes.
In the past, I always used Keras for computer vision projects. However, when the opportunity to work on multiclass image classification recently presented itself, I decided to use PyTorch. I have already moved from Keras to PyTorch for all NLP tasks, so why not vision, too?
PyTorch is powerful, and I also like its more pythonic structure.
In this post, we’ll create an end-to-end pipeline for multiclass image classification using PyTorch. This will include training the model, putting the model’s results in a form that can be shown to business partners, and functions to help deploy the model easily. As an added feature, we will also look at Test Time Augmentation using PyTorch.
But before we learn how to do image classification, let’s first look at transfer learning, the most common method for dealing with such problems.
What is Transfer Learning?
Transfer learning is the process of repurposing knowledge from one task to another. From a modelling perspective, this means using a model trained on one dataset and fine-tuning it for use with another. But why does it work?
Let’s start with some background. Every year the visual recognition community comes together for a very particular challenge: the Imagenet Challenge. The task in this challenge is to classify 1,000,000 images into 1,000 categories.
This challenge has already resulted in researchers training big convolutional deep learning models. The results have included great models like Resnet50 and Inception.
But, what does it mean to train a neural model? Essentially, it means the researchers have learned the weights for a neural network after training the model on a million images.
So, what if we could get those weights? We could then use them and load them into our own neural networks model to predict on the test dataset, right? Actually, we can go even further than that; we can add an extra layer on top of the neural network these researchers have prepared to classify our own dataset.
While the exact workings of these complex models are still a mystery, we do know that the lower convolutional layers capture low-level image features like edges and gradients. In comparison, higher convolutional layers capture more and more intricate details, such as body parts, faces, and other compositional features.
Source: Visualizing and Understanding Convolutional Networks. You can see how the first few layers capture basic shapes, and the shapes become more and more complex in the later layers.
In the example above from ZFNet (a variant of Alexnet), one of the first convolutional neural networks to achieve success on the Imagenet task, you can see how the lower layers capture lines and edges, and the later layers capture more complex features. The final fully-connected layers are generally assumed to capture information that is relevant for solving the respective task, e.g. ZFNet’s fully-connected layers indicate which features are relevant for classifying an image into one of 1,000 object categories.
For a new vision task, it is possible for us to simply use the off-the-shelf features of a state-of-the-art CNN pre-trained on ImageNet, and train a new model on these extracted features.
The intuition behind this idea is that a model trained to recognize animals might also be used to recognize cats vs dogs. In our case, a model that has been trained on 1,000 different categories has seen a lot of real-world information, and we can use this information to create our own custom classifier.
So that’s the theory and intuition. How do we get it to actually work? Let’s look at some code. You can find the complete code for this post on GitHub.
Data Exploration
We will start with the Boat Dataset from Kaggle to understand the multiclass image classification problem. This dataset contains about 1,500 pictures of boats of different types: buoys, cruise ships, ferry boats, freight boats, gondolas, inflatable boats, kayaks, paper boats, and sailboats. Our goal is to create a model that looks at a boat image and classifies it into the correct category.
Here’s a sample of images from the dataset:
And here are the category counts:
Since the categories “freight boats”, “inflatable boats”, and “boats” don’t have a lot of images, we will remove them when we train our model.
Creating the required Directory Structure
Before we can go through with training our deep learning models, we need to create the required directory structure for our images. Right now, our data directory structure looks like:
images
    sailboat
    kayak
    .
    .
We need our images to be contained in three folders: train, val, and test. We will then train on the images in the train folder, validate on the ones in the val folder, and finally test on the images in the test folder.
data
    train
        sailboat
        kayak
        .
        .
    val
        sailboat
        kayak
        .
        .
    test
        sailboat
        kayak
        .
        .
You might have your data in a different format, but I have found that apart from the usual libraries, the glob.glob and os.system functions are very helpful. Here you can find the complete data preparation code. Now let’s take a quick look at some of the not-so-used libraries that I found useful while doing data prep.
What is glob.glob?
Simply put, glob lets you get the names of files or folders in a directory using Unix shell-style wildcards such as * and ? (these are not full regular expressions). For example, you can do something like:
from glob import glob

categories = glob("images/*")
print(categories)
------------------------------------------------------------------
['images/kayak', 'images/boats', 'images/gondola', 'images/sailboat', 'images/inflatable boat', 'images/paper boat', 'images/buoy', 'images/cruise ship', 'images/freight boat', 'images/ferry boat']
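To make the wildcard behaviour concrete, here is a minimal sketch that uses glob to count the images in each category folder, which is one way the category counts above could be produced. The helper name count_images_per_category and the throwaway folder it builds are assumptions for illustration, not code from the post.

```python
import os
import tempfile
from glob import glob


def count_images_per_category(image_root, pattern="*.jpg"):
    """Return {category_name: number_of_matching_files} for each sub-folder."""
    counts = {}
    for category_dir in sorted(glob(os.path.join(image_root, "*"))):
        if os.path.isdir(category_dir):
            files = glob(os.path.join(category_dir, pattern))
            counts[os.path.basename(category_dir)] = len(files)
    return counts


# Build a tiny throwaway folder tree so the example runs anywhere.
demo_root = tempfile.mkdtemp()
for category, n_files in [("kayak", 3), ("paper boat", 1)]:
    os.makedirs(os.path.join(demo_root, category))
    for i in range(n_files):
        open(os.path.join(demo_root, category, f"img_{i}.jpg"), "w").close()

demo_counts = count_images_per_category(demo_root)
print(demo_counts)
```

On the real dataset you would call count_images_per_category("images") instead of using the temporary folder.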
What is os.system?
os.system is a function in the os library which lets you run any command-line command from Python itself. I generally use it to run Linux commands, but it can also be used to run R scripts within Python as shown here. For example, I use it in my data preparation to copy files from one directory to another after getting the information from a pandas data frame. I also use f-string formatting.
import os

for i, row in fulldf.iterrows():
    # Boat category
    cat = row['category']
    # section is train, val or test
    section = row['type']
    # input filepath to copy
    ipath = row['filepath']
    # output filepath to paste
    opath = ipath.replace(f"images/", f"data/{section}/")
    # running the cp command
    os.system(f"cp '{ipath}' '{opath}'")
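The loop above relies on a data frame that already says which section each file belongs to, and on the cp command being available. As a rough, standard-library-only sketch of the same idea, the snippet below assigns files to train, val, and test at random and copies them with shutil. The function names, the 60/20/20 split, and the fixed seed are all assumptions for illustration; the post's actual split logic lives in the data preparation code.

```python
import os
import random
import shutil
import tempfile


def assign_splits(filepaths, val_frac=0.2, test_frac=0.2, seed=0):
    """Shuffle the file paths and label each one 'train', 'val' or 'test'."""
    paths = sorted(filepaths)
    random.Random(seed).shuffle(paths)
    n_val = int(len(paths) * val_frac)
    n_test = int(len(paths) * test_frac)
    sections = {}
    for i, path in enumerate(paths):
        if i < n_val:
            sections[path] = "val"
        elif i < n_val + n_test:
            sections[path] = "test"
        else:
            sections[path] = "train"
    return sections


def copy_into_split(ipath, section, src_root, dst_root):
    """Copy src_root/<category>/<file> to dst_root/<section>/<category>/<file>."""
    relative = os.path.relpath(ipath, src_root)
    opath = os.path.join(dst_root, section, relative)
    os.makedirs(os.path.dirname(opath), exist_ok=True)
    shutil.copy(ipath, opath)
    return opath


# Demo on a throwaway folder with 20 empty "images" in one category.
work_dir = tempfile.mkdtemp()
src_root = os.path.join(work_dir, "images")
dst_root = os.path.join(work_dir, "data")
os.makedirs(os.path.join(src_root, "kayak"))
demo_files = []
for i in range(20):
    path = os.path.join(src_root, "kayak", f"img_{i}.jpg")
    open(path, "w").close()
    demo_files.append(path)

sections = assign_splits(demo_files)
for path, section in sections.items():
    copy_into_split(path, section, src_root, dst_root)

split_sizes = {s: list(sections.values()).count(s) for s in ("train", "val", "test")}
print(split_sizes)
```

Using shutil.copy avoids shell quoting problems with unusual file names and also works on systems without cp. For a real dataset it is usually better to split within each category so that every class appears in all three folders.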
Now since we have our data in the required folder structure, we can move on to more exciting parts.
Data Preprocessing
Transforms:
1. Imagenet Preprocessing
In order to use our images with a network trained on the Imagenet dataset, we need to preprocess our images in the same way as the Imagenet network. For that, we need to rescale the images to 224×224 and normalize them as per Imagenet standards. We can use the torchvision transforms library to do that. Here we take a CenterCrop of 224×224 and normalize as per Imagenet standards. The operations defined below happen sequentially. You can find a list of all transforms provided by PyTorch here.
transforms.Compose([
    transforms.CenterCrop(size=224),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406],
                         [0.229, 0.224, 0.225])
])
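To see what "normalize as per Imagenet standards" means numerically, here is a small plain-Python sketch of the arithmetic for a single RGB pixel: ToTensor scales the 0-255 values to the range 0-1, and Normalize then subtracts the per-channel mean and divides by the per-channel standard deviation. The helper names are made up for this illustration.

```python
IMAGENET_MEAN = [0.485, 0.456, 0.406]
IMAGENET_STD = [0.229, 0.224, 0.225]


def normalize_pixel(rgb_uint8):
    """Mimic ToTensor (scale to [0, 1]) followed by Normalize for one RGB pixel."""
    scaled = [value / 255.0 for value in rgb_uint8]
    return [(x - m) / s for x, m, s in zip(scaled, IMAGENET_MEAN, IMAGENET_STD)]


def denormalize_pixel(normalized):
    """Invert normalize_pixel, e.g. before plotting a transformed image."""
    return [round((x * s + m) * 255.0)
            for x, m, s in zip(normalized, IMAGENET_MEAN, IMAGENET_STD)]


white = normalize_pixel([255, 255, 255])
print([round(v, 3) for v in white])
print(denormalize_pixel(white))
```

The inverse step matters in practice: a normalized tensor has to be de-normalized before it can be displayed as a sensible image.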
2. Data Augmentations
We can do a lot more preprocessing for data augmentations. Neural networks work better with a lot of data. Data augmentation is a strategy which we use at training time to increase the amount of data we have.
For example, we can flip the image of a boat horizontally, and it will still be a boat. Or we can randomly crop images or add color jitter. Here is the image transforms dictionary I have used, which applies both the Imagenet preprocessing and the augmentations. This dictionary contains the various transforms we have for the train, test and validation data as used in this great post. As you’d expect, we don’t apply the horizontal flips or other data augmentation transforms to the test data and validation data because we don’t want to get predictions on an augmented image.
# Image transformations
image_transforms = {
    # Train uses data augmentation
    'train':
    transforms.Compose([
        transforms.RandomResizedCrop(size=256, scale=(0.8, 1.0)),
        transforms.RandomRotation(degrees=15),
        transforms.ColorJitter(),
        transforms.RandomHorizontalFlip(),
        transforms.CenterCrop(size=224),  # Image net standards
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406],
                             [0.229, 0.224, 0.225])  # Imagenet standards
    ]),
    # Validation does not use augmentation
    'valid':
    transforms.Compose([
        transforms.Resize(size=256),
        transforms.CenterCrop(size=224),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406],
                             [0.229, 0.224, 0.225])
    ]),
    # Test does not use augmentation
    'test':
    transforms.Compose([
        transforms.Resize(size=256),
        transforms.CenterCrop(size=224),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406],
                             [0.229, 0.224, 0.225])
    ]),
}
Here is an example of the train transforms applied to an image in the training dataset. Not only do we get a lot of different images from a single image, but it also helps our network become invariant to the object orientation.
from PIL import Image
import matplotlib.pyplot as plt

# imshow_tensor is a plotting helper that is not shown in this post
ex_img = Image.open('/home/rahul/projects/compvisblog/data/train/cruise ship/cruise-ship-oasis-of-the-seas-boat-water-482183.jpg')
t = image_transforms['train']
plt.figure(figsize=(24, 24))
for i in range(16):
    ax = plt.subplot(4, 4, i + 1)
    _ = imshow_tensor(t(ex_img), ax=ax)
plt.tight_layout()
DataLoaders
The next step is to provide the training, validation, and test dataset locations to PyTorch. We can do this by using the PyTorch datasets and DataLoader class. This part of the code will mostly remain the same if we have our data in the required directory structures.
from torch.utils.data import DataLoader
from torchvision import datasets

# Datasets from folders
traindir = "data/train"
validdir = "data/val"
testdir = "data/test"

data = {
    'train':
    datasets.ImageFolder(root=traindir, transform=image_transforms['train']),
    'valid':
    datasets.ImageFolder(root=validdir, transform=image_transforms['valid']),
    'test':
    datasets.ImageFolder(root=testdir, transform=image_transforms['test'])
}

# Dataloader iterators, make sure to shuffle
dataloaders = {
    'train': DataLoader(data['train'], batch_size=batch_size, shuffle=True, num_workers=10),
    'val': DataLoader(data['valid'], batch_size=batch_size, shuffle=True, num_workers=10),
    'test': DataLoader(data['test'], batch_size=batch_size, shuffle=True, num_workers=10)
}
These dataloaders help us to iterate through the dataset. For example, we will use the dataloader below in our model training. The data variable will contain data in the form (batch_size, color_channels, height, width), while the target has shape (batch_size,) and holds the label information.
train_loader = dataloaders['train']
for ii, (data, target) in enumerate(train_loader):
    ...  # the per-batch training step goes here
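If you want to check these shapes without the boat images on disk, a synthetic dataset is enough. The sketch below swaps ImageFolder for a TensorDataset of random tensors (an assumption made purely so the example is self-contained) and records the shape of every batch.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Synthetic stand-in for an image folder: 10 RGB "images" of 224x224 with labels 0..2.
fake_images = torch.randn(10, 3, 224, 224)
fake_labels = torch.randint(0, 3, (10,))
fake_loader = DataLoader(TensorDataset(fake_images, fake_labels),
                         batch_size=4, shuffle=True)

batch_shapes = []
for ii, (data, target) in enumerate(fake_loader):
    batch_shapes.append((tuple(data.shape), tuple(target.shape)))

print(batch_shapes)
```

Note that the last batch is smaller than batch_size when the dataset size is not a multiple of it, so per-batch losses should be weighted by the actual batch size when they are averaged.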
Modeling
1. Create the model using a pre-trained model
Right now, the pre-trained models available in the torchvision library include:
- Inception v3
- ShuffleNet v2
- MobileNet v2
Here I will be using resnet50 on our dataset, but you can effectively use any other model too as per your choice.
from torchvision import models

model = models.resnet50(pretrained=True)
We start by freezing our model weights since we don’t want to change the weights of the resnet50 model.
# Freeze model weights
for param in model.parameters():
    param.requires_grad = False
The next thing we need to do is to replace the linear classification layer in the model with our custom classifier. To do this, I have found it best to first look at the model structure to determine which layer is the final linear layer. We can do this simply by printing the model object:
print(model)
------------------------------------------------------------------
ResNet(
  (conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
  (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (relu): ReLU(inplace=True)
  .
  .
  .
  .
      (conv3): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
    )
  )
  (avgpool): AdaptiveAvgPool2d(output_size=(1, 1))
  (fc): Linear(in_features=2048, out_features=1000, bias=True)
)
Here we find that the final linear layer, which takes its input from the convolutional layers, is named fc.
We can now simply replace the fc layer using our custom neural network. This neural network takes input from the previous layer to fc and gives the log softmax output of shape (batch_size x n_classes).
n_inputs = model.fc.in_features
model.fc = nn.Sequential(
    nn.Linear(n_inputs, 256),
    nn.ReLU(),
    nn.Dropout(0.4),
    nn.Linear(256, n_classes),
    nn.LogSoftmax(dim=1))
Please note that the new layers added now are fully trainable by default.
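A quick way to confirm that only the new head will be trained is to count the parameters that still require gradients. The sketch below uses a tiny made-up network (TinyNet, with a backbone layer and a final fc layer) instead of resnet50 so that it runs without downloading weights; the layer sizes and the class count of 5 are arbitrary assumptions.

```python
from torch import nn


class TinyNet(nn.Module):
    """Hypothetical stand-in for a pre-trained network with a final `fc` layer."""

    def __init__(self):
        super().__init__()
        self.backbone = nn.Linear(8, 16)
        self.fc = nn.Linear(16, 1000)

    def forward(self, x):
        return self.fc(self.backbone(x))


def count_parameters(model):
    """Return (trainable, total) parameter counts."""
    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    total = sum(p.numel() for p in model.parameters())
    return trainable, total


tiny = TinyNet()
for param in tiny.parameters():
    param.requires_grad = False  # freeze everything, as in the post

# Replace the head; freshly created layers have requires_grad=True by default.
tiny.fc = nn.Sequential(
    nn.Linear(16, 4),
    nn.ReLU(),
    nn.Linear(4, 5),
    nn.LogSoftmax(dim=1))

trainable, total = count_parameters(tiny)
print(f"{trainable} trainable parameters out of {total}")
```

The same count_parameters helper can be called on the real model after model.fc has been replaced.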
2. Load the model on GPU
We can use a single GPU or multiple GPUs (if we have them) using DataParallel from PyTorch. Here is what we can use to detect whether a GPU is available, count the GPUs, and load the model onto them. Right now I am training my models on dual Titan RTX GPUs.
from torch import cuda, nn

# Whether to train on a gpu
train_on_gpu = cuda.is_available()
print(f'Train on gpu: {train_on_gpu}')

# Number of gpus (multi_gpu defaults to False so it is always defined)
multi_gpu = False
if train_on_gpu:
    gpu_count = cuda.device_count()
    print(f'{gpu_count} gpus detected.')
    multi_gpu = gpu_count > 1

if train_on_gpu:
    model = model.to('cuda')

if multi_gpu:
    model = nn.DataParallel(model)
3. Define criterion and optimizers
One of the most important choices when training any model is the loss function and the optimizer. Here we want categorical cross-entropy, since we have a multiclass classification problem, together with the Adam optimizer, which is one of the most commonly used optimizers. But since we are already applying a LogSoftmax operation to the output of our model, we will use the NLL loss, which expects log-probabilities as input.
from torch import nn, optim

criterion = nn.NLLLoss()
optimizer = optim.Adam(model.parameters())
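The relationship between the two losses can be checked directly: applying LogSoftmax and then NLLLoss gives the same value as applying CrossEntropyLoss to the raw scores. The logits and targets below are random values made up for the check.

```python
import torch
from torch import nn

torch.manual_seed(0)
logits = torch.randn(4, 3)            # raw scores: 4 samples, 3 classes
target = torch.tensor([0, 2, 1, 2])   # true class index per sample

# Route 1: LogSoftmax inside the model, NLLLoss as the criterion (as in the post).
log_probs = nn.LogSoftmax(dim=1)(logits)
nll = nn.NLLLoss()(log_probs, target)

# Route 2: raw scores fed straight into CrossEntropyLoss.
cross_entropy = nn.CrossEntropyLoss()(logits, target)

print(torch.allclose(nll, cross_entropy))
```

So the choice between the two is a matter of where the LogSoftmax lives. Keeping it in the model means the outputs are log-probabilities, which is convenient when reporting predictions later.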
4. Training the model
The full training code is in the notebook. It might look pretty big on its own, but essentially what we are doing is as follows:
- Start running epochs. In each epoch:
  - Set the model mode to train using model.train().
  - Loop through the data using the train dataloader. For each batch:
    - Load your data to the GPU using the data, target = data.cuda(), target.cuda() command.
    - Set the existing gradients in the optimizer to zero using optimizer.zero_grad().
    - Run the forward pass through the batch using output = model(data).
    - Compute loss using loss = criterion(output, target).
    - Backpropagate the losses through the network using loss.backward().
    - Take an optimizer step using optimizer.step(). This updates only the weights that are still trainable, which here means the new classifier layers, since the pre-trained weights are frozen.
  - All the other steps in the training loop are just to maintain the history and calculate accuracy.
  - Set the model mode to eval using model.eval().
  - Get predictions for the validation data using valid_loader and calculate valid_loss and valid_acc.
  - Print the validation loss and validation accuracy results every print_every epochs.
  - Save the best model based on validation loss.
  - Early stopping: if the validation loss doesn’t improve for max_epochs_stop epochs, stop the training and load the best available model, i.e. the one with the minimum validation loss.
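The steps above can be sketched as a compact training function. This is a simplified illustration rather than the notebook's code: it keeps the best weights in memory instead of saving them to disk, moves batches with .to(device) instead of .cuda(), and is exercised on a small synthetic dataset with a one-layer model, all of which are assumptions made so the example is self-contained.

```python
import copy

import torch
from torch import nn, optim
from torch.utils.data import DataLoader, TensorDataset


def run_epoch(model, loader, criterion, device, optimizer=None):
    """One pass over `loader`; trains when an optimizer is given, else evaluates."""
    training = optimizer is not None
    model.train(training)  # train(False) is the same as eval()
    total_loss, correct, seen = 0.0, 0, 0
    with torch.set_grad_enabled(training):
        for data, target in loader:
            data, target = data.to(device), target.to(device)
            if training:
                optimizer.zero_grad()
            output = model(data)
            loss = criterion(output, target)
            if training:
                loss.backward()
                optimizer.step()
            # Weight by batch size so a smaller final batch is averaged correctly.
            total_loss += loss.item() * data.size(0)
            correct += (output.argmax(dim=1) == target).sum().item()
            seen += data.size(0)
    return total_loss / seen, correct / seen


def train(model, train_loader, valid_loader, criterion, optimizer,
          n_epochs=20, max_epochs_stop=3, print_every=1):
    """Train with early stopping on validation loss; return best model and history."""
    device = next(model.parameters()).device
    best_loss, best_state, epochs_no_improve = float("inf"), None, 0
    history = []
    for epoch in range(n_epochs):
        train_loss, train_acc = run_epoch(model, train_loader, criterion, device, optimizer)
        valid_loss, valid_acc = run_epoch(model, valid_loader, criterion, device)
        history.append((train_loss, valid_loss, train_acc, valid_acc))
        if (epoch + 1) % print_every == 0:
            print(f"Epoch {epoch + 1}: valid_loss={valid_loss:.4f} valid_acc={valid_acc:.2%}")
        if valid_loss < best_loss:
            best_loss, epochs_no_improve = valid_loss, 0
            best_state = copy.deepcopy(model.state_dict())
        else:
            epochs_no_improve += 1
            if epochs_no_improve >= max_epochs_stop:
                break  # early stopping
    model.load_state_dict(best_state)  # restore the weights with the lowest validation loss
    return model, history


# Synthetic two-class problem, purely to exercise the loop.
torch.manual_seed(0)
features = torch.randn(64, 10)
labels = (features[:, 0] > 0).long()
demo_train = DataLoader(TensorDataset(features[:48], labels[:48]), batch_size=16, shuffle=True)
demo_valid = DataLoader(TensorDataset(features[48:], labels[48:]), batch_size=16)
demo_model = nn.Sequential(nn.Linear(10, 2), nn.LogSoftmax(dim=1))
demo_optimizer = optim.Adam(demo_model.parameters())
demo_model, demo_history = train(demo_model, demo_train, demo_valid,
                                 nn.NLLLoss(), demo_optimizer, n_epochs=5)
```

For long training runs you would typically also write the best state dict to disk with torch.save so that training can be resumed or the model deployed later.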
Here is the output from running the above code, showing just the last few epochs. The validation accuracy started at ~55% in the first epoch, and we ended up with a validation accuracy of ~90%.
And here are the training curves showing the loss and accuracy metrics:
Inference and Model Results
To put the model to use, we want its results in several different forms. For one, we require test accuracies and confusion matrices. All of the code for creating these results is in the code notebook.
1. Test Results
The overall accuracy of the model on the test set is:
Overall Accuracy: 88.65 %
Here is the confusion matrix for results on the test dataset.
We can also look at the category-wise accuracies. I have also added the train counts to see the results from a new perspective.
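As a rough sketch of how a confusion matrix and the category-wise accuracies relate, the plain-Python snippet below builds both from lists of true and predicted labels. The labels are made up for illustration and are not results from the post; the notebook's own implementation may differ.

```python
def confusion_matrix(y_true, y_pred, classes):
    """Rows are true classes, columns are predicted classes."""
    index = {name: i for i, name in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for truth, pred in zip(y_true, y_pred):
        matrix[index[truth]][index[pred]] += 1
    return matrix


def per_class_accuracy(matrix, classes):
    """Diagonal count divided by row total (i.e. recall) for each class."""
    result = {}
    for i, name in enumerate(classes):
        row_total = sum(matrix[i])
        result[name] = matrix[i][i] / row_total if row_total else float("nan")
    return result


# Made-up labels purely for illustration; not results from the post.
classes = ["gondola", "kayak", "sailboat"]
y_true = ["gondola", "gondola", "kayak", "kayak",
          "sailboat", "sailboat", "sailboat", "sailboat"]
y_pred = ["gondola", "kayak", "kayak", "kayak",
          "sailboat", "sailboat", "gondola", "sailboat"]

cm = confusion_matrix(y_true, y_pred, classes)
acc = per_class_accuracy(cm, classes)
overall = sum(cm[i][i] for i in range(len(classes))) / len(y_true)
print(cm, acc, overall)
```

Looking at per-class accuracy next to the train counts, as done here, helps show whether weak classes are simply the ones with few training images.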
2. Visualizing Predictions for Single Image
For deployment purposes, it helps to be able to get predictions for a single image. You can get the code from the notebook.
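One detail worth keeping in mind for single-image predictions: because the model ends in LogSoftmax, its outputs are log-probabilities and need to be exponentiated before they are shown as probabilities. Here is a minimal plain-Python sketch of that step; the function name and the example values are hypothetical, and with a real model the input list could come from calling .tolist() on one row of the output tensor.

```python
import math


def top_k_predictions(log_probs, class_names, k=3):
    """Turn one image's log-softmax output into the k most likely (class, probability) pairs."""
    probs = [math.exp(lp) for lp in log_probs]
    ranked = sorted(zip(class_names, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Hypothetical log-probabilities for one image (they exponentiate to 0.2, 0.7, 0.1).
example_log_probs = [math.log(0.2), math.log(0.7), math.log(0.1)]
example_classes = ["gondola", "sailboat", "kayak"]
top2 = top_k_predictions(example_log_probs, example_classes, k=2)
print([(name, round(p, 2)) for name, p in top2])
```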
3. Visualizing Predictions for a Category
We can also see the category-wise results for debugging purposes and presentations.
4. Test results with Test Time Augmentation
We can also do test time augmentation to increase our test accuracy. Here I am using a new test data loader and transforms:
# Image transformations
tta_random_image_transforms = transforms.Compose([
    transforms.RandomResizedCrop(size=256, scale=(0.8, 1.0)),
    transforms.RandomRotation(degrees=15),
    transforms.ColorJitter(),
    transforms.RandomHorizontalFlip(),
    transforms.CenterCrop(size=224),  # Image net standards
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406],
                         [0.229, 0.224, 0.225])  # Imagenet standards
])

# Datasets from folders
ttadata = {
    'test':
    datasets.ImageFolder(root=testdir, transform=tta_random_image_transforms)
}

# Dataloader iterators
ttadataloader = {
    'test': DataLoader(ttadata['test'], batch_size=512, shuffle=False, num_workers=10)
}
We can then get the predictions on the test set using the below function:
In the function above, I am applying the tta_random_image_transforms to each image 5 times before getting its prediction. The final prediction is the average of all five predictions. When we used TTA over the whole test dataset, we noticed that the accuracy increased by around 1%.
TTA Accuracy: 89.71%
Also, here are the results for TTA compared to the normal results, category-wise:
In this small dataset, the TTA might not seem to add much value, but I have noticed that it adds value with big datasets.
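The averaging idea behind TTA can be sketched in a few lines. This is an illustration under stated assumptions, not the notebook's function: it augments tensors directly with a made-up random_horizontal_flip helper instead of re-reading images through tta_random_image_transforms, it averages probabilities (averaging log-probabilities would be a different choice), and it uses a tiny random model in place of the trained network.

```python
import torch
from torch import nn


def predict_with_tta(model, images, augment, n_rounds=5):
    """Average class probabilities over `n_rounds` randomly augmented copies of a batch."""
    model.eval()
    summed = None
    with torch.no_grad():
        for _ in range(n_rounds):
            log_probs = model(augment(images))  # the model ends in LogSoftmax
            probs = torch.exp(log_probs)
            summed = probs if summed is None else summed + probs
    mean_probs = summed / n_rounds
    return mean_probs.argmax(dim=1), mean_probs


def random_horizontal_flip(images, p=0.5):
    """Hypothetical tensor-level augmentation: flip each image left-right with probability p."""
    flip = torch.rand(images.size(0)) < p
    flipped = torch.flip(images, dims=[3])
    return torch.where(flip[:, None, None, None], flipped, images)


torch.manual_seed(0)
demo_model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 8 * 8, 4), nn.LogSoftmax(dim=1))
demo_batch = torch.randn(6, 3, 8, 8)
tta_labels, tta_probs = predict_with_tta(demo_model, demo_batch, random_horizontal_flip)
print(tta_labels.shape, tta_probs.shape)
```

With a dataloader-based setup like the one above, the equivalent is to iterate over the TTA dataloader several times with shuffle=False, so that the rows line up across passes, and average the outputs.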
Conclusion
In this post, I talked about the end-to-end pipeline for working on a multiclass image classification project using PyTorch. We created some ready-made code to train a model using transfer learning, visualize the results, use test time augmentation, and get predictions for a single image so that we can deploy our model when needed using a tool like Streamlit.
You can find the complete code for this post on GitHub.
If you would like to learn more about Image Classification and Convolutional Neural Networks take a look at the Deep Learning Specialization from Andrew Ng. Also, to learn more about PyTorch and start from the basics, you can take a look at the Deep Neural Networks with PyTorch course offered by IBM.
Thanks for the read. I am going to be writing more beginner-friendly posts in the future too. Follow me on Medium or subscribe to my blog to be informed about them. As always, I welcome feedback and constructive criticism and can be reached on Twitter @mlwhiz.
Also, a small disclaimer — There might be some affiliate links in this post to relevant resources, as sharing knowledge is never a bad idea.
This post was first published here.