How to train your neural net
PyTorch[Vision] — Multiclass Image Classification
May 9 · 13 min read
This notebook takes you through the implementation of multi-class image classification with CNNs using the Rock Paper Scissors dataset in PyTorch.
Import Libraries
import numpy as np
import pandas as pd
import seaborn as sns
from tqdm.notebook import tqdm
import matplotlib.pyplot as plt

import torch
import torchvision
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F
from torchvision import transforms, utils, datasets
from torch.utils.data import Dataset, DataLoader, SubsetRandomSampler

from sklearn.metrics import classification_report, confusion_matrix
Set the random seed.
np.random.seed(0)
torch.manual_seed(0)
Set the Seaborn style.
%matplotlib inline
sns.set_style('darkgrid')
Define Paths and Set GPU
Let’s define the path for our data and set the device we’ll train on (GPU if available).
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print("We're using =>", device)

root_dir = "../../../data/computer_vision/image_classification/rock-paper-scissor/"
print("The data lies here =>", root_dir)

###################### OUTPUT ######################

We're using => cuda
The data lies here => ../../../data/computer_vision/image_classification/rock-paper-scissor/
Define transforms
Let’s define a dictionary to hold the image transformations for the train/test sets. All images in this dataset are already of size (300, 300). We will still resize them to (300, 300) as a safeguard, convert them to tensors, and normalize them. The ToTensor operation converts a PIL Image or numpy.ndarray of shape (H x W x C) in the range [0, 255] to a torch.FloatTensor of shape (C x H x W) in the range [0.0, 1.0]. The Normalize operation then shifts and scales each channel with a mean and standard deviation of 0.5, mapping values to the range [-1.0, 1.0].
image_transforms = {
    "train": transforms.Compose([
        transforms.Resize((300, 300)),
        transforms.ToTensor(),
        transforms.Normalize([0.5, 0.5, 0.5], [0.5, 0.5, 0.5])
    ]),
    "test": transforms.Compose([
        transforms.Resize((300, 300)),
        transforms.ToTensor(),
        transforms.Normalize([0.5, 0.5, 0.5], [0.5, 0.5, 0.5])
    ])
}
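As a quick sanity check (a minimal sketch using a hypothetical dummy image, not part of the original pipeline), we can verify those value ranges: ToTensor maps [0, 255] to [0.0, 1.0], and Normalize with mean 0.5 and std 0.5 then maps to [-1.0, 1.0].

from PIL import Image

# Hypothetical dummy image, just to inspect the transform's output.
dummy = Image.fromarray(np.random.randint(0, 256, (300, 300, 3), dtype=np.uint8))
t = image_transforms["train"](dummy)

print(t.shape)                         # torch.Size([3, 300, 300])
print(t.min().item(), t.max().item())  # approximately -1.0 and 1.0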
Initialize Datasets
Train + Validation Dataset
We have 2 dataset folders with us — Train and Test. We will further divide our Train set into Train + Val.
rps_dataset = datasets.ImageFolder(root = root_dir + "train",
                                   transform = image_transforms["train"])
rps_dataset

###################### OUTPUT ######################

Dataset ImageFolder
    Number of datapoints: 2520
    Root location: ../../../data/computer_vision/image_classification/rock-paper-scissor/train
    StandardTransform
Transform: Compose(
               Resize(size=(300, 300), interpolation=PIL.Image.BILINEAR)
               ToTensor()
               Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5])
           )
Class <=> ID Mapping of Output
The class_to_idx attribute is built into PyTorch datasets. It holds the mapping of class names to the class IDs present in the dataset.
rps_dataset.class_to_idx

###################### OUTPUT ######################

{'paper': 0, 'rock': 1, 'scissors': 2}
We will now construct a reverse of this dictionary; a mapping of ID to class.
idx2class = {v: k for k, v in rps_dataset.class_to_idx.items()}
idx2class

###################### OUTPUT ######################

{0: 'paper', 1: 'rock', 2: 'scissors'}
Let’s also write a function that takes in a dataset object and returns a dictionary that contains the count of class samples. We will use this dictionary to construct plots and observe the class distribution in our data.
get_class_distribution() takes in an argument called dataset_obj. It initializes a count_dict dictionary with the count of every class set to 0, then loops through the dataset and increments the count for each sample’s class.

plot_from_dict() takes in 3 arguments: a dictionary called dict_obj, plot_title, and **kwargs. We pass in **kwargs because later on we will construct subplots, which require passing the ax argument in Seaborn.

- First, convert the dictionary to a dataframe.
- Melt the dataframe and plot.
def get_class_distribution(dataset_obj):
    count_dict = {k: 0 for k, v in dataset_obj.class_to_idx.items()}

    for _, label_id in dataset_obj:
        label = idx2class[label_id]
        count_dict[label] += 1
    return count_dict


def plot_from_dict(dict_obj, plot_title, **kwargs):
    return sns.barplot(data = pd.DataFrame.from_dict([dict_obj]).melt(),
                       x = "variable", y = "value", hue = "variable",
                       **kwargs).set_title(plot_title)


plt.figure(figsize=(15, 8))
plot_from_dict(get_class_distribution(rps_dataset),
               plot_title="Entire Dataset (before train/val/test split)")
Get Train and Validation Samples
We use SubsetRandomSampler to make our train and validation loaders. SubsetRandomSampler is used so that each batch receives a random distribution of classes.

We could’ve also split our dataset into 2 parts — train and val, i.e. made 2 Subsets. But this approach is simpler because our data loaders will pretty much handle everything for us now.
SubsetRandomSampler(indices) takes as input the indices of the data. We first create our samplers and then pass them to our dataloaders.
Create a list of indices from 0 to the length of the dataset.
rps_dataset_size = len(rps_dataset)
rps_dataset_indices = list(range(rps_dataset_size))
Shuffle the list of indices using np.random.shuffle.
np.random.shuffle(rps_dataset_indices)
Create the split index. We choose the split index to be 20% (0.2) of the dataset size.
val_split_index = int(np.floor(0.2 * rps_dataset_size))
Slice the list to obtain 2 lists of indices, one for train and the other for val.
0-----------val_split_index------------------------------n
Train => val_split_index to n
Val => 0 to val_split_index
train_idx, val_idx = rps_dataset_indices[val_split_index:], rps_dataset_indices[:val_split_index]
Finally, create samplers.
train_sampler = SubsetRandomSampler(train_idx)
val_sampler = SubsetRandomSampler(val_idx)
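As a quick sanity check (a sketch, not part of the original pipeline), we can confirm that the two index lists are disjoint and together cover the whole dataset:

# The train and val index sets should not overlap, and together
# they should cover every sample exactly once.
assert set(train_idx).isdisjoint(set(val_idx))
assert len(train_idx) + len(val_idx) == rps_dataset_size
print(len(train_idx), len(val_idx))  # 2016 504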
Test
Now that we’re done with train and val data, let’s load our test dataset.
rps_dataset_test = datasets.ImageFolder(root = root_dir + "test",
                                        transform = image_transforms["test"])
rps_dataset_test

###################### OUTPUT ######################

Dataset ImageFolder
    Number of datapoints: 372
    Root location: ../../../data/computer_vision/image_classification/rock-paper-scissor/test
    StandardTransform
Transform: Compose(
               Resize(size=(300, 300), interpolation=PIL.Image.BILINEAR)
               ToTensor()
               Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5])
           )
Train, Validation, and Test Dataloader
Now, we will pass the samplers to our dataloaders. Note that shuffle=True cannot be used when you’re using a SubsetRandomSampler.
train_loader = DataLoader(dataset=rps_dataset, shuffle=False, batch_size=8, sampler=train_sampler)
val_loader = DataLoader(dataset=rps_dataset, shuffle=False, batch_size=1, sampler=val_sampler)
test_loader = DataLoader(dataset=rps_dataset_test, shuffle=False, batch_size=1)
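If you want to verify the loaders (a quick sketch; the expected counts follow from the 2520/372 dataset sizes and the 80/20 split above), the batch counts should look like this:

# 2016 train samples / batch size 8 = 252 batches;
# val and test use batch size 1, so one batch per sample.
print(len(train_loader))  # 252
print(len(val_loader))    # 504
print(len(test_loader))   # 372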
Explore The Data
To explore our train and val data-loaders, let’s create a new function that takes in a data-loader and returns a dictionary with class counts.
- Initialize a dictionary count_dict with all class counts set to 0.
- If the batch_size of the dataloader_obj is 1, loop through the dataloader_obj and update the counter.
- Else, if the batch_size of the dataloader_obj is not 1, loop through the dataloader_obj to obtain batches, loop through each batch to obtain the individual tensors, and update the counter accordingly.
def get_class_distribution_loaders(dataloader_obj, dataset_obj):
    count_dict = {k: 0 for k, v in dataset_obj.class_to_idx.items()}

    if dataloader_obj.batch_size == 1:
        for _, label_id in dataloader_obj:
            y_idx = label_id.item()
            y_lbl = idx2class[y_idx]
            count_dict[str(y_lbl)] += 1
    else:
        for _, label_id in dataloader_obj:
            for idx in label_id:
                y_idx = idx.item()
                y_lbl = idx2class[y_idx]
                count_dict[str(y_lbl)] += 1
    return count_dict
To plot the class distributions, we will use the plot_from_dict() function defined earlier with the ax argument.
fig, axes = plt.subplots(nrows=1, ncols=2, figsize=(18, 7))

plot_from_dict(get_class_distribution_loaders(train_loader, rps_dataset),
               plot_title="Train Set", ax=axes[0])
plot_from_dict(get_class_distribution_loaders(val_loader, rps_dataset),
               plot_title="Val Set", ax=axes[1])
Now that we’ve looked at the class distributions, let’s look at a single image.
single_batch = next(iter(train_loader))
single_batch is a list of 2 elements. The first element (0th index) contains the image tensors, while the second element (1st index) contains the output labels.

Here’s the first element of the list, which is a tensor of shape (batch, channels, height, width).
single_batch[0].shape

###################### OUTPUT ######################

torch.Size([8, 3, 300, 300])
Here are the output labels for the batch.
print("Output label tensors: ", single_batch[1]) print("\nOutput label tensor shape: ", single_batch[1].shape) ###################### OUTPUT ######################Output label tensors: tensor([2, 0, 2, 2, 0, 1, 0, 0])Output label tensor shape: torch.Size([8])
To plot the image, we’ll use plt.imshow from matplotlib. It expects the image dimensions to be (height, width, channels), so we’ll .permute() our single image tensor to plot it.
# Selecting the first image tensor from the batch.
single_image = single_batch[0][0]
single_image.shape

###################### OUTPUT ######################

torch.Size([3, 300, 300])
Let’s plot the image.
plt.imshow(single_image.permute(1, 2, 0))
PyTorch has made it easier for us to plot the images in a grid straight from the batch.
We first extract the image tensor from the list (returned by our dataloader) and set nrow. Then we use the plt.imshow() function to plot our grid. Remember to .permute() the tensor dimensions!
# We do single_batch[0] because each batch is a list where the 0th index
# is the image tensor and the 1st index is the output label.
single_batch_grid = utils.make_grid(single_batch[0], nrow=4)

plt.figure(figsize=(10, 10))
plt.imshow(single_batch_grid.permute(1, 2, 0))
Define a CNN Architecture
Our architecture is simple. We use 3 blocks of Conv layers, where each block consists of Convolution + BatchNorm + ReLU + Dropout layers.

We will not use an FC layer at the end. We’ll stick with a Conv layer instead.
Converting FC layers to CONV layers — Source
It is worth noting that the only difference between FC and CONV layers is that the neurons in the CONV layer are connected only to a local region in the input, and that many of the neurons in a CONV volume share parameters. However, the neurons in both layers still compute dot products, so their functional form is identical. Therefore, it turns out that it’s possible to convert between FC and CONV layers.

For any CONV layer there is an FC layer that implements the same forward function. The weight matrix would be a large matrix that is mostly zero except at certain blocks (due to local connectivity), where the weights in many of the blocks are equal (due to parameter sharing).

Conversely, any FC layer can be converted to a CONV layer. For example, an FC layer with K=4096 that is looking at some input volume of size 7×7×512 can be equivalently expressed as a CONV layer with F=7, P=0, S=1, K=4096.

In other words, we are setting the filter size to be exactly the size of the input volume, and hence the output will simply be 1×1×4096, since only a single depth column “fits” across the input volume, giving an identical result to the initial FC layer.
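To make this equivalence concrete, here’s a minimal sketch (using the hypothetical layer sizes from the example above, not from our model) showing that an FC layer with K=4096 over a 7×7×512 volume and a CONV layer with F=7, P=0, S=1, K=4096 compute the same function once their weights are shared:

fc = nn.Linear(7 * 7 * 512, 4096)
conv = nn.Conv2d(in_channels=512, out_channels=4096, kernel_size=7, stride=1, padding=0)

# Reshape the FC weight matrix into the CONV filter bank; the (C, H, W)
# flattening order matches on both sides, so the dot products line up.
with torch.no_grad():
    conv.weight.copy_(fc.weight.view(4096, 512, 7, 7))
    conv.bias.copy_(fc.bias)

x = torch.randn(1, 512, 7, 7)
out_fc = fc(x.flatten(start_dim=1))  # shape: (1, 4096)
out_conv = conv(x)                   # shape: (1, 4096, 1, 1)

print(torch.allclose(out_fc, out_conv.flatten(start_dim=1), atol=1e-4))  # True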
class RpsClassifier(nn.Module):
    def __init__(self):
        super(RpsClassifier, self).__init__()
        self.block1 = self.conv_block(c_in=3, c_out=256, dropout=0.1, kernel_size=5, stride=1, padding=2)
        self.block2 = self.conv_block(c_in=256, c_out=128, dropout=0.1, kernel_size=3, stride=1, padding=1)
        self.block3 = self.conv_block(c_in=128, c_out=64, dropout=0.1, kernel_size=3, stride=1, padding=1)
        self.lastcnn = nn.Conv2d(in_channels=64, out_channels=3, kernel_size=75, stride=1, padding=0)
        self.maxpool = nn.MaxPool2d(kernel_size=2, stride=2)

    def forward(self, x):
        x = self.block1(x)
        x = self.maxpool(x)
        x = self.block2(x)
        x = self.block3(x)
        x = self.maxpool(x)
        x = self.lastcnn(x)
        return x

    def conv_block(self, c_in, c_out, dropout, **kwargs):
        seq_block = nn.Sequential(
            nn.Conv2d(in_channels=c_in, out_channels=c_out, **kwargs),
            nn.BatchNorm2d(num_features=c_out),
            nn.ReLU(),
            nn.Dropout2d(p=dropout)
        )
        return seq_block
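A quick shape check (a sketch with a hypothetical random input) shows why the last Conv layer uses kernel_size=75: the two maxpools downsample the input 300 → 150 → 75, so a 75×75 kernel collapses the feature map to 1×1, leaving one score per class.

m = RpsClassifier()
m.eval()  # so BatchNorm/Dropout behave deterministically for the check

with torch.no_grad():
    out = m(torch.randn(2, 3, 300, 300))

print(out.shape)  # torch.Size([2, 3, 1, 1]); .squeeze() then gives (2, 3)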
Now we’ll initialize the model, optimizer, and loss function.
Then we’ll transfer the model to GPU.
We’re using nn.CrossEntropyLoss because this is a multiclass classification problem. The model returns 3 raw scores (logits), one per class, instead of a single output; applying softmax to these scores gives the probability of each class.
We don’t have to manually apply a log_softmax layer after our final layer because nn.CrossEntropyLoss does that for us. However, we do need to apply log_softmax ourselves during validation and testing, when we convert outputs into predicted classes.
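To see what nn.CrossEntropyLoss does internally, here’s a minimal sketch (with made-up logits and targets) showing that it is equivalent to applying log_softmax followed by NLLLoss:

logits = torch.randn(4, 3)            # raw model outputs: 4 samples, 3 classes
targets = torch.tensor([0, 2, 1, 1])  # hypothetical ground-truth class IDs

ce = nn.CrossEntropyLoss()(logits, targets)
nll = nn.NLLLoss()(torch.log_softmax(logits, dim=1), targets)

print(torch.allclose(ce, nll))  # True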
model = RpsClassifier()
model.to(device)
print(model)

criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.005)

###################### OUTPUT ######################

RpsClassifier(
  (block1): Sequential(
    (0): Conv2d(3, 256, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2))
    (1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (2): ReLU()
    (3): Dropout2d(p=0.1, inplace=False)
  )
  (block2): Sequential(
    (0): Conv2d(256, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (2): ReLU()
    (3): Dropout2d(p=0.1, inplace=False)
  )
  (block3): Sequential(
    (0): Conv2d(128, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (2): ReLU()
    (3): Dropout2d(p=0.1, inplace=False)
  )
  (lastcnn): Conv2d(64, 3, kernel_size=(75, 75), stride=(1, 1))
  (maxpool): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
)
Before we start our training, let’s define a function to calculate accuracy per epoch.
This function takes y_pred and y_test as input arguments. We apply log_softmax to y_pred and extract the class with the highest probability. After that, we compare the predicted classes with the actual classes to calculate the accuracy.
def multi_acc(y_pred, y_test):
    y_pred_softmax = torch.log_softmax(y_pred, dim=1)
    _, y_pred_tags = torch.max(y_pred_softmax, dim=1)

    correct_pred = (y_pred_tags == y_test).float()
    acc = correct_pred.sum() / len(correct_pred)
    acc = torch.round(acc * 100)
    return acc
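Here’s a small usage sketch with made-up logits, just to confirm the function behaves as expected:

# 4 samples, 3 classes; rows 0-2 predict the correct class, row 3 does not.
y_pred = torch.tensor([[2.0, 0.1, 0.3],
                       [0.2, 1.5, 0.1],
                       [0.1, 0.2, 3.0],
                       [1.0, 0.9, 0.8]])
y_test = torch.tensor([0, 1, 2, 1])

print(multi_acc(y_pred, y_test))  # tensor(75.)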
We’ll also define 2 dictionaries which will store the accuracy/epoch and loss/epoch for both train and validation sets.
accuracy_stats = {
    'train': [],
    "val": []
}

loss_stats = {
    'train': [],
    "val": []
}
Let’s TRAIN our model!
You can see we’ve put model.train() before the loop. model.train() tells PyTorch that you’re in training mode. Why do we need to do that? If you’re using layers such as Dropout or BatchNorm, which behave differently during training and evaluation (for example, dropout is disabled during evaluation), you need to tell PyTorch to act accordingly. The default mode in PyTorch is train, so you don’t have to write it explicitly, but it’s good practice to do so.
Similarly, we’ll call model.eval() when we test our model; we’ll see that below. Back to training: we start a for-loop. At the top of this for-loop, we initialize our loss and accuracy per epoch to 0. After every epoch, we print out the loss/accuracy and reset it back to 0.
Then we have another for-loop, which is used to get our data in batches from the train_loader.
We call optimizer.zero_grad() before we make any predictions. Since the .backward() function accumulates gradients, we need to reset them to 0 manually for each mini-batch. From our defined model, we then obtain a prediction, compute the loss (and accuracy) for that mini-batch, and perform back-propagation using loss.backward() and optimizer.step().
Finally, we add up all the mini-batch losses (and accuracies) and divide by the number of mini-batches, i.e. the length of train_loader, to obtain the average loss/accuracy per epoch.
The procedure we follow for validation is exactly the same, except that we wrap it in torch.no_grad() and do not perform any back-propagation. torch.no_grad() tells PyTorch that we won’t be calling .backward(), so it skips gradient tracking, which reduces memory usage and speeds up computation.
print("Begin training.")for e in tqdm(range(1, 11)): # TRAINING train_epoch_loss = 0 train_epoch_acc = 0 model.train() for X_train_batch, y_train_batch in train_loader: X_train_batch, y_train_batch = X_train_batch.to(device), y_train_batch.to(device) optimizer.zero_grad() y_train_pred = model(X_train_batch).squeeze() train_loss = criterion(y_train_pred, y_train_batch) train_acc = multi_acc(y_train_pred, y_train_batch) train_loss.backward() optimizer.step() train_epoch_loss += train_loss.item() train_epoch_acc += train_acc.item() # VALIDATION with torch.no_grad(): model.eval() val_epoch_loss = 0 val_epoch_acc = 0 for X_val_batch, y_val_batch in val_loader: X_val_batch, y_val_batch = X_val_batch.to(device), y_val_batch.to(device) y_val_pred = model(X_val_batch).squeeze() y_val_pred = torch.unsqueeze(y_val_pred, 0) val_loss = criterion(y_val_pred, y_val_batch) val_acc = multi_acc(y_val_pred, y_val_batch) val_epoch_loss += train_loss.item() val_epoch_acc += train_acc.item() loss_stats['train'].append(train_epoch_loss/len(train_loader)) loss_stats['val'].append(val_epoch_loss/len(val_loader)) accuracy_stats['train'].append(train_epoch_acc/len(train_loader)) accuracy_stats['val'].append(val_epoch_acc/len(val_loader)) print(f'Epoch {e+0:02}: | Train Loss: {train_epoch_loss/len(train_loader):.5f} | Val Loss: {val_epoch_loss/len(val_loader):.5f} | Train Acc: {train_epoch_acc/len(train_loader):.3f}| Val Acc: {val_epoch_acc/len(val_loader):.3f}')###################### OUTPUT ######################Begin training.Epoch 01: | Train Loss: 33.38733 | Val Loss: 10.19880 | Train Acc: 91.667| Val Acc: 100.000Epoch 02: | Train Loss: 6.49906 | Val Loss: 41.86950 | Train Acc: 99.603| Val Acc: 100.000Epoch 03: | Train Loss: 3.15175 | Val Loss: 0.00000 | Train Acc: 100.000| Val Acc: 100.000Epoch 04: | Train Loss: 0.40076 | Val Loss: 0.00000 | Train Acc: 100.000| Val Acc: 100.000Epoch 05: | Train Loss: 5.56540 | Val Loss: 0.00000 | Train Acc: 100.000| Val Acc: 100.000Epoch 06: | Train Loss: 1.56760 | Val Loss: 0.00000 | Train Acc: 100.000| Val Acc: 100.000Epoch 07: | Train Loss: 1.21176 | Val Loss: 0.00000 | Train Acc: 100.000| Val Acc: 100.000Epoch 08: | Train Loss: 0.84762 | Val Loss: 0.00000 | Train Acc: 100.000| Val Acc: 100.000Epoch 09: | Train Loss: 0.35811 | Val Loss: 0.00000 | Train Acc: 100.000| Val Acc: 100.000Epoch 10: | Train Loss: 0.01389 | Val Loss: 0.00000 | Train Acc: 100.000| Val Acc: 100.000
Visualize Loss and Accuracy
To plot the loss and accuracy line plots, we again create dataframes from the accuracy_stats and loss_stats dictionaries.
train_val_acc_df = pd.DataFrame.from_dict(accuracy_stats).reset_index().melt(id_vars=['index']).rename(columns={"index": "epochs"})
train_val_loss_df = pd.DataFrame.from_dict(loss_stats).reset_index().melt(id_vars=['index']).rename(columns={"index": "epochs"})

# Plot line charts
fig, axes = plt.subplots(nrows=1, ncols=2, figsize=(30, 10))

sns.lineplot(data=train_val_acc_df, x="epochs", y="value", hue="variable", ax=axes[0]).set_title('Train-Val Accuracy/Epoch')
sns.lineplot(data=train_val_loss_df, x="epochs", y="value", hue="variable", ax=axes[1]).set_title('Train-Val Loss/Epoch')