
How to train your neural net

PyTorch [Vision] — Multiclass Image Classification

May 9 · 13 min read

This notebook takes you through the implementation of multi-class image classification with CNNs using the Rock Paper Scissor dataset on PyTorch.

Import Libraries

import numpy as np
import pandas as pd
import seaborn as sns
from tqdm.notebook import tqdm
import matplotlib.pyplot as plt
import torch
import torchvision
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F
from torchvision import transforms, utils, datasets
from torch.utils.data import Dataset, DataLoader, SubsetRandomSampler
from sklearn.metrics import classification_report, confusion_matrix

Set the random seed.

np.random.seed(0)
torch.manual_seed(0)

Set Seaborn style.

%matplotlib inline
sns.set_style('darkgrid')

Define Paths and Set GPU

Let’s define the path for our data.

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print("We're using =>", device)

root_dir = "../../../data/computer_vision/image_classification/rock-paper-scissor/"
print("The data lies here =>", root_dir)
###################### OUTPUT ######################
We're using => cuda
The data lies here => ../../../data/computer_vision/image_classification/rock-paper-scissor/


Define transforms

Let’s define a dictionary to hold the image transformations for the train/test sets. All images in the dataset are of size (300, 300). We will still resize them to (300, 300) (to guard against stray images of other sizes) and convert them to tensors. The ToTensor operation in PyTorch scales pixel values to lie in [0, 1]. We also normalize each channel with a mean and standard deviation of 0.5, which shifts the inputs into the range [-1, 1].

ToTensor converts a PIL Image or numpy.ndarray (H x W x C) in the range [0, 255] to a torch.FloatTensor of shape (C x H x W) in the range [0.0, 1.0]

image_transforms = {
    "train": transforms.Compose([
        transforms.Resize((300, 300)),
        transforms.ToTensor(),
        transforms.Normalize([0.5, 0.5, 0.5],
                             [0.5, 0.5, 0.5])
    ]),
    "test": transforms.Compose([
        transforms.Resize((300, 300)),
        transforms.ToTensor(),
        transforms.Normalize([0.5, 0.5, 0.5],
                             [0.5, 0.5, 0.5])
    ])
}
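
As a quick sanity check (a minimal illustrative sketch, not part of the original notebook), you can verify the shape and range conversion that ToTensor performs on a dummy image:

from PIL import Image
import numpy as np
from torchvision import transforms

# A dummy 300x300 RGB image with uint8 pixel values in [0, 255].
dummy = Image.fromarray(np.random.randint(0, 256, (300, 300, 3), dtype=np.uint8))

t = transforms.ToTensor()(dummy)
print(t.shape)                         # torch.Size([3, 300, 300]) -> (C x H x W)
print(t.min().item(), t.max().item())  # values now lie in [0.0, 1.0]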

Initialize Datasets

Train + Validation Dataset

We have 2 dataset folders with us: Train and Test .

We will further divide our Train set as Train + Val .

rps_dataset = datasets.ImageFolder(root = root_dir + "train",
                                   transform = image_transforms["train"]
                                  )
rps_dataset
###################### OUTPUT ######################
Dataset ImageFolder
    Number of datapoints: 2520
    Root location: ../../../data/computer_vision/image_classification/rock-paper-scissor/train
    StandardTransform
Transform: Compose(
               Resize(size=(300, 300), interpolation=PIL.Image.BILINEAR)
               ToTensor()
               Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5])
           )

Class <=> ID Mapping of Output

The class_to_idx attribute is built into PyTorch's ImageFolder dataset. It returns a mapping of class names to the class IDs present in the dataset.

rps_dataset.class_to_idx
###################### OUTPUT ######################
{'paper': 0, 'rock': 1, 'scissors': 2}

We will now construct a reverse of this dictionary; a mapping of ID to class.

idx2class = {v: k for k, v in rps_dataset.class_to_idx.items()}
idx2class
###################### OUTPUT ######################
{0: 'paper', 1: 'rock', 2: 'scissors'}

Let’s also write a function that takes in a dataset object and returns a dictionary that contains the count of class samples. We will use this dictionary to construct plots and observe the class distribution in our data.

get_class_distribution() takes in an argument called dataset_obj .

  • First, initialize a count_dict dictionary where the count of each class is set to 0.
  • Then, loop through the dataset and increment the counter by 1 for each class label encountered.

plot_from_dict() takes in 3 arguments: a dictionary called dict_obj , plot_title , and **kwargs . We pass in **kwargs because later on, we will construct subplots which require passing the ax argument in seaborn.

  • First, convert the dictionary to a dataframe.
  • Melt the dataframe and plot.
def get_class_distribution(dataset_obj):
    count_dict = {k: 0 for k, v in dataset_obj.class_to_idx.items()}
    for _, label_id in dataset_obj:
        label = idx2class[label_id]
        count_dict[label] += 1
    return count_dict

def plot_from_dict(dict_obj, plot_title, **kwargs):
    return sns.barplot(data = pd.DataFrame.from_dict([dict_obj]).melt(), x = "variable", y = "value", hue = "variable", **kwargs).set_title(plot_title)

plt.figure(figsize=(15,8))
plot_from_dict(get_class_distribution(rps_dataset), plot_title="Entire Dataset (before train/val/test split)")

Data distribution [Image [1]]

Get Train and Validation Samples

We use SubsetRandomSampler to make our train and validation loaders. SubsetRandomSampler is used so that each batch receives a random distribution of classes.

We could’ve also split our dataset into 2 parts, train and val, i.e. made 2 Subsets (a sketch of that alternative appears below). But using samplers is simpler because the data loader will handle pretty much everything for us.
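
For reference, here's what that explicit-split alternative might look like (a hypothetical sketch using random_split; the rest of this notebook sticks with samplers):

from torch.utils.data import random_split

# Hypothetical alternative: split the dataset into two Subset objects.
n_val = int(0.2 * len(rps_dataset))
train_subset, val_subset = random_split(rps_dataset, [len(rps_dataset) - n_val, n_val])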

SubsetRandomSampler(indices) takes as input the indices of data.

We first create our samplers and then we’ll pass it to our dataloaders.

SubsetRandomSampler

Create a list of indices from 0 to length of dataset.

rps_dataset_size = len(rps_dataset)
rps_dataset_indices = list(range(rps_dataset_size))

Shuffle the list of indices using np.random.shuffle.

np.random.shuffle(rps_dataset_indices)

Create the split index. We choose the split index to be 20% (0.2) of the dataset size.

val_split_index = int(np.floor(0.2 * rps_dataset_size))

Slice the list to obtain 2 lists of indices, one for train and the other for val.

0-----------val_split_index------------------------------n

Train => val_split_index to n

Val => 0 to val_split_index

train_idx, val_idx = rps_dataset_indices[val_split_index:], rps_dataset_indices[:val_split_index]

Finally, create samplers.

train_sampler = SubsetRandomSampler(train_idx)
val_sampler = SubsetRandomSampler(val_idx)

Test

Now that we’re done with train and val data, let’s load our test dataset.

rps_dataset_test = datasets.ImageFolder(root = root_dir + "test",
                                        transform = image_transforms["test"])
rps_dataset_test
###################### OUTPUT ######################
Dataset ImageFolder
    Number of datapoints: 372
    Root location: ../../../data/computer_vision/image_classification/rock-paper-scissor/test
    StandardTransform
Transform: Compose(
               Resize(size=(300, 300), interpolation=PIL.Image.BILINEAR)
               ToTensor()
               Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5])
           )

Train, Validation, and Test Dataloader

Now, we will pass the samplers to our dataloader. Note that shuffle=True cannot be used when you're using the SubsetRandomSampler .

train_loader = DataLoader(dataset=rps_dataset, shuffle=False, batch_size=8, sampler=train_sampler)
val_loader = DataLoader(dataset=rps_dataset, shuffle=False, batch_size=1, sampler=val_sampler)
test_loader = DataLoader(dataset=rps_dataset_test, shuffle=False, batch_size=1)

Explore The Data

To explore our train and val data-loaders, let’s create a new function that takes in a data-loader and returns a dictionary with class counts.

  • Initialize a dictionary count_dict to all 0s.
  • If the batch_size of the dataloader_obj is 1, then loop through the dataloader_obj and update the counter.
  • Else, if the batch_size of the dataloader_obj is not 1, then loop through the dataloader_obj to obtain batches, loop through each batch to obtain the individual label tensors, and update the counter accordingly.
def get_class_distribution_loaders(dataloader_obj, dataset_obj):
    count_dict = {k: 0 for k, v in dataset_obj.class_to_idx.items()}
    if dataloader_obj.batch_size == 1:
        for _, label_id in dataloader_obj:
            y_idx = label_id.item()
            y_lbl = idx2class[y_idx]
            count_dict[str(y_lbl)] += 1
    else:
        for _, label_id in dataloader_obj:
            for idx in label_id:
                y_idx = idx.item()
                y_lbl = idx2class[y_idx]
                count_dict[str(y_lbl)] += 1
    return count_dict

To plot the class distributions, we will use the plot_from_dict() function defined earlier with the ax argument.

fig, axes = plt.subplots(nrows=1, ncols=2, figsize=(18,7))
plot_from_dict(get_class_distribution_loaders(train_loader, rps_dataset), plot_title="Train Set", ax=axes[0])
plot_from_dict(get_class_distribution_loaders(val_loader, rps_dataset), plot_title="Val Set", ax=axes[1])

Train-Val class distribution [Image [2]]

Now that we’ve looked at the class distributions, let’s look at a single image.

single_batch = next(iter(train_loader))

single_batch is a list of 2 elements. The first element (0th index) contains the image tensors while the second element (1st index) contains the output labels.

Here’s the first element of the list which is a tensor. This tensor is of the shape (batch, channels, height, width) .

single_batch[0].shape
###################### OUTPUT ######################
torch.Size([8, 3, 300, 300])

Here are the output labels for the batch.

print("Output label tensors: ", single_batch[1])
print("\nOutput label tensor shape: ", single_batch[1].shape)
###################### OUTPUT ######################
Output label tensors:  tensor([2, 0, 2, 2, 0, 1, 0, 0])

Output label tensor shape:  torch.Size([8])

To plot the image, we’ll use plt.imshow from matplotlib. It expects the image dimension to be (height, width, channels) . We'll .permute() our single image tensor to plot it.

# Selecting the first image tensor from the batch. 
single_image = single_batch[0][0]
single_image.shape
###################### OUTPUT ######################
torch.Size([3, 300, 300])

Let’s plot the image.

plt.imshow(single_image.permute(1, 2, 0))

Single image sample [Image [3]]

PyTorch has made it easier for us to plot the images in a grid straight from the batch.

We first extract out the image tensor from the list (returned by our dataloader) and set nrow . Then we use the plt.imshow() function to plot our grid. Remember to .permute() the tensor dimensions!

# We do single_batch[0] because each batch is a list
# where the 0th index is the image tensor and 1st index is the
# output label.
single_batch_grid = utils.make_grid(single_batch[0], nrow=4)

plt.figure(figsize = (10,10))
plt.imshow(single_batch_grid.permute(1, 2, 0))

Image sample grid [Image [4]]

Define a CNN Architecture

Our architecture is simple. We use 3 blocks of Conv layers, where each block consists of Convolution + BatchNorm + ReLU + Dropout layers, followed by a final Conv layer that produces the class scores.

We will not use an FC layer at the end. We'll stick with a Conv layer.

Converting FC layers to CONV layers — Source

It is worth noting that the only difference between FC and CONV layers is that the neurons in the CONV layer are connected only to a local region in the input, and that many of the neurons in a CONV volume share parameters. However, the neurons in both layers still compute dot products, so their functional form is identical. Therefore, it turns out that it’s possible to convert between FC and CONV layers.

For any CONV layer there is an FC layer that implements the same forward function. The weight matrix would be a large matrix that is mostly zero except for at certain blocks (due to local connectivity) where the weights in many of the blocks are equal (due to parameter sharing).

Conversely, any FC layer can be converted to a CONV layer. For example, an FC layer with K=4096 that is looking at some input volume of size 7×7×512 can be equivalently expressed as a CONV layer with F=7, P=0, S=1, K=4096 .

In other words, we are setting the filter size to be exactly the size of the input volume, and hence the output will simply be 1×1×4096 since only a single depth column “fits” across the input volume, giving identical result as the initial FC layer.
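
To make this concrete, here's a minimal sketch (using the sizes from the quoted example, not from our model) showing that a Conv layer whose kernel spans the entire input volume computes the same function as an FC layer:

x = torch.randn(1, 512, 7, 7)                  # input volume of size 7x7x512

fc = nn.Linear(512 * 7 * 7, 4096)              # FC layer with K=4096
conv = nn.Conv2d(512, 4096, kernel_size=7, stride=1, padding=0)

# Copy the FC weights into the conv layer: each conv filter is one row
# of the FC weight matrix reshaped back into (512, 7, 7).
with torch.no_grad():
    conv.weight.copy_(fc.weight.view(4096, 512, 7, 7))
    conv.bias.copy_(fc.bias)

out_fc = fc(x.flatten(1))                      # shape: (1, 4096)
out_conv = conv(x)                             # shape: (1, 4096, 1, 1)
print(torch.allclose(out_fc, out_conv.flatten(1), atol=1e-4))  # True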

class RpsClassifier(nn.Module):
    def __init__(self):
        super(RpsClassifier, self).__init__()
        self.block1 = self.conv_block(c_in=3, c_out=256, dropout=0.1, kernel_size=5, stride=1, padding=2)
        self.block2 = self.conv_block(c_in=256, c_out=128, dropout=0.1, kernel_size=3, stride=1, padding=1)
        self.block3 = self.conv_block(c_in=128, c_out=64, dropout=0.1, kernel_size=3, stride=1, padding=1)
        self.lastcnn = nn.Conv2d(in_channels=64, out_channels=3, kernel_size=75, stride=1, padding=0)
        self.maxpool = nn.MaxPool2d(kernel_size=2, stride=2)

    def forward(self, x):
        x = self.block1(x)
        x = self.maxpool(x)
        x = self.block2(x)
        x = self.block3(x)
        x = self.maxpool(x)
        x = self.lastcnn(x)
        return x

    def conv_block(self, c_in, c_out, dropout, **kwargs):
        seq_block = nn.Sequential(
            nn.Conv2d(in_channels=c_in, out_channels=c_out, **kwargs),
            nn.BatchNorm2d(num_features=c_out),
            nn.ReLU(),
            nn.Dropout2d(p=dropout)
        )
        return seq_block
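
To see why the final Conv layer uses kernel_size=75, here's a quick shape trace for a (300, 300) input (a sketch derived from the layers above):

# input:              (3, 300, 300)
# block1 (k=5, p=2):  (256, 300, 300)
# maxpool (2, 2):     (256, 150, 150)
# block2 (k=3, p=1):  (128, 150, 150)
# block3 (k=3, p=1):  (64, 150, 150)
# maxpool (2, 2):     (64, 75, 75)
# lastcnn (k=75):     (3, 1, 1)   -> one raw score per class
out = RpsClassifier()(torch.randn(1, 3, 300, 300))
print(out.shape)  # torch.Size([1, 3, 1, 1])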

Now we’ll initialize the model, optimizer, and loss function.

Then we’ll transfer the model to GPU.

We’re using nn.CrossEntropyLoss for this multi-class problem. Instead of returning a single predicted label, the model returns one raw score (logit) per class (3 values here), and the predicted class is the one with the highest score.

We don’t have to manually apply a log_softmax layer after our final layer because nn.CrossEntropyLoss does that for us.

However, we need to apply log_softmax for our validation and testing.
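
For instance (an illustrative snippet, not from the notebook), converting raw logits into probabilities and a class label looks like this:

logits = torch.tensor([[1.2, -0.3, 0.5]])  # one sample, one raw score per class
probs = torch.softmax(logits, dim=1)       # probabilities that sum to 1
pred = torch.argmax(probs, dim=1)          # index of the most likely class
print(probs, idx2class[pred.item()])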

model = RpsClassifier()
model.to(device)
print(model)

criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.005)
###################### OUTPUT ######################
RpsClassifier(
  (block1): Sequential(
    (0): Conv2d(3, 256, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2))
    (1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (2): ReLU()
    (3): Dropout2d(p=0.1, inplace=False)
  )
  (block2): Sequential(
    (0): Conv2d(256, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (2): ReLU()
    (3): Dropout2d(p=0.1, inplace=False)
  )
  (block3): Sequential(
    (0): Conv2d(128, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (2): ReLU()
    (3): Dropout2d(p=0.1, inplace=False)
  )
  (lastcnn): Conv2d(64, 3, kernel_size=(75, 75), stride=(1, 1))
  (maxpool): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
)

Before we start our training, let’s define a function to calculate accuracy per epoch.

This function takes y_pred and y_test as input arguments. We apply log_softmax to y_pred and extract the class with the highest probability (the argmax is the same whether we use softmax or log_softmax).

After that, we compare the predicted classes and the actual classes to calculate the accuracy.

def multi_acc(y_pred, y_test):
    y_pred_softmax = torch.log_softmax(y_pred, dim = 1)
    _, y_pred_tags = torch.max(y_pred_softmax, dim = 1)

    correct_pred = (y_pred_tags == y_test).float()
    acc = correct_pred.sum() / len(correct_pred)

    # Convert to a percentage before rounding; rounding the raw fraction
    # first would collapse every batch accuracy to 0 or 100.
    acc = torch.round(acc * 100)

    return acc

We’ll also define 2 dictionaries which will store the accuracy/epoch and loss/epoch for both train and validation sets.

accuracy_stats = {
    'train': [],
    "val": []
}

loss_stats = {
    'train': [],
    "val": []
}

Let’s TRAIN our model!

You can see we’ve put model.train() before the loop. model.train() tells PyTorch that you're in training mode. Why do we need to do that? If you're using layers such as Dropout or BatchNorm, which behave differently during training and evaluation (for example, dropout is disabled during evaluation), you need to tell PyTorch to act accordingly. The default mode in PyTorch is train, so you don't strictly have to write it, but it's good practice.

Similarly, we’ll call model.eval() when we test our model. We'll see that below. Back to training; we start a for-loop. At the top of this for-loop, we initialize our loss and accuracy per epoch to 0. After every epoch, we'll print out the loss/accuracy and reset it back to 0.

Then we have another for-loop. This for-loop is used to get our data in batches from the train_loader .

We call optimizer.zero_grad() before we make any predictions. Since the .backward() function accumulates gradients, we need to reset them to zero manually for every mini-batch. From our defined model, we then obtain a prediction, compute the loss (and accuracy) for that mini-batch, and perform back-propagation using loss.backward() followed by optimizer.step().

Finally, we add up all the mini-batch losses and accuracies and divide by the number of mini-batches, i.e. the length of train_loader, to obtain the average loss and accuracy for the epoch.

The procedure we follow for validation is exactly the same as for training, except that we wrap it in torch.no_grad and don't perform any back-propagation. torch.no_grad() tells PyTorch that we don't need to track gradients, which reduces memory usage and speeds up computation.

print("Begin training.")for e in tqdm(range(1, 11)):    # TRAINING    train_epoch_loss = 0
    train_epoch_acc = 0    model.train()
    for X_train_batch, y_train_batch in train_loader:
        X_train_batch, y_train_batch = X_train_batch.to(device), y_train_batch.to(device)        optimizer.zero_grad()        y_train_pred = model(X_train_batch).squeeze()        train_loss = criterion(y_train_pred, y_train_batch)
        train_acc = multi_acc(y_train_pred, y_train_batch)        train_loss.backward()
        optimizer.step()        train_epoch_loss += train_loss.item()
        train_epoch_acc += train_acc.item()
    # VALIDATION
    with torch.no_grad():
        model.eval()
        val_epoch_loss = 0
        val_epoch_acc = 0
        for X_val_batch, y_val_batch in val_loader:
            X_val_batch, y_val_batch = X_val_batch.to(device), y_val_batch.to(device)            y_val_pred = model(X_val_batch).squeeze()            y_val_pred = torch.unsqueeze(y_val_pred, 0)            val_loss = criterion(y_val_pred, y_val_batch)
            val_acc = multi_acc(y_val_pred, y_val_batch)            val_epoch_loss += train_loss.item()
            val_epoch_acc += train_acc.item()    loss_stats['train'].append(train_epoch_loss/len(train_loader))
    loss_stats['val'].append(val_epoch_loss/len(val_loader))
    accuracy_stats['train'].append(train_epoch_acc/len(train_loader))
    accuracy_stats['val'].append(val_epoch_acc/len(val_loader))
    print(f'Epoch {e+0:02}: | Train Loss: {train_epoch_loss/len(train_loader):.5f} | Val Loss: {val_epoch_loss/len(val_loader):.5f} | Train Acc: {train_epoch_acc/len(train_loader):.3f}| Val Acc: {val_epoch_acc/len(val_loader):.3f}')###################### OUTPUT ######################Begin training.Epoch 01: | Train Loss: 33.38733 | Val Loss: 10.19880 | Train Acc: 91.667| Val Acc: 100.000Epoch 02: | Train Loss: 6.49906 | Val Loss: 41.86950 | Train Acc: 99.603| Val Acc: 100.000Epoch 03: | Train Loss: 3.15175 | Val Loss: 0.00000 | Train Acc: 100.000| Val Acc: 100.000Epoch 04: | Train Loss: 0.40076 | Val Loss: 0.00000 | Train Acc: 100.000| Val Acc: 100.000Epoch 05: | Train Loss: 5.56540 | Val Loss: 0.00000 | Train Acc: 100.000| Val Acc: 100.000Epoch 06: | Train Loss: 1.56760 | Val Loss: 0.00000 | Train Acc: 100.000| Val Acc: 100.000Epoch 07: | Train Loss: 1.21176 | Val Loss: 0.00000 | Train Acc: 100.000| Val Acc: 100.000Epoch 08: | Train Loss: 0.84762 | Val Loss: 0.00000 | Train Acc: 100.000| Val Acc: 100.000Epoch 09: | Train Loss: 0.35811 | Val Loss: 0.00000 | Train Acc: 100.000| Val Acc: 100.000Epoch 10: | Train Loss: 0.01389 | Val Loss: 0.00000 | Train Acc: 100.000| Val Acc: 100.000

Visualize Loss and Accuracy

To plot the loss and accuracy line plots, we again create a dataframe from the accuracy_stats and loss_stats dictionaries.

train_val_acc_df = pd.DataFrame.from_dict(accuracy_stats).reset_index().melt(id_vars=['index']).rename(columns={"index":"epochs"})
train_val_loss_df = pd.DataFrame.from_dict(loss_stats).reset_index().melt(id_vars=['index']).rename(columns={"index":"epochs"})

# Plot line charts
fig, axes = plt.subplots(nrows=1, ncols=2, figsize=(30,10))
sns.lineplot(data=train_val_acc_df, x = "epochs", y="value", hue="variable", ax=axes[0]).set_title('Train-Val Accuracy/Epoch')
sns.lineplot(data=train_val_loss_df, x = "epochs", y="value", hue="variable", ax=axes[1]).set_title('Train-Val Loss/Epoch')
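
We imported classification_report and confusion_matrix at the top but haven't used them yet. A sketch of how the test set could be evaluated with them (assuming the trained model above) might look like:

y_pred_list = []
y_true_list = []

# Illustrative test loop: one image per batch, no gradient tracking.
with torch.no_grad():
    model.eval()
    for x_batch, y_batch in test_loader:
        x_batch = x_batch.to(device)
        y_test_pred = torch.log_softmax(model(x_batch).squeeze(), dim=0)
        y_pred_list.append(torch.argmax(y_test_pred).item())
        y_true_list.append(y_batch.item())

print(classification_report(y_true_list, y_pred_list))
print(confusion_matrix(y_true_list, y_pred_list))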
