
How to train your neural net

PyTorch [Vision] — Multiclass Image Classification

May 9 · 13 min read

This notebook takes you through the implementation of multi-class image classification with CNNs using the Rock Paper Scissor dataset on PyTorch.

Import Libraries

import numpy as np
import pandas as pd
import seaborn as sns
from tqdm.notebook import tqdm
import matplotlib.pyplot as plt
import torch
import torchvision
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F
from torchvision import transforms, utils, datasets
from torch.utils.data import Dataset, DataLoader, SubsetRandomSampler
from sklearn.metrics import classification_report, confusion_matrix

Set the random seed.

np.random.seed(0)
torch.manual_seed(0)

Set Seaborn style.

%matplotlib inline
sns.set_style('darkgrid')

Define Paths and Set GPU

Let’s define the path for our data.

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print("We're using =>", device)

root_dir = "../../../data/computer_vision/image_classification/rock-paper-scissor/"
print("The data lies here =>", root_dir)
###################### OUTPUT ######################
We're using => cuda
The data lies here => ../../../data/computer_vision/image_classification/rock-paper-scissor/


Define transforms

Let’s define a dictionary to hold the image transformations for the train/test sets. All images in the dataset are of size (300, 300). We will still resize them to (300, 300) (to guard against stray images of other sizes) and convert them to tensors. The ToTensor operation in PyTorch scales pixel values to lie in [0, 1]. We also normalize each channel with a mean and standard deviation of 0.5, which shifts the inputs into the range [-1, 1].

ToTensor converts a PIL Image or numpy.ndarray (H x W x C) in the range [0, 255] to a torch.FloatTensor of shape (C x H x W) in the range [0.0, 1.0]

image_transforms = {
    "train": transforms.Compose([
        transforms.Resize((300, 300)),
        transforms.ToTensor(),
        transforms.Normalize([0.5, 0.5, 0.5],
                             [0.5, 0.5, 0.5])
    ]),
    "test": transforms.Compose([
        transforms.Resize((300, 300)),
        transforms.ToTensor(),
        transforms.Normalize([0.5, 0.5, 0.5],
                             [0.5, 0.5, 0.5])
    ])
}
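
As a quick sanity check (a minimal illustrative sketch, not part of the original notebook), you can verify the shape and range conversion that ToTensor performs on a dummy image:

from PIL import Image
import numpy as np
from torchvision import transforms

# A dummy 300x300 RGB image with uint8 pixel values in [0, 255].
dummy = Image.fromarray(np.random.randint(0, 256, (300, 300, 3), dtype=np.uint8))

t = transforms.ToTensor()(dummy)
print(t.shape)                         # torch.Size([3, 300, 300]) -> (C x H x W)
print(t.min().item(), t.max().item())  # values now lie in [0.0, 1.0]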

Initialize Datasets

Train + Validation Dataset

We have 2 dataset folders with us: Train and Test .

We will further divide our Train set as Train + Val .

rps_dataset = datasets.ImageFolder(root = root_dir + "train",
                                   transform = image_transforms["train"]
                                  )
rps_dataset
###################### OUTPUT ######################
Dataset ImageFolder
    Number of datapoints: 2520
    Root location: ../../../data/computer_vision/image_classification/rock-paper-scissor/train
    StandardTransform
Transform: Compose(
               Resize(size=(300, 300), interpolation=PIL.Image.BILINEAR)
               ToTensor()
               Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5])
           )

Class <=> ID Mapping of Output

The class_to_idx attribute is built into PyTorch's ImageFolder dataset. It returns a mapping of class names to the class IDs present in the dataset.

rps_dataset.class_to_idx
###################### OUTPUT ######################
{'paper': 0, 'rock': 1, 'scissors': 2}

We will now construct a reverse of this dictionary; a mapping of ID to class.

idx2class = {v: k for k, v in rps_dataset.class_to_idx.items()}
idx2class
###################### OUTPUT ######################
{0: 'paper', 1: 'rock', 2: 'scissors'}

Let’s also write a function that takes in a dataset object and returns a dictionary that contains the count of class samples. We will use this dictionary to construct plots and observe the class distribution in our data.

get_class_distribution() takes in an argument called dataset_obj .

  • First, initialize a count_dict dictionary where the count of each class is set to 0.
  • Then, loop through the dataset and increment the counter by 1 for each class label encountered.

plot_from_dict() takes in 3 arguments: a dictionary called dict_obj , plot_title , and **kwargs . We pass in **kwargs because later on, we will construct subplots which require passing the ax argument in seaborn.

  • First, convert the dictionary to a dataframe.
  • Melt the dataframe and plot.
def get_class_distribution(dataset_obj):
    count_dict = {k: 0 for k, v in dataset_obj.class_to_idx.items()}
    for _, label_id in dataset_obj:
        label = idx2class[label_id]
        count_dict[label] += 1
    return count_dict

def plot_from_dict(dict_obj, plot_title, **kwargs):
    return sns.barplot(data = pd.DataFrame.from_dict([dict_obj]).melt(), x = "variable", y = "value", hue = "variable", **kwargs).set_title(plot_title)

plt.figure(figsize=(15,8))
plot_from_dict(get_class_distribution(rps_dataset), plot_title="Entire Dataset (before train/val/test split)")

Data distribution [Image [1]]

Get Train and Validation Samples

We use SubsetRandomSampler to make our train and validation loaders. SubsetRandomSampler is used so that each batch receives a random distribution of classes.

We could’ve also split our dataset into 2 parts, train and val, i.e. made 2 Subsets (a sketch of that alternative appears below). But using samplers is simpler because the data loader will handle pretty much everything for us.
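
For reference, here's what that explicit-split alternative might look like (a hypothetical sketch using random_split; the rest of this notebook sticks with samplers):

from torch.utils.data import random_split

# Hypothetical alternative: split the dataset into two Subset objects.
n_val = int(0.2 * len(rps_dataset))
train_subset, val_subset = random_split(rps_dataset, [len(rps_dataset) - n_val, n_val])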

SubsetRandomSampler(indices) takes as input the indices of data.

We first create our samplers and then we’ll pass it to our dataloaders.

SubsetRandomSampler

Create a list of indices from 0 to length of dataset.

rps_dataset_size = len(rps_dataset)
rps_dataset_indices = list(range(rps_dataset_size))

Shuffle the list of indices using np.random.shuffle.

np.random.shuffle(rps_dataset_indices)

Create the split index. We choose the split index to be 20% (0.2) of the dataset size.

val_split_index = int(np.floor(0.2 * rps_dataset_size))

Slice the list to obtain 2 lists of indices, one for train and the other for val.

0-----------val_split_index------------------------------n

Train => val_split_index to n

Val => 0 to val_split_index

train_idx, val_idx = rps_dataset_indices[val_split_index:], rps_dataset_indices[:val_split_index]

Finally, create samplers.

train_sampler = SubsetRandomSampler(train_idx)
val_sampler = SubsetRandomSampler(val_idx)

Test

Now that we’re done with train and val data, let’s load our test dataset.

rps_dataset_test = datasets.ImageFolder(root = root_dir + "test",
                                        transform = image_transforms["test"])
rps_dataset_test
###################### OUTPUT ######################
Dataset ImageFolder
    Number of datapoints: 372
    Root location: ../../../data/computer_vision/image_classification/rock-paper-scissor/test
    StandardTransform
Transform: Compose(
               Resize(size=(300, 300), interpolation=PIL.Image.BILINEAR)
               ToTensor()
               Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5])
           )

Train, Validation, and Test Dataloader

Now, we will pass the samplers to our dataloader. Note that shuffle=True cannot be used when you're using the SubsetRandomSampler .

train_loader = DataLoader(dataset=rps_dataset, shuffle=False, batch_size=8, sampler=train_sampler)
val_loader = DataLoader(dataset=rps_dataset, shuffle=False, batch_size=1, sampler=val_sampler)
test_loader = DataLoader(dataset=rps_dataset_test, shuffle=False, batch_size=1)

Explore The Data

To explore our train and val data-loaders, let’s create a new function that takes in a data-loader and returns a dictionary with class counts.

  • Initialize a dictionary count_dict to all 0s.
  • If the batch_size of the dataloader_obj is 1, then loop through the dataloader_obj and update the counter.
  • Else, if the batch_size of the dataloader_obj is not 1, then loop through the dataloader_obj to obtain batches, loop through each batch to obtain the individual label tensors, and update the counter accordingly.
def get_class_distribution_loaders(dataloader_obj, dataset_obj):
    count_dict = {k: 0 for k, v in dataset_obj.class_to_idx.items()}
    if dataloader_obj.batch_size == 1:
        for _, label_id in dataloader_obj:
            y_idx = label_id.item()
            y_lbl = idx2class[y_idx]
            count_dict[str(y_lbl)] += 1
    else:
        for _, label_id in dataloader_obj:
            for idx in label_id:
                y_idx = idx.item()
                y_lbl = idx2class[y_idx]
                count_dict[str(y_lbl)] += 1
    return count_dict

To plot the class distributions, we will use the plot_from_dict() function defined earlier with the ax argument.

fig, axes = plt.subplots(nrows=1, ncols=2, figsize=(18,7))
plot_from_dict(get_class_distribution_loaders(train_loader, rps_dataset), plot_title="Train Set", ax=axes[0])
plot_from_dict(get_class_distribution_loaders(val_loader, rps_dataset), plot_title="Val Set", ax=axes[1])

Train-Val class distribution [Image [2]]

Now that we’ve looked at the class distributions, let’s look at a single image.

single_batch = next(iter(train_loader))

single_batch is a list of 2 elements. The first element (0th index) contains the image tensors while the second element (1st index) contains the output labels.

Here’s the first element of the list which is a tensor. This tensor is of the shape (batch, channels, height, width) .

single_batch[0].shape
###################### OUTPUT ######################
torch.Size([8, 3, 300, 300])

Here are the output labels for the batch.

print("Output label tensors: ", single_batch[1])
print("\nOutput label tensor shape: ", single_batch[1].shape)
###################### OUTPUT ######################
Output label tensors:  tensor([2, 0, 2, 2, 0, 1, 0, 0])

Output label tensor shape:  torch.Size([8])

To plot the image, we’ll use plt.imshow from matplotlib. It expects the image dimension to be (height, width, channels) . We'll .permute() our single image tensor to plot it.

# Selecting the first image tensor from the batch. 
single_image = single_batch[0][0]
single_image.shape
###################### OUTPUT ######################
torch.Size([3, 300, 300])

Let’s plot the image.

plt.imshow(single_image.permute(1, 2, 0))

Single image sample [Image [3]]

PyTorch has made it easier for us to plot the images in a grid straight from the batch.

We first extract out the image tensor from the list (returned by our dataloader) and set nrow . Then we use the plt.imshow() function to plot our grid. Remember to .permute() the tensor dimensions!

# We do single_batch[0] because each batch is a list
# where the 0th index is the image tensor and 1st index is the
# output label.
single_batch_grid = utils.make_grid(single_batch[0], nrow=4)

plt.figure(figsize = (10,10))
plt.imshow(single_batch_grid.permute(1, 2, 0))

Image sample grid [Image [4]]

Define a CNN Architecture

Our architecture is simple. We use 3 blocks of Conv layers, where each block consists of Convolution + BatchNorm + ReLU + Dropout layers, followed by a final Conv layer that produces the class scores.

We will not use an FC layer at the end. We'll stick with a Conv layer.

Converting FC layers to CONV layers — Source

It is worth noting that the only difference between FC and CONV layers is that the neurons in the CONV layer are connected only to a local region in the input, and that many of the neurons in a CONV volume share parameters. However, the neurons in both layers still compute dot products, so their functional form is identical. Therefore, it turns out that it’s possible to convert between FC and CONV layers.

For any CONV layer there is an FC layer that implements the same forward function. The weight matrix would be a large matrix that is mostly zero except for at certain blocks (due to local connectivity) where the weights in many of the blocks are equal (due to parameter sharing).

Conversely, any FC layer can be converted to a CONV layer. For example, an FC layer with K=4096 that is looking at some input volume of size 7×7×512 can be equivalently expressed as a CONV layer with F=7, P=0, S=1, K=4096 .

In other words, we are setting the filter size to be exactly the size of the input volume, and hence the output will simply be 1×1×4096 since only a single depth column “fits” across the input volume, giving identical result as the initial FC layer.
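
To make this concrete, here's a minimal sketch (using the sizes from the quoted example, not from our model) showing that a Conv layer whose kernel spans the entire input volume computes the same function as an FC layer:

x = torch.randn(1, 512, 7, 7)                  # input volume of size 7x7x512

fc = nn.Linear(512 * 7 * 7, 4096)              # FC layer with K=4096
conv = nn.Conv2d(512, 4096, kernel_size=7, stride=1, padding=0)

# Copy the FC weights into the conv layer: each conv filter is one row
# of the FC weight matrix reshaped back into (512, 7, 7).
with torch.no_grad():
    conv.weight.copy_(fc.weight.view(4096, 512, 7, 7))
    conv.bias.copy_(fc.bias)

out_fc = fc(x.flatten(1))                      # shape: (1, 4096)
out_conv = conv(x)                             # shape: (1, 4096, 1, 1)
print(torch.allclose(out_fc, out_conv.flatten(1), atol=1e-4))  # True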

class RpsClassifier(nn.Module):
    def __init__(self):
        super(RpsClassifier, self).__init__()
        self.block1 = self.conv_block(c_in=3, c_out=256, dropout=0.1, kernel_size=5, stride=1, padding=2)
        self.block2 = self.conv_block(c_in=256, c_out=128, dropout=0.1, kernel_size=3, stride=1, padding=1)
        self.block3 = self.conv_block(c_in=128, c_out=64, dropout=0.1, kernel_size=3, stride=1, padding=1)
        self.lastcnn = nn.Conv2d(in_channels=64, out_channels=3, kernel_size=75, stride=1, padding=0)
        self.maxpool = nn.MaxPool2d(kernel_size=2, stride=2)

    def forward(self, x):
        x = self.block1(x)
        x = self.maxpool(x)
        x = self.block2(x)
        x = self.block3(x)
        x = self.maxpool(x)
        x = self.lastcnn(x)
        return x

    def conv_block(self, c_in, c_out, dropout, **kwargs):
        seq_block = nn.Sequential(
            nn.Conv2d(in_channels=c_in, out_channels=c_out, **kwargs),
            nn.BatchNorm2d(num_features=c_out),
            nn.ReLU(),
            nn.Dropout2d(p=dropout)
        )
        return seq_block
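
To see why the final Conv layer uses kernel_size=75, here's a quick shape trace for a (300, 300) input (a sketch derived from the layers above):

# input:              (3, 300, 300)
# block1 (k=5, p=2):  (256, 300, 300)
# maxpool (2, 2):     (256, 150, 150)
# block2 (k=3, p=1):  (128, 150, 150)
# block3 (k=3, p=1):  (64, 150, 150)
# maxpool (2, 2):     (64, 75, 75)
# lastcnn (k=75):     (3, 1, 1)   -> one raw score per class
out = RpsClassifier()(torch.randn(1, 3, 300, 300))
print(out.shape)  # torch.Size([1, 3, 1, 1])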

Now we’ll initialize the model, optimizer, and loss function.

Then we’ll transfer the model to GPU.

We’re using nn.CrossEntropyLoss for this multi-class problem. Instead of returning a single predicted label, the model returns one raw score (logit) per class (3 values here), and the predicted class is the one with the highest score.

We don’t have to manually apply a log_softmax layer after our final layer because nn.CrossEntropyLoss does that for us.

However, we need to apply log_softmax for our validation and testing.
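
For instance (an illustrative snippet, not from the notebook), converting raw logits into probabilities and a class label looks like this:

logits = torch.tensor([[1.2, -0.3, 0.5]])  # one sample, one raw score per class
probs = torch.softmax(logits, dim=1)       # probabilities that sum to 1
pred = torch.argmax(probs, dim=1)          # index of the most likely class
print(probs, idx2class[pred.item()])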

model = RpsClassifier()
model.to(device)
print(model)

criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.005)
###################### OUTPUT ######################
RpsClassifier(
  (block1): Sequential(
    (0): Conv2d(3, 256, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2))
    (1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (2): ReLU()
    (3): Dropout2d(p=0.1, inplace=False)
  )
  (block2): Sequential(
    (0): Conv2d(256, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (2): ReLU()
    (3): Dropout2d(p=0.1, inplace=False)
  )
  (block3): Sequential(
    (0): Conv2d(128, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (2): ReLU()
    (3): Dropout2d(p=0.1, inplace=False)
  )
  (lastcnn): Conv2d(64, 3, kernel_size=(75, 75), stride=(1, 1))
  (maxpool): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
)

Before we start our training, let’s define a function to calculate accuracy per epoch.

This function takes y_pred and y_test as input arguments. We apply log_softmax to y_pred and extract the class with the highest probability (the argmax is the same whether we use softmax or log_softmax).

After that, we compare the predicted classes and the actual classes to calculate the accuracy.

def multi_acc(y_pred, y_test):
    y_pred_softmax = torch.log_softmax(y_pred, dim = 1)
    _, y_pred_tags = torch.max(y_pred_softmax, dim = 1)

    correct_pred = (y_pred_tags == y_test).float()
    acc = correct_pred.sum() / len(correct_pred)

    # Convert to a percentage before rounding; rounding the raw fraction
    # first would collapse every batch accuracy to 0 or 100.
    acc = torch.round(acc * 100)

    return acc

We’ll also define 2 dictionaries which will store the accuracy/epoch and loss/epoch for both train and validation sets.

accuracy_stats = {
    'train': [],
    "val": []
}

loss_stats = {
    'train': [],
    "val": []
}

Let’s TRAIN our model!

You can see we’ve put model.train() before the loop. model.train() tells PyTorch that you're in training mode. Why do we need to do that? If you're using layers such as Dropout or BatchNorm, which behave differently during training and evaluation (for example, dropout is disabled during evaluation), you need to tell PyTorch to act accordingly. The default mode in PyTorch is train, so you don't strictly have to write it, but it's good practice.

Similarly, we’ll call model.eval() when we test our model. We'll see that below. Back to training; we start a for-loop. At the top of this for-loop, we initialize our loss and accuracy per epoch to 0. After every epoch, we'll print out the loss/accuracy and reset it back to 0.

Then we have another for-loop. This for-loop is used to get our data in batches from the train_loader .

We call optimizer.zero_grad() before we make any predictions. Since the .backward() function accumulates gradients, we need to reset them to zero manually for every mini-batch. From our defined model, we then obtain a prediction, compute the loss (and accuracy) for that mini-batch, and perform back-propagation using loss.backward() followed by optimizer.step().

Finally, we add up all the mini-batch losses and accuracies and divide by the number of mini-batches, i.e. the length of train_loader, to obtain the average loss and accuracy for the epoch.

The procedure we follow for validation is exactly the same as for training, except that we wrap it in torch.no_grad and don't perform any back-propagation. torch.no_grad() tells PyTorch that we don't need to track gradients, which reduces memory usage and speeds up computation.

print("Begin training.")for e in tqdm(range(1, 11)):    # TRAINING    train_epoch_loss = 0
    train_epoch_acc = 0    model.train()
    for X_train_batch, y_train_batch in train_loader:
        X_train_batch, y_train_batch = X_train_batch.to(device), y_train_batch.to(device)        optimizer.zero_grad()        y_train_pred = model(X_train_batch).squeeze()        train_loss = criterion(y_train_pred, y_train_batch)
        train_acc = multi_acc(y_train_pred, y_train_batch)        train_loss.backward()
        optimizer.step()        train_epoch_loss += train_loss.item()
        train_epoch_acc += train_acc.item()
    # VALIDATION
    with torch.no_grad():
        model.eval()
        val_epoch_loss = 0
        val_epoch_acc = 0
        for X_val_batch, y_val_batch in val_loader:
            X_val_batch, y_val_batch = X_val_batch.to(device), y_val_batch.to(device)            y_val_pred = model(X_val_batch).squeeze()            y_val_pred = torch.unsqueeze(y_val_pred, 0)            val_loss = criterion(y_val_pred, y_val_batch)
            val_acc = multi_acc(y_val_pred, y_val_batch)            val_epoch_loss += train_loss.item()
            val_epoch_acc += train_acc.item()    loss_stats['train'].append(train_epoch_loss/len(train_loader))
    loss_stats['val'].append(val_epoch_loss/len(val_loader))
    accuracy_stats['train'].append(train_epoch_acc/len(train_loader))
    accuracy_stats['val'].append(val_epoch_acc/len(val_loader))
    print(f'Epoch {e+0:02}: | Train Loss: {train_epoch_loss/len(train_loader):.5f} | Val Loss: {val_epoch_loss/len(val_loader):.5f} | Train Acc: {train_epoch_acc/len(train_loader):.3f}| Val Acc: {val_epoch_acc/len(val_loader):.3f}')###################### OUTPUT ######################Begin training.Epoch 01: | Train Loss: 33.38733 | Val Loss: 10.19880 | Train Acc: 91.667| Val Acc: 100.000Epoch 02: | Train Loss: 6.49906 | Val Loss: 41.86950 | Train Acc: 99.603| Val Acc: 100.000Epoch 03: | Train Loss: 3.15175 | Val Loss: 0.00000 | Train Acc: 100.000| Val Acc: 100.000Epoch 04: | Train Loss: 0.40076 | Val Loss: 0.00000 | Train Acc: 100.000| Val Acc: 100.000Epoch 05: | Train Loss: 5.56540 | Val Loss: 0.00000 | Train Acc: 100.000| Val Acc: 100.000Epoch 06: | Train Loss: 1.56760 | Val Loss: 0.00000 | Train Acc: 100.000| Val Acc: 100.000Epoch 07: | Train Loss: 1.21176 | Val Loss: 0.00000 | Train Acc: 100.000| Val Acc: 100.000Epoch 08: | Train Loss: 0.84762 | Val Loss: 0.00000 | Train Acc: 100.000| Val Acc: 100.000Epoch 09: | Train Loss: 0.35811 | Val Loss: 0.00000 | Train Acc: 100.000| Val Acc: 100.000Epoch 10: | Train Loss: 0.01389 | Val Loss: 0.00000 | Train Acc: 100.000| Val Acc: 100.000

Visualize Loss and Accuracy

To plot the loss and accuracy line plots, we again create a dataframe from the accuracy_stats and loss_stats dictionaries.

train_val_acc_df = pd.DataFrame.from_dict(accuracy_stats).reset_index().melt(id_vars=['index']).rename(columns={"index":"epochs"})
train_val_loss_df = pd.DataFrame.from_dict(loss_stats).reset_index().melt(id_vars=['index']).rename(columns={"index":"epochs"})

# Plot line charts
fig, axes = plt.subplots(nrows=1, ncols=2, figsize=(30,10))
sns.lineplot(data=train_val_acc_df, x = "epochs", y="value", hue="variable", ax=axes[0]).set_title('Train-Val Accuracy/Epoch')
sns.lineplot(data=train_val_loss_df, x = "epochs", y="value", hue="variable", ax=axes[1]).set_title('Train-Val Loss/Epoch')
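
We imported classification_report and confusion_matrix at the top but haven't used them yet. A sketch of how the test set could be evaluated with them (assuming the trained model above) might look like:

y_pred_list = []
y_true_list = []

# Illustrative test loop: one image per batch, no gradient tracking.
with torch.no_grad():
    model.eval()
    for x_batch, y_batch in test_loader:
        x_batch = x_batch.to(device)
        y_test_pred = torch.log_softmax(model(x_batch).squeeze(), dim=0)
        y_pred_list.append(torch.argmax(y_test_pred).item())
        y_true_list.append(y_batch.item())

print(classification_report(y_true_list, y_pred_list))
print(confusion_matrix(y_true_list, y_pred_list))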
