Understanding and Implementing LeNet-5 CNN Architecture (Deep Learning)


In this article, we perform image classification on the MNIST dataset with custom implemented LeNet-5 neural network architecture.

Introduction

LeNet was introduced in the research paper "Gradient-Based Learning Applied to Document Recognition" in 1998 by Yann LeCun, Leon Bottou, Yoshua Bengio, and Patrick Haffner. Many of the paper's authors have gone on to make significant academic contributions to the field of deep learning.

[Image: Yann LeCun, Leon Bottou, Patrick Haffner, and Yoshua Bengio]

This article will introduce the LeNet-5 CNN architecture as described in the original paper, along with the implementation of the architecture using TensorFlow 2.0.

This article will then conclude with the utilization of the implemented LeNet-5 CNN for the classification of images from the MNIST dataset.

What to find in this article:

  • Understanding of components within a convolutional neural network
  • Key definitions of terms commonly used in deep learning and machine learning
  • Understanding of LeNet-5 as presented in the original research paper
  • Implementation of a neural network using TensorFlow and Keras
The content in this article is written for deep learning and machine learning students of all levels.

For those who are eager to get coding, scroll down to the 'LeNet-5 TensorFlow Implementation' section.

Convolutional Neural Networks

Convolutional Neural Networks (CNNs) are the standard form of neural network architecture for solving tasks associated with images. Solutions for tasks such as object detection, face detection, pose estimation, and more all have CNN architecture variants.

A few characteristics of the CNN architecture make it favourable for several computer vision tasks. I have written previous articles that dive into each characteristic.

LeNet-5

The LeNet-5 CNN architecture is made up of 7 layers. The layer composition consists of 3 convolutional layers, 2 subsampling layers, and 2 fully connected layers.

[Image: LeNet-5 Architecture]

The diagram above depicts the LeNet-5 architecture as illustrated in the original paper.

The first layer is the input layer; this is generally not considered a layer of the network, as nothing is learnt in it. The input layer is built to take in 32x32 images, and these are the dimensions of the images passed into the next layer. Those familiar with the MNIST dataset will be aware that its images have dimensions 28x28. To get the MNIST images to meet the requirements of the input layer, the 28x28 images are padded.
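As a concrete illustration (my own addition, not from the original article), here is a minimal sketch of how the 28x28 MNIST images could be zero-padded to the 32x32 size the paper's input layer expects, assuming TensorFlow is installed:

import tensorflow as tf

# Minimal sketch: zero-pad a batch of 28x28 MNIST images with 2 pixels
# on each side to reach the 32x32 input size described in the paper.
(images, _), _ = tf.keras.datasets.mnist.load_data()
images = tf.expand_dims(images, -1)                        # (60000, 28, 28, 1)
padded = tf.pad(images, [[0, 0], [2, 2], [2, 2], [0, 0]])  # pad height and width only
print(padded.shape)                                        # (60000, 32, 32, 1)

Note that the implementation later in this article takes a different route: it keeps the 28x28 inputs and uses padding='same' in the first convolutional layer instead.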

The grayscale images used in the research paper had their pixel values normalized from the range 0 to 255 to values between -0.1 and 1.175. The reason for normalization is to ensure that the batch of images has a mean of 0 and a standard deviation of 1; the benefit of this is a reduction in training time. In the image classification with LeNet-5 example below, we'll be normalizing the pixel values of the images to take on values between 0 and 1.

The LeNet-5 architecture utilizes two significant types of layer construct: convolutional layers and subsampling layers.

Within the research paper and the image above, convolutional layers are identified with 'Cx', and subsampling layers are identified with 'Sx', where 'x' is the sequential position of the layer within the architecture. 'Fx' is used to identify fully connected layers.

The first convolutional layer, C1, produces 6 feature maps as output and has a kernel size of 5x5. The kernel/filter is the name given to the window containing the weight values that are convolved with the input values. 5x5 is also the size of the local receptive field of each unit or neuron within the convolutional layer. The dimensions of the six feature maps the first convolutional layer produces are 28x28.
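As a quick sanity check (my own addition, assuming the 32x32 input described above), the 28x28 feature map size follows from the valid convolution arithmetic: (32 - 5)/1 + 1 = 28. A small snippet confirms this:

import tensorflow as tf

# C1: 6 feature maps from a 5x5 kernel with stride 1 and no padding,
# so each map is (32 - 5)/1 + 1 = 28 pixels on each side.
c1 = tf.keras.layers.Conv2D(6, kernel_size=5, strides=1, activation='tanh')
dummy_image = tf.zeros((1, 32, 32, 1))   # a single dummy grayscale image
print(c1(dummy_image).shape)             # (1, 28, 28, 6)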

A subsampling layer 'S2' follows the 'C1' layer. The 'S2' layer halves the dimension of the feature maps it receives from the previous layer; this is commonly known as downsampling.

The 'S2' layer also produces 6 feature maps, each one corresponding to a feature map passed as input from the previous layer. This link contains more information on subsampling layers.
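To illustrate the halving (a minimal sketch, not from the original article), a 2x2 average pooling with stride 2 turns C1's 28x28 feature maps into 14x14 maps:

import tensorflow as tf

# S2: 2x2 average pooling with stride 2 halves each spatial dimension.
s2 = tf.keras.layers.AveragePooling2D(pool_size=2, strides=2)
dummy_c1_output = tf.zeros((1, 28, 28, 6))   # dummy C1 output: 6 maps of 28x28
print(s2(dummy_c1_output).shape)             # (1, 14, 14, 6)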

More information on the rest of the LeNet-5 layers is covered in the implementation section.

Below is a table that summarises the key features of each layer:

Layer    Type               Feature maps   Output size   Kernel size   Activation
Input    Image              1              32x32         -             -
C1       Convolution        6              28x28         5x5           tanh
S2       Average pooling    6              14x14         2x2           tanh
C3       Convolution        16             10x10         5x5           tanh
S4       Average pooling    16             5x5           2x2           tanh
C5       Fully connected    -              120           -             tanh
F6       Fully connected    -              84            -             tanh
Output   Fully connected    -              10            -             softmax

LeNet-5 Architecture features by Author

LeNet-5 TensorFlow Implementation

We begin implementation by importing the libraries we will be utilizing:

  • TensorFlow: An open-source platform for the implementation, training, and deployment of machine learning models.
  • Keras: An open-source library used for the implementation of neural network architectures that run on both CPUs and GPUs.
  • NumPy: A library for numerical computation with n-dimensional arrays.
import tensorflow as tf
from tensorflow import keras
import numpy as np

Next, we load the MNIST dataset using the Keras library. The Keras library has a suite of datasets readily available for use with easy accessibility.

We are also required to partition the dataset into testing, validation and training. Here are some quick descriptions of each partition category.

  • Training Dataset: This is the group of our dataset used to train the neural network directly. Training data refers to the dataset partition exposed to the neural network during training.
  • Validation Dataset: This group of the dataset is utilized during training to assess the performance of the network at various iterations.
  • Test Dataset: This partition of the dataset evaluates the performance of our network after the completion of the training phase.

It is also required that the pixel intensities of the images within the dataset are normalized from the value range 0–255 to 0–1.

(train_x, train_y), (test_x, test_y) = keras.datasets.mnist.load_data()

# Normalize pixel values from the 0-255 range to the 0-1 range
train_x = train_x / 255.0
test_x = test_x / 255.0

# Add a channel dimension so each image has shape (28, 28, 1)
train_x = tf.expand_dims(train_x, 3)
test_x = tf.expand_dims(test_x, 3)

# Set aside the first 5,000 training images as a validation partition
val_x = train_x[:5000]
val_y = train_y[:5000]

In the code snippet above, we expand the dimensions of the training and test images. The reason we do this is that the convolutional layers expect each image to have an explicit channel dimension; since the MNIST images are grayscale, the extra dimension holds the single channel, giving each image the shape (28, 28, 1).
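To sanity-check the preprocessing, the resulting shapes can be printed; the exact values below assume the standard 60,000/10,000 MNIST train/test split loaded above:

# Shape check: expand_dims added the single grayscale channel at axis 3.
print(train_x.shape)   # (60000, 28, 28, 1)
print(val_x.shape)     # (5000, 28, 28, 1)
print(test_x.shape)    # (10000, 28, 28, 1)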

The code below is the main part where we implement the actual LeNet-5 based neural network.

Keras provides tools required to implement the classification model. Keras presents a Sequential API for stacking layers of the neural network on top of each other.

lenet_5_model = keras.models.Sequential([
    keras.layers.Conv2D(6, kernel_size=5, strides=1,  activation='tanh', input_shape=train_x[0].shape, padding='same'), #C1
    keras.layers.AveragePooling2D(), #S2
    keras.layers.Conv2D(16, kernel_size=5, strides=1, activation='tanh', padding='valid'), #C3
    keras.layers.AveragePooling2D(), #S4
    keras.layers.Flatten(), #Flatten
    keras.layers.Dense(120, activation='tanh'), #C5
    keras.layers.Dense(84, activation='tanh'), #F6
    keras.layers.Dense(10, activation='softmax') #Output layer
])

We first assign the variable 'lenet_5_model' to an instance of the tf.keras.Sequential class constructor.

Within the class constructor, we then proceed to define the layers within our model.

The C1 layer is defined by the line keras.layers.Conv2D(6, kernel_size=5, strides=1, activation='tanh', input_shape=train_x[0].shape, padding='same'). We are using the tf.keras.layers.Conv2D class to construct the convolutional layers within the network, and we pass a couple of arguments, described below.

  • Activation Function: A mathematical operation that transforms the result or signals of neurons into a normalized output. An activation function is a component of a neural network that introduces non-linearity within the network. The inclusion of an activation function enables the neural network to have greater representational power and solve complex functions.

The rest of the convolutional layers follow the same layer definition as C1, with some different values entered for the arguments.

In the original paper where the LeNet-5 architecture was introduced, subsampling layers were utilized. Within the subsampling layer, the average of the pixel values that fall within the 2x2 pooling window is taken; after that, the value is multiplied by a coefficient. A bias is then added to the result, and all of this happens before the values are passed through the activation function.

But in our implemented LeNet-5 neural network, we're utilizing the tf.keras.layers.AveragePooling2D constructor. We don't pass any arguments into the constructor, as sensible default values for the required arguments are used when the constructor is called. Remember that the pooling layer's role within the network is to downsample the feature maps as they move through the network.
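For comparison, here is a rough sketch of what the paper's subsampling construct could look like as a custom Keras layer, with one trainable coefficient and bias per feature map; the class name and details are my own, not part of the article's implementation:

import tensorflow as tf

class LeNetSubsampling(tf.keras.layers.Layer):
    """Sketch of the paper's subsampling: average the 2x2 window, scale by a
    trainable coefficient, add a trainable bias, then apply tanh."""
    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self.pool = tf.keras.layers.AveragePooling2D(pool_size=2, strides=2)

    def build(self, input_shape):
        channels = input_shape[-1]
        # One coefficient and one bias per feature map, as described in the paper.
        self.coefficient = self.add_weight(name='coefficient', shape=(channels,),
                                           initializer='ones', trainable=True)
        self.bias = self.add_weight(name='bias', shape=(channels,),
                                    initializer='zeros', trainable=True)

    def call(self, inputs):
        return tf.tanh(self.pool(inputs) * self.coefficient + self.bias)

Swapping this layer in for the two AveragePooling2D calls above would bring the model closer to the original formulation, at the cost of a few extra trainable parameters.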

There are two more types of layers within the network, the flatten layer and the dense layers.

The flatten layer is created with the class constructor tf.keras.layers.Flatten.

The purpose of this layer is to transform its input to a 1-dimensional array that can be fed into the subsequent dense layers.
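With the layer settings above, S4 outputs 16 feature maps of size 5x5, so the flatten layer produces a vector of 5 * 5 * 16 = 400 values per image. A quick illustration using dummy data (my own example):

import tensorflow as tf

# Flatten turns each (5, 5, 16) block of feature maps into a 400-element vector.
dummy_s4_output = tf.zeros((1, 5, 5, 16))
print(tf.keras.layers.Flatten()(dummy_s4_output).shape)   # (1, 400)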

The dense layers have a specified number of units or neurons within each layer, F6 has 84, while the output layer has ten units.

The last dense layer has ten units that correspond to the number of classes that are within the MNIST dataset. The activation function for the output layer is a softmax activation function.

  • Softmax: An activation function that is utilized to derive the probability distribution of a set of numbers within an input vector. The output of a softmax activation function is a vector in which its set of values represents the probability of an occurrence of a class or event. The values within the vector all add up to 1.
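A minimal hand-rolled illustration of this definition (my own example, not from the article): exponentiate the scores and divide by their sum, and the ten outputs form a probability distribution:

import numpy as np

# Softmax by hand for a vector of ten arbitrary scores.
logits = np.array([2.0, 1.0, 0.1, -1.0, 0.5, 0.0, 3.0, -0.5, 1.5, 0.2])
probs = np.exp(logits) / np.sum(np.exp(logits))
print(probs.round(3))   # ten probabilities, largest for the largest score
print(probs.sum())      # 1.0 (up to floating-point error)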

Now we can compile and build the model.

lenet_5_model.compile(optimizer='adam', loss=keras.losses.sparse_categorical_crossentropy, metrics=['accuracy'])

Keras provides the 'compile' method through the model object we instantiated earlier. The compile method builds the model we have implemented behind the scenes, with some additional characteristics such as the loss function, optimizer, and metrics.

To train the network, we utilize a loss function that calculates the difference between the predicted values provided by the network and actual values of the training data.

The loss values, together with an optimization algorithm (Adam), determine how the weights within the network are adjusted. Supporting factors such as momentum and the learning rate schedule provide the ideal environment for the network training to converge, thereby getting the loss values as close to zero as possible.
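As a small illustration of the loss in isolation (assumed values, not from the article), sparse categorical cross-entropy is the negative log of the probability the model assigns to the true class:

import tensorflow as tf

# One example: the true digit is 3 and the model assigns it probability 0.70,
# so the loss is -log(0.70), roughly 0.357.
y_true = [3]
y_pred = [[0.05, 0.05, 0.05, 0.70, 0.05, 0.02, 0.02, 0.02, 0.02, 0.02]]
loss = tf.keras.losses.sparse_categorical_crossentropy(y_true, y_pred)
print(loss.numpy())   # approximately [0.357]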

During training, we'll also validate our model after every epoch with the validation dataset partition created earlier.

lenet_5_model.fit(train_x, train_y, epochs=5, validation_data=(val_x, val_y))

After training, you will notice that your model achieves a validation accuracy of over 90%. But for a more explicit verification of the performance of the model on an unseen dataset, we will evaluate the trained model on the test dataset partition created earlier.

lenet_5_model.evaluate(test_x, test_y)
>> [0.04592850968674757, 0.9859]

After training my model, I was able to achieve 98% accuracy on the test dataset, which is quite good for such a simple network.
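As a quick follow-up (my own addition), the trained model can also be used for inference: predict returns a ten-element probability vector per image, and argmax picks the most likely digit.

import numpy as np

# Predict the first five test images and compare against the true labels.
predictions = lenet_5_model.predict(test_x[:5])
print(np.argmax(predictions, axis=1))   # predicted digits
print(test_y[:5])                       # actual labels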

Here's the GitHub link for the code presented in this article:

I hope you found the article useful.

To connect with me or find more content similar to this article, do the following:

  1. Subscribe to my YouTube channel for video content coming soon here
  2. Follow me on Medium
  3. Connect and reach me on LinkedIn
