A visualization of the basic elements of a Convolutional Neural Network




Visualization is a great tool for understanding rich concepts, especially for beginners in the area. In this article, we will go through the basic elements of a convolutional neural network using visual aids. The article begins by providing a visual template for a basic CNN with its different building blocks, and then discusses the most commonly used elements of each building block.

Basic CNN Template:

A basic CNN consists of three kinds of layers: input, hidden, and output, as shown below. The data gets into the CNN through the input layer and passes through various hidden layers before reaching the output layer. The output layer is the prediction of the network. The output of the network is compared to the actual labels in terms of a loss or error. For the network to learn, the partial derivatives of this loss with respect to the trainable weights are calculated via backpropagation, and the weights are updated using one of several gradient-descent variants.
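To make this flow concrete, a minimal training step might look like the following PyTorch sketch; the tiny model, data shapes, and hyperparameters are placeholders for illustration, not part of the original article.

```python
import torch
import torch.nn as nn

# A hypothetical tiny CNN: input -> hidden layers -> output layer
model = nn.Sequential(
    nn.Conv2d(1, 8, kernel_size=3, padding=1),   # layer function (convolution)
    nn.ReLU(),                                    # activation
    nn.MaxPool2d(2),                              # pooling
    nn.Flatten(),
    nn.Linear(8 * 14 * 14, 10),                   # fully connected output layer
)
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

images = torch.randn(4, 1, 28, 28)                # placeholder mini-batch
labels = torch.randint(0, 10, (4,))               # placeholder labels

predictions = model(images)                       # forward pass: input -> hidden -> output
loss = loss_fn(predictions, labels)               # compare predictions with the actual labels
optimizer.zero_grad()
loss.backward()                                   # partial derivatives of the loss w.r.t. the weights
optimizer.step()                                  # update the weights (gradient descent)
```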

The complete visual template for a basic CNN can be seen below.


Template for a basic CNN

Hidden Layers of CNN

The hidden layers in the network provide the basic building blocks that transform the data (the input layer or the output of the previous hidden layer). Most of the commonly used hidden layers (though not all) follow a pattern: they begin with applying a function to the input, move on to pooling, then normalization, and finally apply the activation before the result is fed as input to the next layer. Thus, each layer can be decomposed into the following 4 sub-functions:

  1. Layer function: Basic transforming function such as convolutional or fully connected layer.
  2. Pooling: Used to change the spatial size of the feature map, either increasing (up-sampling) or decreasing (most common) it. Examples include max pooling, average pooling, and unpooling.
  3. Normalization: This sub-function normalizes the data to have zero mean and unit variance. This helps in coping with problems such as the vanishing gradient and internal covariate shift. The two most common normalization techniques used are local response normalization and batch normalization.
  4. Activation: Applies non-linearity and keeps the output from growing too high or too low.

We will go through each of the sub-functions explaining their most common examples.

There are far more complex CNN architectures out there that have various other layers and rather complex arrangements. Not all CNN architectures follow this template.

1. Layer functions

The most commonly used layer functions are the fully connected, convolutional, and transposed convolutional (often incorrectly called deconvolutional) layers.


a. Fully Connected Layers:

These layers consist of linear functions between the input and the output. For i input nodes and j output nodes, the trainable weights are w_ij and b_j. For example, a fully connected layer between 3 input nodes and 2 output nodes has 3x2 weights and 2 biases.
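As a minimal sketch (with arbitrary placeholder weights), the fully connected mapping for 3 inputs and 2 outputs can be written as:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])   # 3 input nodes
W = np.random.randn(2, 3)       # trainable weights w_ij (2 output nodes x 3 input nodes)
b = np.random.randn(2)          # trainable biases b_j

y = W @ x + b                   # each output node is a weighted sum of all inputs plus a bias
print(y.shape)                  # (2,)
```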

b. Convolutional Layers:

These layers are applied to 2D (and 3D) input feature maps. The trainable weights are a 2D (or 3D) kernel/filter that moves across the input feature map, generating dot products with the overlapping regions of the input. The following 3 parameters define a convolutional layer:

  • Kernel Size K: The size of the sliding kernel or filter.
  • Stride Length S: Defines how far the kernel is slid before the next dot product is carried out to generate an output pixel
  • Padding P: The frame size of zeros inserted around the input feature map.

The 4 figures below visually explain the convolutional layer on an input of size ( i ) 5x5, for a kernel size ( k ) of 3x3 and varying strides ( s ) and padding ( p ).


Animated convolutional layer (Source: Aqeel Anwar)

The stride and padding, along with the size of the input feature map, control the size of the output feature map. The output size is given by o = floor( (i − k + 2p) / s ) + 1.
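As a quick sanity check of this relationship, the helper below computes the size directly and compares it against a framework's convolution; PyTorch is used here purely as an illustrative example.

```python
import torch
import torch.nn as nn

def conv_output_size(i, k, s, p):
    """o = floor((i - k + 2p) / s) + 1"""
    return (i - k + 2 * p) // s + 1

i, k, s, p = 5, 3, 2, 1
conv = nn.Conv2d(in_channels=1, out_channels=1, kernel_size=k, stride=s, padding=p)
out = conv(torch.randn(1, 1, i, i))
print(conv_output_size(i, k, s, p), out.shape[-1])   # both print 3
```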

c. Transposed Convolutional (DeConvolutional) Layer:

Usually used to increase the size of the output feature map (up-sampling). The idea behind the transposed convolutional layer is to (approximately) undo the convolutional layer. Just like the convolutional layer, it is defined by the kernel size, stride length, and padding. If a convolution with the same kernel size, stride, and padding were applied to the output, it would generate a feature map with the spatial size of the input.


Transposed Convolutional Layer (Source: Aqeel Anwar)

In order to generate the output, two operations are carried out:

  • zero insertion ( z ): The number of zeros inserted between the rows and columns of the original input
  • padding ( p’ ): The frame size of zeros inserted around the input feature map.

The 4 figures below visually explain the transposed convolutional layer on an input of varying size ( i ), for a kernel size ( k ) of 3x3 and varying strides ( s ) and padding ( p ), while the output ( o ) is fixed to 5x5.


Animated transposed convolutional layer (Source: Aqeel Anwar)
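The standard size relation for a transposed convolution (without output padding) is o = s·(i − 1) + k − 2p, the inverse of the convolutional formula above. The sketch below checks it against PyTorch's ConvTranspose2d, used here only as an example framework.

```python
import torch
import torch.nn as nn

def transposed_conv_output_size(i, k, s, p):
    """o = s * (i - 1) + k - 2p (no output padding)"""
    return s * (i - 1) + k - 2 * p

i, k, s, p = 3, 3, 2, 1
tconv = nn.ConvTranspose2d(in_channels=1, out_channels=1, kernel_size=k, stride=s, padding=p)
out = tconv(torch.randn(1, 1, i, i))
print(transposed_conv_output_size(i, k, s, p), out.shape[-1])   # both print 5
```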

In-depth details on transposed convolutional layers can be found here

2. Pooling

The most commonly used pooling operations are max pooling, average pooling, and max/average unpooling.

Max/Average Pooling:

A non-trainable layer used to decrease the spatial size of the input layer based on selecting the maximum/average value in a receptive field defined by the kernel. A kernel is slid across the input feature map with a given stride. For each position, the maximum/average value of the part of the input feature map overlapping the kernel is the corresponding output pixel.


Animated Max pooling layer (Source: Aqeel Anwar)

UnPooling:

A non-trainable layer used to increase the spatial size of the input layer by placing each input pixel at a certain index in the receptive field of the output, defined by the kernel. For an unpooling layer, there needs to be a corresponding pooling layer earlier in the network. The index of the maximum/average value from the corresponding pooling layer is saved and used in the unpooling layer: each input pixel is placed in the output at the index where the maximum/average occurred in the pooling layer, while the other pixels are set to zero.
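One way to express this pooling/unpooling pairing is with PyTorch's MaxPool2d (with return_indices=True) and MaxUnpool2d; the sketch below is illustrative rather than a prescription.

```python
import torch
import torch.nn as nn

pool = nn.MaxPool2d(kernel_size=2, stride=2, return_indices=True)
unpool = nn.MaxUnpool2d(kernel_size=2, stride=2)

x = torch.randn(1, 1, 4, 4)
pooled, indices = pool(x)            # downsample, remembering where each maximum came from
restored = unpool(pooled, indices)   # place each value back at its saved index, zeros elsewhere

print(pooled.shape)                  # torch.Size([1, 1, 2, 2])
print(restored.shape)                # torch.Size([1, 1, 4, 4])
```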

3. Normalization

Normalization is usually applied just before the activation function to keep the unbounded activations from driving the output values too high. Two types of normalization techniques are commonly used:

a. Local Response Normalization LRN:

LRN is a non-trainable layer that square-normalizes the pixel values in a feature map within a local neighborhood. There are two types of LRN based on how the neighborhood is defined, inter-channel and intra-channel, as shown in the figure below.


Left: Intra-channel LRN. Right: Inter-channel LRN

b. Batch Normalization BN:

BN, on the other hand, is a trainable approach to normalizing the data. In batch normalization, the output of hidden neurons is processed in the following manner before being fed to the activation function.

  1. Normalize the entire mini-batch B to zero mean and unit variance
  • Calculate the mean of the entire mini-batch output: u_B
  • Calculate the variance of the entire mini-batch output: sigma_B
  • Normalize the mini-batch by subtracting the mean and dividing by the standard deviation (the square root of the variance)

2. Introduce two trainable parameters (Gamma: scale_variable and Beta: shift_variable) to scale and shift the normalized mini-batch output

3. Feed this scaled and shifted normalized mini-batch to the activation function.
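A minimal NumPy sketch of this batch-normalization forward pass (the epsilon term and the all-ones/all-zeros initialization of Gamma and Beta are conventional choices, not from the article):

```python
import numpy as np

def batch_norm_forward(x, gamma, beta, eps=1e-5):
    """x: mini-batch of hidden-layer outputs, shape (batch, features)."""
    mu = x.mean(axis=0)                     # step 1: mini-batch mean u_B
    var = x.var(axis=0)                     # step 1: mini-batch variance sigma_B
    x_hat = (x - mu) / np.sqrt(var + eps)   # step 1: zero mean, unit variance
    return gamma * x_hat + beta             # step 2: scale (Gamma) and shift (Beta)

x = np.random.randn(8, 4)                                        # placeholder mini-batch
out = batch_norm_forward(x, gamma=np.ones(4), beta=np.zeros(4))  # step 3: feed `out` to the activation
```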

A summary of the two normalization techniques can be seen below

A detailed article on these normalization techniques can be found here

4. Activation

The main purpose of activation functions is to introduce non-linearity so the CNN can efficiently learn complex, non-linear mappings between the input and output. Multiple activation functions are available and are chosen based on the underlying requirements.

  • Non-parametric/Static functions: Linear, ReLU
  • Parametric functions: ELU, tanh, sigmoid, Leaky ReLU
  • Bounded functions: tanh, sigmoid

The gif below visually explains the nature of the most commonly used activation functions.


Animated Activation functions (Source: Aqeel Anwar)

The most commonly used activation function is ReLU. Bounded activation functions such as tanh and sigmoid suffer from the vanishing gradient problem in deeper neural networks and are normally avoided there.
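For reference, here are minimal NumPy definitions of a few of these activation functions (a sketch, not tied to any particular framework):

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)              # unbounded above, zero for negative inputs

def leaky_relu(x, alpha=0.01):
    return np.where(x > 0, x, alpha * x)   # small slope instead of zero for negative inputs

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))        # bounded to (0, 1)

def tanh(x):
    return np.tanh(x)                      # bounded to (-1, 1)
```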

5. Loss Calculation:

Once you have defined your CNN, a loss function needs to be picked that quantifies how far off the CNN prediction is from the actual labels. This loss is then used in the gradient descent method to train the network variables. Like the activation functions, there are multiple candidates available for loss functions.

Regression Loss Functions

  • Mean Absolute Error: The mean of the absolute difference between the real-valued estimates and labels
  • Mean Square Error: The mean of the squared difference between the real-valued estimates and labels
  • Huber Loss: Quadratic for small errors and linear for large ones, making it less sensitive to outliers

Classification Loss Functions

  • Cross-Entropy: The estimated values and labels are probabilities in (0, 1)
  • Hinge Loss: A margin-based loss; the estimates are real-valued scores and the labels are typically −1/+1

The details on these loss functions can be seen in the plot below


Animated ML Loss Functions (Source: Aqeel Anwar)
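As a concrete reference, here are minimal NumPy versions of these losses; the mean reduction and the delta/epsilon defaults are conventional choices, not from the article.

```python
import numpy as np

def mae(y_true, y_pred):
    return np.mean(np.abs(y_true - y_pred))                  # Mean Absolute Error

def mse(y_true, y_pred):
    return np.mean((y_true - y_pred) ** 2)                   # Mean Square Error

def huber(y_true, y_pred, delta=1.0):
    err = y_true - y_pred                                    # quadratic near zero, linear for large errors
    return np.mean(np.where(np.abs(err) <= delta,
                            0.5 * err ** 2,
                            delta * (np.abs(err) - 0.5 * delta)))

def binary_cross_entropy(y_true, p_pred, eps=1e-12):
    p = np.clip(p_pred, eps, 1 - eps)                        # predictions are probabilities in (0, 1)
    return -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

def hinge(y_true, y_pred):
    return np.mean(np.maximum(0.0, 1.0 - y_true * y_pred))   # labels in {-1, +1}
```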

6. Backpropagation

Backpropagation is not a structural element of the CNN; rather, it is the methodology through which we learn the underlying problem, by computing the gradients of the loss and updating the weights in the direction opposite to the gradient (gradient descent). In-depth detail on different gradient descent algorithms can be found here.
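A toy, self-contained example of such an update, fitting a single weight with gradient descent (the data and learning rate are made up for illustration):

```python
import numpy as np

# Toy problem: learn w so that w * x approximates y = 2x, using mean-squared-error loss
x = np.array([1.0, 2.0, 3.0])
y = 2.0 * x
w, lr = 0.0, 0.1

for _ in range(100):
    y_hat = w * x                           # forward pass
    grad = np.mean(2 * (y_hat - y) * x)     # dLoss/dw for this one-weight "network"
    w -= lr * grad                          # step opposite to the gradient (gradient descent)

print(round(w, 3))                          # ~2.0
```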

Summary:

In this article, animated visualizations of the different elements of a basic CNN have been presented to help understand their functions better.

