A visualization of the basic elements of a Convolutional Neural Network
Visualization is a great tool for understanding rich concepts, especially for beginners in the area. In this article, we will go through the basic elements of a convolutional neural network (CNN) using visual aids. The article begins by providing a visual template for a basic CNN with its different building blocks and then discusses the most commonly used elements for each building block.
Basic CNN Template:
A basic CNN consists of three kinds of layers: input, hidden, and output, as shown below. The data enters the CNN through the input layer and passes through various hidden layers before reaching the output layer. The output layer is the prediction of the network, and it is compared to the actual labels in terms of a loss or error. For the network to learn, the partial derivatives of this loss w.r.t. the trainable weights are calculated via backpropagation, and the weights are updated using one of several gradient descent methods.
The complete visual template for a basic CNN can be seen below.
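To make the template concrete, here is a minimal sketch assuming PyTorch; the channel counts, kernel sizes, and number of classes are arbitrary choices for illustration, not part of any specific architecture.

```python
import torch
import torch.nn as nn

# A tiny CNN following the template: input -> hidden layers -> output.
# All sizes (channels, kernel size, class count) are illustrative only.
model = nn.Sequential(
    nn.Conv2d(1, 8, kernel_size=3, padding=1),  # hidden layer: convolution
    nn.MaxPool2d(2),                            # pooling
    nn.BatchNorm2d(8),                          # normalization
    nn.ReLU(),                                  # activation
    nn.Flatten(),
    nn.Linear(8 * 14 * 14, 10),                 # output layer: 10 class scores
)

x = torch.randn(1, 1, 28, 28)  # input layer: one 28x28 single-channel image
logits = model(x)              # the network's prediction
print(logits.shape)            # torch.Size([1, 10])
```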
Hidden Layers of CNN
The hidden layers in the network provide the basic building blocks that transform the data (the input layer or the output of the previous hidden layer). Most of the commonly used hidden layers (not all) follow a pattern: they begin by applying a function to the input, move on to pooling and normalization, and finally apply an activation before the result is fed as input to the next layer. Thus, each layer can be decomposed into the following 4 sub-functions:
- Layer function: Basic transforming function such as convolutional or fully connected layer.
- Pooling: Used to change the spatial size of the feature map, either increasing (up-sampling) or, most commonly, decreasing it. Examples include max pooling, average pooling, and unpooling.
- Normalization: This sub-function normalizes the data to have zero mean and unit variance, which helps in coping with problems such as vanishing gradients and internal covariate shift (more information). The two most common normalization techniques used are local response normalization and batch normalization.
- Activation: Applies non-linearity and bounds the output from getting too high or too low.
We will go through each of the sub-functions explaining their most common examples.
There are far more complex CNN architectures out there that contain various other layers and rather intricate structures. Not all CNN architectures follow this template.
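Before going through each sub-function in detail, here is a minimal sketch (assuming PyTorch, with arbitrary channel and kernel sizes) of one hidden layer applied step by step, printing the shape after each sub-function.

```python
import torch
import torch.nn as nn

x = torch.randn(1, 3, 32, 32)  # output of a previous layer (illustrative shape)

layer_fn = nn.Conv2d(3, 16, kernel_size=3, padding=1)  # 1. layer function
pool     = nn.MaxPool2d(kernel_size=2)                  # 2. pooling
norm     = nn.BatchNorm2d(16)                            # 3. normalization
act      = nn.ReLU()                                     # 4. activation

y = layer_fn(x); print(y.shape)  # torch.Size([1, 16, 32, 32])
y = pool(y);     print(y.shape)  # torch.Size([1, 16, 16, 16])
y = norm(y)                      # same shape, zero mean / unit variance per channel
y = act(y)                       # same shape, negatives clipped to zero
```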
1. Layer functions
The most commonly used layer functions are the fully connected, convolutional, and transposed convolutional (often, but incorrectly, called deconvolutional) layers.
a. Fully Connected Layers:
These layers consist of linear functions between the input and the output. For i input nodes and j output nodes, the trainable weights are w_ij and b_j. The figure on the left illustrates how a fully connected layer between 3 input and 2 output nodes works.
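A minimal sketch (assuming PyTorch) of the 3-input, 2-output fully connected layer from the figure; each output node is a weighted sum of the inputs plus a bias.

```python
import torch
import torch.nn as nn

fc = nn.Linear(in_features=3, out_features=2)  # trainable weights w_ij and biases b_j
x = torch.randn(1, 3)                          # 3 input nodes

# Equivalent to y_j = sum_i(w_ij * x_i) + b_j for each of the 2 output nodes
y = fc(x)
print(fc.weight.shape, fc.bias.shape, y.shape)  # (2, 3), (2,), (1, 2)
```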
b. Convolutional Layers:
These layers are applied to 2D (and 3D) input feature maps. The trainable weights are a 2D (or 3D) kernel/filter that moves across the input feature map, generating dot products with the overlapping regions of the input feature map. The following are the 3 parameters used to define a convolutional layer:
- Kernel Size K: The size of the sliding kernel or filter.
- Stride Length S: Defines how far the kernel slides before the dot product is carried out to generate the next output pixel.
- Padding P: The frame size of zeros inserted around the input feature map.
The 4 figures below visually explain the convolutional layer on an input of size ( i ) 5x5 for a kernel size ( k ) of 3x3 and varying strides ( s ) and padding ( p )
The stride and padding, along with the input feature map size, control the size of the output feature map. The output size is given by o = floor((i + 2p - k) / s) + 1, where i is the input size, k the kernel size, s the stride, and p the padding.
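This formula can be checked with a small sketch (assuming PyTorch); the 5x5 input and 3x3 kernel match the figures above, while stride and padding are varied.

```python
import torch
import torch.nn as nn

def conv_out(i, k, s, p):
    # o = floor((i + 2p - k) / s) + 1
    return (i + 2 * p - k) // s + 1

x = torch.randn(1, 1, 5, 5)  # 5x5 input feature map
for s, p in [(1, 0), (1, 1), (2, 0), (2, 1)]:
    conv = nn.Conv2d(1, 1, kernel_size=3, stride=s, padding=p)
    print(conv(x).shape[-1], conv_out(5, 3, s, p))  # spatial size matches the formula
```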
c. Transposed Convolutional (DeConvolutional) Layer:
Usually used to increase the size of the output feature map (up-sampling). The idea behind the transposed convolutional layer is to (approximately) undo the convolutional layer. Just like the convolutional layer, it is defined by a stride length and padding: if the given stride and padding were applied to the output together with a convolutional kernel of the given size, it would regenerate the input.
In order to generate the output, two things are carried out
- zero insertion ( z ): The number of zeros inserted between the rows and columns of the original input.
- padding ( p’ ): The frame size of zeros inserted around the input feature map.
The 4 figures below visually explain the transposed convolutional layer on an input of varying size ( i ), for a kernel size ( k ) of 3x3 and varying strides ( s ) and padding ( p ) while the output (o) is fixed to 5x5
In-depth details on transposed convolutional layers can be found here
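A minimal sketch (assuming PyTorch, whose ConvTranspose2d provides this layer) showing inputs of different sizes being upsampled to the fixed 5x5 output with a 3x3 kernel, as in the figures.

```python
import torch
import torch.nn as nn

# Output size of a transposed convolution: o = (i - 1) * s - 2p + k
for i, s, p in [(3, 1, 0), (2, 2, 0), (3, 2, 1), (5, 1, 1)]:
    tconv = nn.ConvTranspose2d(1, 1, kernel_size=3, stride=s, padding=p)
    x = torch.randn(1, 1, i, i)
    print(i, "->", tconv(x).shape[-1])  # each case upsamples to 5
```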
2. Pooling
The most commonly used pooling operations are max pooling, average pooling, and max/average unpooling.
Max/Average Pooling:
A non-trainable layer used to decrease the spatial size of the input layer based on selecting the maximum/average value in a receptive field defined by the kernel. A kernel is slid across the input feature map with a given stride. For each position, the maximum/average value of the part of the input feature map overlapping the kernel is the corresponding output pixel.
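A minimal sketch (assuming PyTorch) of 2x2 max and average pooling with stride 2; each 2x2 receptive field is reduced to its maximum or mean value.

```python
import torch
import torch.nn as nn

x = torch.tensor([[[[ 1.,  2.,  3.,  4.],
                    [ 5.,  6.,  7.,  8.],
                    [ 9., 10., 11., 12.],
                    [13., 14., 15., 16.]]]])

print(nn.MaxPool2d(kernel_size=2, stride=2)(x))  # [[ 6.,  8.], [14., 16.]]
print(nn.AvgPool2d(kernel_size=2, stride=2)(x))  # [[ 3.5,  5.5], [11.5, 13.5]]
```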
UnPooling:
A non-trainable layer used to increase the spatial size of the input layer by placing each input pixel at a certain index in the receptive field of the output, defined by the kernel. For an unpooling layer, there needs to be a corresponding pooling layer earlier in the network. The index of the maximum/average value from the corresponding pooling layer is saved and used in the unpooling layer: each input pixel is placed in the output at the index where the maximum/average occurred in the pooling layer, while the other pixels are set to zero.
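A minimal sketch (assuming PyTorch) of max unpooling: the pooling layer returns the indices of the maxima, and the unpooling layer places each pooled value back at that index, filling the rest with zeros.

```python
import torch
import torch.nn as nn

pool   = nn.MaxPool2d(kernel_size=2, stride=2, return_indices=True)  # saves max indices
unpool = nn.MaxUnpool2d(kernel_size=2, stride=2)

x = torch.randn(1, 1, 4, 4)
pooled, indices = pool(x)                        # 2x2 maxima and their positions
restored = unpool(pooled, indices, output_size=x.size())
print(restored.shape)                            # back to (1, 1, 4, 4), zeros elsewhere
```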
3. Normalization
Normalization is usually applied just before the activation function to keep the unbounded activations from driving the output values too high. Two normalization techniques are commonly used:
a. Local Response Normalization LRN:
LRN is a non-trainable layer that square-normalizes the pixel values in a feature map within a local neighborhood. There are two types of LRN depending on how the neighborhood is defined, inter-channel and intra-channel, as shown in the figure below.
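A minimal sketch (assuming PyTorch, whose LocalResponseNorm implements the inter-channel variant): each value is divided by a term computed from the squared values of neighboring channels. The hyperparameters below are the AlexNet-style defaults, used purely for illustration.

```python
import torch
import torch.nn as nn

# Inter-channel LRN over a neighborhood of 5 channels
lrn = nn.LocalResponseNorm(size=5, alpha=1e-4, beta=0.75, k=2.0)

x = torch.randn(1, 16, 8, 8)  # 16-channel feature map
y = lrn(x)                    # same shape, each value normalized by its channel neighbors
print(y.shape)                # torch.Size([1, 16, 8, 8])
```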
b. Batch Normalization BN:
BN, on the other hand, is a trainable approach to normalizing the data. In batch normalization, the output of the hidden neurons is processed in the following manner before being fed to the activation function:
1. Normalize the entire mini-batch B to zero mean and unit variance:
- Calculate the mean of the mini-batch output: mu_B
- Calculate the variance of the mini-batch output: sigma_B
- Normalize the mini-batch by subtracting the mean and dividing by the standard deviation (the square root of the variance plus a small epsilon)
2. Introduce two trainable parameters (Gamma: scale_variable and Beta: shift_variable) to scale and shift the normalized mini-batch output.
3. Feed this scaled and shifted normalized mini-batch to the activation function.
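The three steps can be written out directly; a minimal sketch assuming PyTorch, computing the same quantities that nn.BatchNorm2d maintains internally.

```python
import torch

x = torch.randn(32, 16, 8, 8)    # mini-batch B: 32 samples, 16 channels
gamma = torch.ones(1, 16, 1, 1)  # trainable scale (learned in practice)
beta  = torch.zeros(1, 16, 1, 1) # trainable shift (learned in practice)
eps = 1e-5                       # small constant for numerical stability

# Step 1: per-channel mean and variance over the mini-batch, then normalize
mu_B    = x.mean(dim=(0, 2, 3), keepdim=True)
sigma_B = x.var(dim=(0, 2, 3), keepdim=True, unbiased=False)
x_hat   = (x - mu_B) / torch.sqrt(sigma_B + eps)

# Step 2: scale and shift with the trainable parameters
y = gamma * x_hat + beta

# Step 3: y is then fed to the activation function
```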
A summary of the two normalization techniques can be seen below
A detailed article on these normalization techniques can be found here
4. Activation
The main purpose of activation functions is to introduce non-linearity, so that the CNN can efficiently learn complex non-linear mappings between the input and output. Multiple activation functions are available and are used based on the underlying requirements.
- Non-parametric/Static functions: Linear, ReLU
- Parametric functions: ELU, tanh, sigmoid, Leaky ReLU
- Bounded functions: tanh, sigmoid
The gif below visually explains the nature of the most commonly used activation functions.
The most commonly used activation function is ReLU. Bounded activation functions such as tanh and sigmoid suffer from the vanishing gradient problem in deeper neural networks and are normally avoided.
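A minimal sketch (assuming PyTorch) comparing bounded and unbounded activations on the same inputs; tanh and sigmoid saturate for large magnitudes, which is what leads to vanishing gradients in deep networks.

```python
import torch

x = torch.tensor([-5.0, -1.0, 0.0, 1.0, 5.0])

print(torch.relu(x))     # [0.00, 0.00, 0.00, 1.00, 5.00]            unbounded above
print(torch.tanh(x))     # approx [-1.00, -0.76, 0.00, 0.76, 1.00]    bounded in (-1, 1)
print(torch.sigmoid(x))  # approx [ 0.01,  0.27, 0.50, 0.73, 0.99]    bounded in (0, 1)
```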
5. Loss Calculation:
Once you have defined your CNN, a loss function needs to be picked that quantifies how far off the CNN prediction is from the actual labels. This loss is then used in the gradient descent method to train the network weights. Like the activation functions, there are multiple candidate loss functions available.
Regression Loss Functions
- Mean Absolute Error: The estimated value and labels are real numbers
- Mean Square Error: The estimated value and labels are real numbers
- Huber Loss: The estimated value and labels are real numbers
Classification Loss Functions
- Cross-Entropy: The estimated value and labels are probabilities in (0, 1)
- Hinge Loss: The estimated value and labels are real numbers
The details on these loss functions can be seen in the plot below
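A minimal sketch (assuming PyTorch) of the regression and classification losses listed above, applied to illustrative predictions and labels.

```python
import torch
import torch.nn as nn

# Regression: predictions and labels are real numbers
pred   = torch.tensor([2.5, 0.0, 1.0])
target = torch.tensor([3.0, -0.5, 1.0])
print(nn.L1Loss()(pred, target))     # mean absolute error
print(nn.MSELoss()(pred, target))    # mean squared error
print(nn.HuberLoss()(pred, target))  # Huber loss (quadratic near zero, linear far away)

# Classification: cross-entropy between class scores (logits) and the true class index
logits = torch.tensor([[2.0, 0.5, -1.0]])  # unnormalized scores for 3 classes
label  = torch.tensor([0])                 # true class
print(nn.CrossEntropyLoss()(logits, label))
```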
6. Backpropagation
Backpropagation is not a structural element of the CNN; rather, it is the methodology through which the network learns, by updating the weights in the direction opposite to the gradient of the loss (gradient descent). In-depth detail on different gradient descent algorithms can be found here.
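A minimal sketch (assuming PyTorch, with dummy data and arbitrary layer sizes) of one training step: the loss is computed from the prediction, backpropagation fills in the gradients, and the optimizer updates the weights in the opposite direction.

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(),
                      nn.Flatten(), nn.Linear(8 * 28 * 28, 10))
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()

x = torch.randn(4, 1, 28, 28)        # a mini-batch of 4 images (dummy data)
labels = torch.randint(0, 10, (4,))  # their class labels (dummy data)

optimizer.zero_grad()                # clear old gradients
loss = loss_fn(model(x), labels)     # compare prediction with the actual labels
loss.backward()                      # backpropagation: dLoss/dWeight for every weight
optimizer.step()                     # gradient descent: weight -= lr * gradient
```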
Summary:
In this article, animated visualizations of the different elements of a basic CNN have been presented to help understand their functions better.