Conv2d: Finally Understand What Happens in the Forward Pass


A visual and mathematical explanation of the 2D convolution layer and its arguments

May 2 · 9 min read

Introduction

Deep learning libraries and platforms such as TensorFlow, Keras, PyTorch, Caffe, and Theano make our daily work easier, and every day new applications make us think "Wow!". We all have our favorite framework, but what they all have in common is that they offer easy-to-use functions that can be configured as needed. Still, we need to understand the available arguments to take advantage of all the power these frameworks give us.

In this post, I will go through each of these arguments. This post is for you if you want to understand their impact on the computation time, the number of trainable parameters, and the size of the convolved output channels.

Input Shape: (3, 7, 7) — Output Shape: (2, 3, 3) — K: (3, 3) — P: (1, 1) — S: (2, 2) — D: (2, 2) — G: 1

All GIFs are made with Python. With the scripts pushed to my GitHub, you will be able to test each of these arguments and visualize their impact yourself (or make your own GIFs).

This post is divided into parts according to the following arguments, which can be found in the PyTorch documentation of the Conv2d module:

  • in_channels (int) — Number of channels in the input image
  • out_channels (int) — Number of channels produced by the convolution
  • kernel_size (int or tuple) — Size of the convolving kernel
  • stride (int or tuple, optional) — Stride of the convolution. Default: 1
  • padding (int or tuple, optional) — Zero-padding added to both sides of the input. Default: 0
  • dilation (int or tuple, optional) — Spacing between kernel elements. Default: 1
  • groups (int, optional) — Number of blocked connections from input channels to output channels. Default: 1
  • bias (bool, optional) — If True, adds a learnable bias to the output. Default: True

By the end, we will have all the keys needed to calculate the size of the output channels from the arguments and the size of the input channels.
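To fix ideas, here is a minimal sketch of a Conv2d layer with every argument spelled out; the shapes match the first GIF caption above, and the values are just example choices.

    import torch
    import torch.nn as nn

    # All Conv2d arguments made explicit (example values).
    conv = nn.Conv2d(in_channels=3, out_channels=2, kernel_size=(3, 3),
                     stride=(2, 2), padding=(1, 1), dilation=(2, 2),
                     groups=1, bias=True)

    x = torch.randn(1, 3, 7, 7)   # (batch, channels, height, width)
    print(conv(x).shape)          # torch.Size([1, 2, 3, 3])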

What is a Kernel?

Convolution between an input image and a kernel

Let me introduce what a kernel (or convolution matrix) is. A kernel describes a filter that we pass over an input image. To put it simply, the kernel moves over the whole image, from left to right and from top to bottom, applying a convolution product at each position. The output of this operation is called a filtered image.

Convolution product

Input Shape: (1, 9, 9) — Output Shape: (1, 7, 7) — K: (3, 3) — P: (0, 0) — S: (1, 1) — D: (1, 1) — G: 1

To take a very basic example, let's imagine a 3 by 3 convolution kernel filtering a 9 by 9 image. The kernel slides over the image to capture every square of its own size (3 by 3). The convolution product is an element-wise (or point-wise) multiplication of each captured square with the kernel; the sum of that result gives the corresponding pixel of the output (or filtered) image.
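As a sketch of what this computes, here is the convolution product written out by hand in PyTorch (single channel, no padding, stride 1):

    import torch

    image = torch.randn(9, 9)     # a single-channel 9 by 9 image
    kernel = torch.randn(3, 3)    # a 3 by 3 convolution kernel

    out = torch.zeros(7, 7)       # output size: 9 - 3 + 1 = 7 per side
    for i in range(7):
        for j in range(7):
            patch = image[i:i + 3, j:j + 3]      # current 3 by 3 square
            out[i, j] = (patch * kernel).sum()   # element-wise product, then sum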

If you are not already familiar with filters and convolution matrices, then I strongly advise you to take a little more time to understand the convolution kernels. They are the core of the 2D convolution layer .

Trainable Parameters and Bias

The trainable parameters, also simply called "parameters", are all the parameters that are updated when the network is trained. In a Conv2d, the trainable elements are the values that compose the kernels. So for our 3 by 3 convolution kernel, we have 3*3=9 trainable parameters.

To be more complete, we can choose whether or not to include a bias. The role of the bias is to be added to the sum of the convolution product. It is also a trainable parameter, which brings the number of trainable parameters for our 3 by 3 kernel to 10.
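We can check this count directly in PyTorch; a quick sketch:

    import torch.nn as nn

    conv = nn.Conv2d(in_channels=1, out_channels=1, kernel_size=3, bias=True)
    print(sum(p.numel() for p in conv.parameters()))   # 10 = 3*3 weights + 1 bias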

Number of Input and Output Channels

Input Shape: (1, 7, 7) — Output Shape: (4, 5, 5) — K: (3, 3) — P: (0, 0) — S: (1, 1) — D: (1, 1) — G: 1

The benefit of using a layer is the ability to perform several similar operations at the same time. In other words, if we want to apply 4 different filters of the same size to an input channel, we obtain 4 output channels. These channels are the result of 4 different filters, hence of 4 distinct kernels.

In the previous section, we saw that the trainable parameters are what make up the convolution kernels. So the number of parameters increases linearly with the number of convolution kernels, and hence linearly with the number of desired output channels. Note that the computation time also varies proportionally with the size of the input channel and with the number of kernels.

Note that the curves in the Parameters graph are the same

The same principle applies to the number of input channels. Consider an RGB-encoded image, which has 3 channels: red, green, and blue. We can decide to extract information from each of these 3 channels with filters of the same size to obtain four new channels. The operation is thus repeated 3 times for each of the 4 output channels.

Input Shape: (3, 7, 7) — Output Shape: (4, 5, 5) — K: (3, 3) — P: (0, 0) — S: (1, 1) — D: (1, 1) — G: 1

Each output channel is the sum of the filtered input channels. For 4 output channels and 3 input channels, each output channel is the sum of 3 filtered input channels. In other words, the convolution layer is composed of 4*3=12 convolution kernels.

As a reminder, the number of parameters and the computation time change proportionally with the number of output channels, because each output channel is linked to kernels distinct from those of the other channels. The same holds for the number of input channels: the computation time and the number of parameters grow proportionally with it.
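The 12 kernels are visible in the layer's weight tensor, whose shape is (out_channels, in_channels, kernel height, kernel width); a sketch:

    import torch.nn as nn

    conv = nn.Conv2d(in_channels=3, out_channels=4, kernel_size=3, bias=False)
    print(conv.weight.shape)                           # torch.Size([4, 3, 3, 3])
    print(sum(p.numel() for p in conv.parameters()))   # 108 = 4*3 kernels * 9 weights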

Kernel Size

So far, all examples have used 3 by 3 kernels. In fact, the choice of kernel size is entirely up to you: it is possible to create a convolution layer with a kernel size of 1*1 or of 19*19.

Input Shape: (3, 7, 9) — Output Shape: (2, 3, 9) — K: (5, 1) — P: (0, 0) — S: (1, 1) — D: (1, 1) — G: 1

It is also perfectly possible to use non-square kernels, with different heights and widths. This is often the case in signal image analysis: if we know we want to scan the image of a signal or a sound, we may prefer a 5*1 kernel, for example.

Finally, you will have noticed that all sizes so far are odd numbers. Defining an even kernel size is just as acceptable, but in practice it is rarely done: an odd-sized kernel is usually chosen because it is symmetric around a central pixel.
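As a sketch, the 5*1 kernel from the signal example, matching the GIF caption above:

    import torch
    import torch.nn as nn

    conv = nn.Conv2d(in_channels=3, out_channels=2, kernel_size=(5, 1))
    x = torch.randn(1, 3, 7, 9)
    print(conv(x).shape)   # torch.Size([1, 2, 3, 9]): height shrinks, width is kept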

Since all the (classical) trainable parameters of a convolution layer are in the kernels, the number of parameters grows linearly with the size of the kernels. The computation time also varies proportionally.

Strides

By default, the kernel moves from left to right and from top to bottom, pixel by pixel. But this movement can be changed; it is often used to downsample the output channel. For example, with strides of (1, 3), the filter is shifted 3 pixels at a time horizontally and 1 pixel at a time vertically. This produces output channels downsampled by 3 horizontally.

Input Shape: (3, 9, 9) — Output Shape: (2, 7, 3) — K: (3, 3) — P: (0, 0) — S: (1, 3) — D: (1, 1) — G: 1

The strides have no impact on the number of parameters, but the computation time, logically, decreases linearly with the strides.
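A sketch of the stride example above:

    import torch
    import torch.nn as nn

    conv = nn.Conv2d(in_channels=3, out_channels=2, kernel_size=3, stride=(1, 3))
    x = torch.randn(1, 3, 9, 9)
    print(conv(x).shape)   # torch.Size([1, 2, 7, 3]): width downsampled by 3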

Note that the curves in the Parameters graph are the same

Padding

The padding defines the number of pixels added to each side of the input channels before the convolution filtering. Usually, the padding pixels are set to zero: the input channel is extended.

Input Shape: (2, 7, 7) — Output Shape: (1, 7, 7) — K: (3, 3) — P: (1, 1) — S: (1, 1) — D: (1, 1) — G: 1

This is very useful when you want the size of the output channels to be equal to the size of the input channels. To put it simply, with a 3*3 kernel the output channel size decreases by one on each side. To overcome this, we can use a padding of 1.

Padding therefore has no impact on the number of parameters, but it generates an additional computation time proportional to the size of the padding. Generally speaking, though, the padding is often small enough compared to the input channel that its impact on computation time can be considered negligible.
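A sketch showing that a padding of 1 with a 3*3 kernel preserves the spatial size:

    import torch
    import torch.nn as nn

    conv = nn.Conv2d(in_channels=2, out_channels=1, kernel_size=3, padding=1)
    x = torch.randn(1, 2, 7, 7)
    print(conv(x).shape)   # torch.Size([1, 1, 7, 7]): same 7 by 7 spatial size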

Note that the curves in the Parameters graph are the same

Dilation

The dilation is, in a way, the spread of the kernel. Equal to 1 by default, it corresponds to the offset between two adjacent kernel elements on the input channel during the convolution.

Input Shape: (2, 7, 7) — Output Shape: (1, 1, 5) — K: (3, 3) — P: (1, 1) — S: (1, 1) — D: (4, 2) — G: 1

I exaggerated a bit in my GIF, but if we take the example of a dilation of (4, 2), then the receptive field of a 3 by 3 kernel on the input channel is widened by 4*(3-1)=8 pixels vertically and 2*(3-1)=4 pixels horizontally.

Just like padding, dilation has no impact on the number of parameters and only a very limited impact on the computation time.
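A sketch of the dilation example above; with padding (1, 1), the 9-pixel-tall receptive field (4*2+1) barely fits the padded 7 by 7 input, hence an output height of 1:

    import torch
    import torch.nn as nn

    conv = nn.Conv2d(in_channels=2, out_channels=1, kernel_size=3,
                     padding=1, dilation=(4, 2))
    x = torch.randn(1, 2, 7, 7)
    print(conv(x).shape)   # torch.Size([1, 1, 1, 5])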

Note that the curves in the Parameters graph are the same

Groups

Groups can be very useful in specific cases, for example when we have several concatenated data sources that do not need to be treated as dependent on each other. The input channels can be grouped and processed independently, and the resulting output channels are concatenated at the end.

With 2 input channels and 4 output channels in 2 groups, it is as if the input channels were divided into two groups (1 input channel in each) and each group were passed through a convolution layer with half as many output channels. The output channels are then concatenated.

Input Shape: (2, 7, 7) — Output Shape: (4, 5, 5) — K: (3, 3) — P: (2, 2) — S: (2, 2) — D: (1, 1) — G: 2

It is important to note two things. Firstly, the number of groups must divide both the number of input channels and the number of output channels exactly (a common divisor). Secondly, each group has its own kernels: an output channel of one group is connected only to the input channels of that group.

The number of parameters is therefore divided by the number of groups. Concerning the computation time, PyTorch's algorithm is optimized for groups and should therefore reduce it. However, one must also account for the extra time needed to form the groups and to concatenate the output channels.
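A sketch of the grouped example above; the weight tensor shows the division of parameters:

    import torch
    import torch.nn as nn

    conv = nn.Conv2d(in_channels=2, out_channels=4, kernel_size=3,
                     stride=2, padding=2, groups=2)
    print(conv.weight.shape)   # torch.Size([4, 1, 3, 3]): in_channels/groups = 1
    x = torch.randn(1, 2, 7, 7)
    print(conv(x).shape)       # torch.Size([1, 4, 5, 5])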

Output Channel Size

With the knowledge of all the arguments, the size of the output channels can be calculated from the size of the input channels:

H_out = floor( (H_in + 2*P[0] - D[0]*(K[0]-1) - 1) / S[0] + 1 )
W_out = floor( (W_in + 2*P[1] - D[1]*(K[1]-1) - 1) / S[1] + 1 )

where K is the kernel size, P the padding, S the stride, and D the dilation, as in the captions above (formula from the PyTorch Conv2d documentation).
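As a sketch, the same formula as a small Python helper (a hypothetical name, one spatial dimension at a time), checked against the very first GIF of this post:

    import math

    def conv2d_out_size(size, kernel, padding=0, stride=1, dilation=1):
        # Output size of Conv2d along one spatial dimension (PyTorch docs formula).
        return math.floor((size + 2 * padding - dilation * (kernel - 1) - 1) / stride + 1)

    # First GIF: input 7x7, K=3, P=1, S=2, D=2 -> output 3x3.
    print(conv2d_out_size(7, kernel=3, padding=1, stride=2, dilation=2))   # 3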

Sources

Deep Learning Tutorial, Y. LeCun

torch.nn documentation, PyTorch

Convolutional Neural Networks, CS231n

Convolutional Layers, Keras

All the images are homemade.

All computation time tests were run with PyTorch on my GPU (GeForce GTX 960M) and are available in this GitHub repository if you want to run them yourself or perform alternative tests.

