Conv2d: Finally Understand What Happens in the Forward Pass


A visual and mathematical explanation of the 2D convolution layer and its arguments

May 2 · 9 min read

Introduction

Deep learning libraries and platforms such as TensorFlow, Keras, PyTorch, Caffe, and Theano make our daily lives easier, to the point that every day new applications make us think “Wow!”. We all have our favorite framework, but what they all have in common is that they make things easy for us with functions that are simple to use and can be configured as needed. But we still need to understand what the available arguments are to take full advantage of the power these frameworks give us.

In this post, I will try to list all these arguments. This post is for you if you want to see their impact on the computation time, the number of trainable parameters, and the size of the convolved output channels.

[GIF] Input Shape: (3, 7, 7) — Output Shape: (2, 3, 3) — K: (3, 3) — P: (1, 1) — S: (2, 2) — D: (2, 2) — G: 1

All GIFs are made with Python. You can test each of these arguments and visualize their impact yourself with the scripts pushed to my GitHub (or make your own GIFs).

This post is divided into sections according to the following arguments, which can be found in the PyTorch documentation of the Conv2d module:

  • in_channels (int) — Number of channels in the input image
  • out_channels (int) — Number of channels produced by the convolution
  • kernel_size (int or tuple) — Size of the convolving kernel
  • stride (int or tuple, optional) — Stride of the convolution. Default: 1
  • padding (int or tuple, optional) — Zero-padding added to both sides of the input. Default: 0
  • dilation (int or tuple, optional) — Spacing between kernel elements. Default: 1
  • groups (int, optional) — Number of blocked connections from input channels to output channels. Default: 1
  • bias (bool, optional) — If True, adds a learnable bias to the output. Default: True

By the end, we will have all the keys to calculate the size of the output channels from these arguments and the size of the input channels.
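
To see how these arguments fit together in practice, here is a minimal PyTorch sketch reproducing the layer from the GIF above; the printed shape matches the caption:

```python
import torch
import torch.nn as nn

# The configuration from the GIF above: input (3, 7, 7) -> output (2, 3, 3)
conv = nn.Conv2d(in_channels=3, out_channels=2, kernel_size=3,
                 stride=2, padding=1, dilation=2, groups=1, bias=True)

x = torch.randn(1, 3, 7, 7)  # a batch containing one 3-channel 7x7 image
print(conv(x).shape)         # torch.Size([1, 2, 3, 3])
```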

What is a Kernel?

[Image] Convolution between an input image and a kernel

Let me introduce what a kernel (or convolution matrix) is. A kernel describes a filter that we pass over an input image. To put it simply, the kernel moves over the whole image, from left to right and from top to bottom, applying a convolution product at each position. The output of this operation is called a filtered image.

Convolution product

[GIF] Input Shape: (1, 9, 9) — Output Shape: (1, 7, 7) — K: (3, 3) — P: (0, 0) — S: (1, 1) — D: (1, 1) — G: 1

To take a very basic example, let’s imagine a 3 by 3 convolution kernel filtering a 9 by 9 image. The kernel moves over the whole image to capture all the squares of the same size (3 by 3). The convolution product is an element-wise (or point-wise) multiplication between the kernel and each of these squares; the sum of the result gives the corresponding pixel of the output (or filtered) image.
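
If you want to check the convolution product by hand, here is a small sketch using PyTorch’s functional API. Note that, like most deep learning libraries, PyTorch actually computes a cross-correlation (the kernel is not flipped), so the element-wise check below matches exactly:

```python
import torch
import torch.nn.functional as F

image = torch.randn(1, 1, 9, 9)   # (batch, channels, height, width)
kernel = torch.randn(1, 1, 3, 3)  # a single 3x3 kernel

out = F.conv2d(image, kernel)     # stride=1, padding=0 by default
print(out.shape)                  # torch.Size([1, 1, 7, 7])

# The top-left output pixel is the sum of the element-wise product
# between the kernel and the top-left 3x3 patch of the image:
patch = image[0, 0, :3, :3]
print(torch.isclose(out[0, 0, 0, 0], (patch * kernel[0, 0]).sum()))  # tensor(True)
```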

If you are not already familiar with filters and convolution matrices, then I strongly advise you to take a little more time to understand the convolution kernels. They are the core of the 2D convolution layer.

Trainable Parameters and Bias

The trainable parameters, often simply called “parameters”, are all the parameters that are updated when the network is trained. In a Conv2d, the trainable elements are the values that compose the kernels. So for our 3 by 3 convolution kernel, we have 3*3 = 9 trainable parameters.


To be more complete, we can choose to include a bias or not. The role of the bias is to be added to the sum of the convolution product. This bias is also a trainable parameter, which raises the number of trainable parameters for our 3 by 3 kernel to 10.
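
A quick way to convince yourself of this count is to ask the layer directly; a minimal check in PyTorch:

```python
import torch.nn as nn

conv = nn.Conv2d(in_channels=1, out_channels=1, kernel_size=3, bias=True)
print(sum(p.numel() for p in conv.parameters()))  # 10 = 3*3 weights + 1 bias

conv_no_bias = nn.Conv2d(1, 1, kernel_size=3, bias=False)
print(sum(p.numel() for p in conv_no_bias.parameters()))  # 9
```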

Number of Input and Output Channels

[GIF] Input Shape: (1, 7, 7) — Output Shape: (4, 5, 5) — K: (3, 3) — P: (0, 0) — S: (1, 1) — D: (1, 1) — G: 1

The benefit of using a layer is being able to perform several similar operations at the same time. In other words, if we want to apply 4 different filters of the same size to an input channel, then we will have 4 output channels. These channels are the result of 4 different filters, and therefore of 4 distinct kernels.

In the previous section, we saw that the trainable parameters are what make up the convolution kernels. So the number of parameters grows linearly with the number of convolution kernels, and hence linearly with the number of desired output channels. Note that the computation time also varies proportionally with the size of the input channels and with the number of kernels.

[Chart] Note that the curves in the Parameters graph are the same

The same principle applies to the number of input channels. Let’s consider the situation of an RGB-encoded image. This image has 3 channels: red, green, and blue. We can decide to extract information with filters of the same size on each of these 3 channels to obtain four new channels. The filtering operation is thus performed 3 times per output channel, for 4 output channels.

[GIF] Input Shape: (3, 7, 7) — Output Shape: (4, 5, 5) — K: (3, 3) — P: (0, 0) — S: (1, 1) — D: (1, 1) — G: 1

Each output channel is the sum of the filtered input channels. For 4 output channels and 3 input channels, each output channel is the sum of 3 filtered input channels. In other words, the convolution layer is composed of 4*3 = 12 convolution kernels.
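
You can read these 12 kernels directly off the shape of the layer’s weight tensor, which PyTorch stores as (out_channels, in_channels, kernel_height, kernel_width):

```python
import torch.nn as nn

conv = nn.Conv2d(in_channels=3, out_channels=4, kernel_size=3, bias=False)
# One distinct 3x3 kernel per (output channel, input channel) pair:
print(conv.weight.shape)                            # torch.Size([4, 3, 3, 3])
print(conv.weight.shape[0] * conv.weight.shape[1])  # 12 kernels
```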

As a reminder, the number of parameters and the computation time grow proportionally to the number of output channels, because each output channel is linked to kernels distinct from those of the other channels. The same is true for the number of input channels: the computation time and the number of parameters grow proportionally with it.


Kernel Size

So far, all the examples have been given with 3 by 3 kernels. In fact, the choice of kernel size is entirely up to you. It is possible to create a convolution layer with a kernel size of 1*1 or 19*19.

[GIF] Input Shape: (3, 7, 9) — Output Shape: (2, 3, 8) — K: (5, 2) — P: (0, 0) — S: (1, 1) — D: (1, 1) — G: 1

But it is also absolutely possible not to have square kernels. You can decide to use kernels with different heights and widths. This is often the case in signal image analysis: if we know that we want to scan the image of a signal or a sound, then we may prefer a 5*1 kernel, for example.
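
As a quick sketch, a non-square kernel in PyTorch is simply passed as a tuple; the printed shape matches the caption of the GIF above:

```python
import torch
import torch.nn as nn

# A non-square kernel: 5 pixels high, 2 pixels wide
conv = nn.Conv2d(in_channels=3, out_channels=2, kernel_size=(5, 2))
x = torch.randn(1, 3, 7, 9)
print(conv(x).shape)   # torch.Size([1, 2, 3, 8])
```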

Finally, you will have noticed that all the sizes so far have been odd numbers. An even kernel size is just as acceptable, but in practice it is rarely used. Usually, an odd kernel size is chosen because it is symmetric around a central pixel.


Since all the (classical) trainable parameters of a convolution layer are in the kernels, the number of parameters grows linearly with the size of the kernels. The computation time also varies proportionally.

Strides

By default, the kernel moves from left to right and from top to bottom, pixel by pixel. But this movement can be changed, which is often used to downsample the output channel. For example, with strides of (1, 3), the filter is shifted 3 pixels at a time horizontally and 1 pixel at a time vertically. This produces output channels downsampled by 3 horizontally.
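
A minimal sketch of these (1, 3) strides in PyTorch:

```python
import torch
import torch.nn as nn

# Stride (1, 3): move 1 pixel at a time vertically, 3 pixels at a time horizontally
conv = nn.Conv2d(in_channels=3, out_channels=2, kernel_size=3, stride=(1, 3))
x = torch.randn(1, 3, 9, 9)
print(conv(x).shape)   # torch.Size([1, 2, 7, 3])
```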

[GIF] Input Shape: (3, 9, 9) — Output Shape: (2, 7, 3) — K: (3, 3) — P: (0, 0) — S: (1, 3) — D: (1, 1) — G: 1

The strides have no impact on the number of parameters, but the computation time, logically, decreases linearly with the strides.

[Chart] Note that the curves in the Parameters graph are the same

Padding

The padding defines the number of pixels added to each side of the input channels before the convolution. Usually, the padding pixels are set to zero: the input channel is extended with zeros.

[GIF] Input Shape: (2, 7, 7) — Output Shape: (1, 7, 7) — K: (3, 3) — P: (1, 1) — S: (1, 1) — D: (1, 1) — G: 1

This is very useful when you want the size of the output channels to be equal to the size of the input channels. To put it simply, with a 3*3 kernel the output channel size decreases by one pixel on each side. To overcome this, we can use a padding of 1.
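
A minimal sketch showing that a padding of 1 preserves the input size with a 3*3 kernel:

```python
import torch
import torch.nn as nn

# Without padding, a 3x3 kernel shrinks a 7x7 input to 5x5.
# With padding=1, the output keeps the input size:
conv = nn.Conv2d(in_channels=2, out_channels=1, kernel_size=3, padding=1)
x = torch.randn(1, 2, 7, 7)
print(conv(x).shape)   # torch.Size([1, 1, 7, 7])
```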

The padding therefore has no impact on the number of parameters, but it generates an additional computation time proportional to the size of the padding. Generally speaking, though, the padding is small enough compared to the size of the input channel that its impact on the computation time can be considered negligible.

[Chart] Note that the curves in the Parameters graph are the same

Dilation

The dilation is, in a way, the width of the kernel. Equal to 1 by default, it corresponds to the offset between two consecutive kernel pixels on the input channel during the convolution.

[GIF] Input Shape: (2, 7, 7) — Output Shape: (1, 1, 5) — K: (3, 3) — P: (1, 1) — S: (1, 1) — D: (4, 2) — G: 1

I exaggerated a bit in my GIF, but if we take the example of a dilation of (4, 2), then the receptive field of a 3 by 3 kernel on the input channel spans 4*(3-1)+1 = 9 pixels vertically and 2*(3-1)+1 = 5 pixels horizontally.
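
A minimal sketch of this (4, 2) dilation in PyTorch, matching the GIF above:

```python
import torch
import torch.nn as nn

# Dilation (4, 2) with a 3x3 kernel and a padding of 1:
conv = nn.Conv2d(in_channels=2, out_channels=1, kernel_size=3,
                 padding=1, dilation=(4, 2))
x = torch.randn(1, 2, 7, 7)
print(conv(x).shape)   # torch.Size([1, 1, 1, 5])
```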

Just like the padding, the dilation has no impact on the number of parameters and only a very limited impact on the computation time.

[Chart] Note that the curves in the Parameters graph are the same

Groups

Groups can be very useful in specific cases, for example when we have several concatenated data sources that do not need to be treated dependently on each other. The input channels can then be grouped and processed independently. Finally, the output channels are concatenated at the end.

If there are 2 input channels and 4 output channels with 2 groups, then this is like splitting the input channels into two groups (so 1 input channel per group) and passing each one through a convolution layer with half as many output channels. The output channels are then concatenated.

[GIF] Input Shape: (2, 7, 7) — Output Shape: (4, 5, 5) — K: (3, 3) — P: (2, 2) — S: (2, 2) — D: (1, 1) — G: 2

It is important to note two things. Firstly, the number of groups must divide both the number of input channels and the number of output channels (it must be a common divisor of the two). Secondly, the kernels are not shared between groups: each group has its own distinct set of kernels.

Since each output channel now only sees the input channels of its own group, the number of parameters is divided by the number of groups. Concerning the computation time, the PyTorch implementation is optimized for groups and should therefore reduce the computation time. However, the time needed to form the groups and to concatenate the output channels must also be taken into account.
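
A minimal sketch of this behavior in PyTorch; each kernel of the grouped layer covers only in_channels/groups = 1 input channel, so the layer holds half as many parameters:

```python
import torch.nn as nn

# Grouped convolution: 2 input channels, 4 output channels, 2 groups
conv = nn.Conv2d(in_channels=2, out_channels=4, kernel_size=3,
                 groups=2, bias=False)
# Each output channel only sees in_channels/groups = 1 input channel:
print(conv.weight.shape)   # torch.Size([4, 1, 3, 3])

# The same layer without groups has twice as many parameters:
conv_g1 = nn.Conv2d(2, 4, kernel_size=3, groups=1, bias=False)
print(conv.weight.numel(), conv_g1.weight.numel())   # 36 72
```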


Output Channel Size

With the knowledge of all these arguments, the size of the output channels can be calculated from the size of the input channels:

H_out = ⌊(H_in + 2 × P[0] − D[0] × (K[0] − 1) − 1) / S[0] + 1⌋
W_out = ⌊(W_in + 2 × P[1] − D[1] × (K[1] − 1) − 1) / S[1] + 1⌋

where K is the kernel size, P the padding, S the stride and D the dilation.
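
As a sketch, here is this formula as a small Python helper (conv2d_output_shape is an illustrative name, not a PyTorch function), checked against the first GIF of this post:

```python
from math import floor

def conv2d_output_shape(h_in, w_in, kernel_size, padding=(0, 0),
                        stride=(1, 1), dilation=(1, 1)):
    """Output height and width of a Conv2d, per the formula above."""
    h_out = floor((h_in + 2 * padding[0] - dilation[0] * (kernel_size[0] - 1) - 1)
                  / stride[0] + 1)
    w_out = floor((w_in + 2 * padding[1] - dilation[1] * (kernel_size[1] - 1) - 1)
                  / stride[1] + 1)
    return h_out, w_out

# The layer from the first GIF: (3, 7, 7) input, K=(3,3), P=(1,1), S=(2,2), D=(2,2)
print(conv2d_output_shape(7, 7, (3, 3), padding=(1, 1),
                          stride=(2, 2), dilation=(2, 2)))  # (3, 3)
```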

Sources

Deep Learning Tutorial, Y. LeCun

torch.nn documentation, PyTorch

Convolutional Neural Networks, CS231n

Convolutional Layers, Keras

All the images are homemade.

All the computation time tests have been run with PyTorch on my GPU (GeForce GTX 960M) and are available in this GitHub repository if you want to run them yourself or perform alternative tests.

