A visual and mathematical explanation of the 2D convolution layer and its arguments
May 2 · 9 min read
Introduction
Deep Learning libraries and platforms such as Tensorflow, Keras, Pytorch, Caffe or Theano help us in our daily lives, so that every day new applications make us think “Wow!”. We all have our favorite framework, but what they all have in common is that they make things easy for us with easy-to-use functions that can be configured as needed. But we still need to understand which arguments are available to take full advantage of the power these frameworks give us.
In this post, I will try to list all these arguments. This post is for you if you want to see their impact on the computation time, the number of trainable parameters and the size of the convolved output channels.
All GIFs are made with Python. You will be able to test each of these arguments and visualize their impact yourself with the scripts pushed to my Github (or make your own GIFs).
The parts of this post are divided according to the following arguments, which can be found in the Pytorch documentation of the Conv2d module:
- in_channels (int) — Number of channels in the input image
- out_channels (int) — Number of channels produced by the convolution
- kernel_size (int or tuple) — Size of the convolving kernel
- stride (int or tuple, optional) — Stride of the convolution. Default: 1
- padding (int or tuple, optional) — Zero-padding added to both sides of the input. Default: 0
- dilation (int or tuple, optional) — Spacing between kernel elements. Default: 1
- groups (int, optional) — Number of blocked connections from input channels to output channels. Default: 1
- bias (bool, optional) — If True, adds a learnable bias to the output. Default: True
Finally, we will have all the keys to calculate the size of the output channels according to the arguments and the size of the input channels.
What is a Kernel?
Let me introduce what a kernel (or convolution matrix) is. A kernel describes a filter that we are going to pass over an input image. To put it simply, the kernel moves over the whole image, from left to right and from top to bottom, applying a convolution product at each position. The output of this operation is called a filtered image.
To take a very basic example, let’s imagine a 3 by 3 convolution kernel filtering a 9 by 9 image. The kernel moves all over the image to capture every square of the same size (3 by 3) it contains. At each position, the convolution product is an element-wise (or point-wise) multiplication between the kernel and the current square; the sum of this result is the corresponding pixel of the output (or filtered) image.
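The operation above can be sketched in a few lines of plain Python. The image content and kernel values below are made up for illustration; only the mechanics (slide the window, multiply element-wise, sum) matter.

```python
def convolve2d(image, kernel):
    """'Valid' (no padding, stride 1) 2D convolution of `image` by `kernel`."""
    kh, kw = len(kernel), len(kernel[0])
    ih, iw = len(image), len(image[0])
    out = []
    for i in range(ih - kh + 1):
        row = []
        for j in range(iw - kw + 1):
            # element-wise product of the current window and the kernel, then sum
            s = sum(image[i + di][j + dj] * kernel[di][dj]
                    for di in range(kh) for dj in range(kw))
            row.append(s)
        out.append(row)
    return out

image = [[1] * 9 for _ in range(9)]           # a flat 9x9 image
kernel = [[0, 1, 0], [1, -4, 1], [0, 1, 0]]   # a classic Laplacian edge filter

filtered = convolve2d(image, kernel)
print(len(filtered), len(filtered[0]))  # 7 7: a 9x9 image filtered by 3x3 gives 7x7
```

Note that the filtered image is smaller than the input: the 3 by 3 window only has 7 by 7 valid positions inside a 9 by 9 image. This is exactly the size reduction that padding (covered below) compensates for.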
If you are not already familiar with filters and convolution matrices, then I strongly advise you to take a little more time to understand the convolution kernels. They are the core of the 2D convolution layer .
Trainable Parameters and Bias
The trainable parameters, also simply called “parameters”, are all the parameters that are updated when the network is trained. In a Conv2d, the trainable elements are the values that compose the kernels. So for our 3 by 3 convolution kernel, we have 3*3=9 trainable parameters.
To be more complete, we can choose to include a bias or not. The role of the bias is to be added to the sum of the convolution product. It is also a trainable parameter, which raises the number of trainable parameters for our 3 by 3 kernel to 10.
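These two counts can be checked directly in Pytorch by summing the elements of the layer's parameter tensors:

```python
import torch.nn as nn

# a single 3x3 kernel with a bias: 3*3 = 9 weights + 1 bias
conv = nn.Conv2d(in_channels=1, out_channels=1, kernel_size=3, bias=True)
n_params = sum(p.numel() for p in conv.parameters())
print(n_params)  # 10

# the same layer without a bias keeps only the 9 kernel weights
conv_no_bias = nn.Conv2d(1, 1, kernel_size=3, bias=False)
n_params_no_bias = sum(p.numel() for p in conv_no_bias.parameters())
print(n_params_no_bias)  # 9
```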
Number of Input and Output Channels
The benefit of using a layer is to be able to perform several similar operations at the same time. In other words, if we want to apply 4 different filters of the same size to an input channel, then we will have 4 output channels. These channels are the results of 4 distinct kernels.
In the previous section, we saw that the trainable parameters are what make up the convolution kernels. So the number of parameters increases linearly with the number of convolution kernels, hence linearly with the number of desired output channels. Note that the computation time also varies proportionally with the size of the input channel and with the number of kernels.
The same principle applies to the number of input channels. Let’s consider the situation of an RGB encoded image. This image has 3 channels: red, green and blue. We can decide to extract information with filters of the same size from each of these 3 channels to obtain 4 new channels. The operation is thus repeated 3 times over, for 4 output channels.
Each output channel is the sum of the filtered input channels. With 4 output channels and 3 input channels, each output channel is the sum of 3 filtered input channels; in other words, the convolution layer is composed of 4*3=12 convolution kernels.
As a reminder, the number of parameters and the computation time grow proportionally with the number of output channels. This is because each output channel is linked to its own kernels, distinct from those of the other channels. The same is true for the number of input channels: the computation time and the number of parameters grow proportionally with it.
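The 12-kernel count for the RGB example above is visible in the shape of the layer's weight tensor, which Pytorch stores as (out_channels, in_channels, kernel_height, kernel_width):

```python
import torch.nn as nn

# 3 input channels (RGB) and 4 output channels: the layer holds
# 4*3 = 12 distinct 3x3 kernels, one per (output, input) channel pair.
conv = nn.Conv2d(in_channels=3, out_channels=4, kernel_size=3, bias=False)
print(tuple(conv.weight.shape))  # (4, 3, 3, 3)
print(conv.weight.numel())       # 108 = 12 kernels * 9 weights each
```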
Kernel size
So far, all examples have been given with 3 by 3 kernels. In fact, the choice of the kernel size is entirely up to you: it is possible to create a convolution layer with a kernel size of 1*1 or 19*19.
But kernels do not have to be square: it is absolutely possible to have kernels with different heights and widths. This is often the case in signal image analysis. If we know that we want to scan the image of a signal, of a sound, then we may prefer a 5*1 kernel, for example.
Finally, you will have noticed that all these sizes are odd numbers. An even kernel size is just as acceptable, but in practice it is rarely used: an odd-sized kernel is usually chosen because it is symmetric around a central pixel.
Since all the (classical) trainable parameters of a convolution layer are in the kernels, the number of parameters grows linearly with the size of the kernels. The computation time also varies proportionally.
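A quick sketch of the non-square case: a 5*1 kernel has 5 trainable weights, and it shrinks the output only along the dimension it spans.

```python
import torch
import torch.nn as nn

# a non-square 5*1 kernel, as one might use to scan a signal image
conv = nn.Conv2d(1, 1, kernel_size=(5, 1), bias=False)
print(tuple(conv.weight.shape))  # (1, 1, 5, 1): 5 trainable weights

x = torch.zeros(1, 1, 9, 9)      # a dummy 9x9 single-channel image
y = conv(x)
print(tuple(y.shape))            # (1, 1, 5, 9): height shrinks by 4, width is unchanged
```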
Strides
The kernel, by default, moves from left to right and from top to bottom, pixel by pixel. But this movement, the stride, can be changed; it is often used to down-sample the output channel. For example, with strides of (1, 3), the filter is shifted 3 pixels at a time horizontally and 1 pixel at a time vertically. This produces output channels down-sampled by 3 horizontally.
The strides have no impact on the number of parameters, but the computation time, logically, decreases proportionally with the strides.
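On a dummy 9 by 9 input, the (1, 3) stride from the example above gives:

```python
import torch
import torch.nn as nn

# strides of (1, 3): the 3x3 kernel moves 1 pixel at a time vertically
# and 3 pixels at a time horizontally, down-sampling the width by 3
conv = nn.Conv2d(1, 1, kernel_size=3, stride=(1, 3))
x = torch.zeros(1, 1, 9, 9)
y = conv(x)
print(tuple(y.shape))  # (1, 1, 7, 3): 7 vertical positions, 3 horizontal ones
```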
Padding
The padding defines the number of pixels added to each side of the input channels before the convolution filtering. Usually, the padding pixels are set to zero: the input channel is extended.
This is very useful when you want the size of the output channels to be equal to the size of the input channels. To put it simply, with a 3*3 kernel the output channel size decreases by one pixel on each side; to compensate, we can use a padding of 1.
The padding, therefore, has no impact on the number of parameters, but it generates an additional computation time proportional to the size of the padding. Generally speaking, though, the padding is small enough compared to the size of the input channel that its impact on the computation time can be neglected.
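The "same size" effect is easy to check on a dummy input: with a 3*3 kernel and a padding of 1, the output channels keep the input size.

```python
import torch
import torch.nn as nn

# a padding of 1 with a 3x3 kernel keeps the output the same size as the input
conv = nn.Conv2d(1, 1, kernel_size=3, padding=1)
x = torch.zeros(1, 1, 9, 9)
y = conv(x)
print(tuple(y.shape))  # (1, 1, 9, 9): same spatial size as the input

# without padding, the output loses one pixel on each side
no_pad = nn.Conv2d(1, 1, kernel_size=3, padding=0)
print(tuple(no_pad(x).shape))  # (1, 1, 7, 7)
```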
Dilation
The dilation is, in a way, the spread of the kernel. Equal to 1 by default, it corresponds to the offset between two neighboring pixels of the kernel on the input channel during the convolution.
I exaggerated a bit in my GIF, but if we take the example of a dilation of (4, 2), then the receptive field of a 3 by 3 kernel on the input channel is widened to span 4*(3-1)+1=9 pixels vertically and 2*(3-1)+1=5 pixels horizontally, while it still has only 3*3=9 weights.
Just like padding, dilation has no impact on the number of parameters and very limited impact on the calculation time.
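The widened receptive field shows up in the output size: on a 9 by 9 input, the (4, 2)-dilated 3 by 3 kernel behaves, size-wise, like a 9 by 5 kernel.

```python
import torch
import torch.nn as nn

# with a dilation of (4, 2), the 3x3 kernel's receptive field spans
# 4*(3-1)+1 = 9 pixels vertically and 2*(3-1)+1 = 5 pixels horizontally
conv = nn.Conv2d(1, 1, kernel_size=3, dilation=(4, 2))
print(conv.weight.numel())  # still only 9 weights (plus 1 bias)

x = torch.zeros(1, 1, 9, 9)
y = conv(x)
print(tuple(y.shape))  # (1, 1, 1, 5): only 1 vertical and 5 horizontal positions fit
```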
Groups
Groups can be very useful in specific cases, for example when we have several concatenated data sources that do not need to be treated dependently on each other. The input channels can then be grouped and processed independently, and the resulting output channels are concatenated at the end.
With 2 input channels and 4 output channels in 2 groups, this amounts to dividing the input channels into two groups (so 1 input channel in each group) and passing each through a convolution layer with half as many output channels. The output channels are then concatenated.
It is important to note two things. Firstly, the number of groups must divide both the number of input channels and the number of output channels (it must be a common divisor). Secondly, each group has its own set of kernels: a kernel in one group only sees the input channels of that group.
The number of parameters is therefore divided by the number of groups. Concerning the computation time, the Pytorch algorithm is optimized for groups and should therefore reduce it. However, the computation time needed to form the groups and to concatenate the output channels must also be taken into account.
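The division of the parameter count is visible in the weight tensor, whose second dimension becomes in_channels / groups:

```python
import torch.nn as nn

# 2 input channels, 4 output channels: with groups=2, each group convolves
# 1 input channel into 2 output channels, so each kernel sees fewer inputs
ungrouped = nn.Conv2d(2, 4, kernel_size=3, bias=False)
grouped = nn.Conv2d(2, 4, kernel_size=3, groups=2, bias=False)

print(tuple(grouped.weight.shape))  # (4, 1, 3, 3): in_channels/groups = 1
print(ungrouped.weight.numel())     # 72 = 4 * 2 * 3 * 3
print(grouped.weight.numel())       # 36 = 72 / 2 groups
```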
Output Channel Size
With the knowledge of all the arguments, the size of the output channels can be calculated from the size of the input channels.
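As a recap, here is the formula from the Pytorch Conv2d documentation for one spatial dimension, sketched as a small helper function; the checks below reproduce the sizes seen in the examples above.

```python
import math

def conv_output_size(size_in, kernel_size, stride=1, padding=0, dilation=1):
    """Output size along one spatial dimension, per the Pytorch Conv2d docs:
    floor((size_in + 2*padding - dilation*(kernel_size - 1) - 1) / stride + 1)
    """
    return math.floor(
        (size_in + 2 * padding - dilation * (kernel_size - 1) - 1) / stride + 1
    )

print(conv_output_size(9, 3))              # 7: plain 3x3 kernel on a 9-pixel side
print(conv_output_size(9, 3, padding=1))   # 9: padding of 1 preserves the size
print(conv_output_size(9, 3, stride=3))    # 3: stride of 3 down-samples by 3
print(conv_output_size(9, 3, dilation=4))  # 1: the dilated kernel barely fits
```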
Sources
Deep Learning Tutorial, Y. LeCun
torch.nn documentation, Pytorch
Convolutional Neural Networks, cs231n
Convolutional Layers, Keras
All the images are home made
All computation time tests have been run with Pytorch, on my GPU (GeForce GTX 960M) and are available on this GitHub repository if you want to run them yourself or perform alternative tests.