Translational Invariance Vs Translational Equivariance


Translational invariance and translational equivariance are frequently confused with each other, but they are different properties of a CNN. To understand the difference, read on.


Convolutional Neural Networks have been the go-to architecture for image- and video-based tasks like classification, localization, and segmentation. They have shown super-human performance on tasks that were previously considered very difficult to achieve with basic image-processing techniques, and they have made classification relatively easy to perform, without the hand-curated features that had to be fed to models before the CNN revolution.

Convolutional Neural Networks were inspired by the work of the Nobel Prize-winning scientists Dr Hubel and Dr Wiesel, who demonstrated how the visual cortex works. They inserted micro-electrodes into the visual cortex of a partially anaesthetized cat, so that it could not move, and swept a bright line across its retina. During this experiment, they noticed the following:

  1. The neurons fired when the line was in a particular place on the retina.
  2. The activity of the neurons changed depending on the orientation of the line.
  3. The neurons sometimes fired only when the line was moving in a particular direction.

This classic experiment showed how the visual cortex processes information hierarchically, extracting increasingly complex information. They showed that there is a topographical map in the visual cortex representing the visual field, in which nearby cells process information from nearby parts of the visual field. This gave rise to the concept of sparse interactions in CNNs, where the network focuses on local information rather than taking in the complete global information. This property helps CNNs achieve state-of-the-art performance on image-related tasks, because in images nearby pixels are more strongly correlated than distant ones.

Moreover, their work determined that neurons in the visual cortex are arranged in a precise architecture. Cells with similar functions are organized into columns, tiny computational machines that relay information to a higher region of the brain, where a visual image is formed. This is similar to the way a CNN is designed, where lower layers extract edges and other common features and higher layers extract more class-specific information. In all, their work revealed how visual cortical neurons encode image features, the fundamental properties of objects that help us build our perception of the world around us.

Figure 1: Hubel and Wiesel's experiment, visually explained.

Figure 2: Architecture of the human visual system.

Convolutional Neural Networks provide three basic advantages over traditional fully connected layers. First, they use sparse connections rather than full connections, which reduces the number of parameters and makes CNNs efficient for processing high-dimensional data. Second, they use weight sharing: the same weights are reused across the entire image, which reduces memory requirements and gives translational equivariance (explained in a moment). Third, CNNs use subsampling, or pooling, in which the most prominent activations are propagated to the next layer and the rest are dropped. This provides a fixed-size output matrix, which is typically required for classification, along with invariance to translation and, to a degree, rotation. The parameter savings from the first two properties are easy to put in numbers, as the sketch below shows.
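To make the savings concrete, here is a back-of-the-envelope comparison; the layer sizes are illustrative assumptions of mine, not taken from any particular architecture:

```python
# Back-of-the-envelope parameter counts: fully connected vs. convolutional.
# The layer sizes below are illustrative assumptions, not from a fixed model.
H, W, C = 224, 224, 3                      # input image: height, width, channels
fc_units = 1000                            # a dense layer with 1000 units
fc_params = H * W * C * fc_units           # one weight per input pixel per unit
conv_params = 3 * 3 * C * 64 + 64          # 3x3 kernels, 64 filters, plus biases
print(f"fully connected: {fc_params:,} parameters")    # 150,528,000
print(f"convolutional:   {conv_params:,} parameters")  # 1,792
```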

TRANSLATIONAL EQUIVARIANCE:

Translational equivariance, or just equivariance, is a very important property of convolutional neural networks: an object need not appear at one fixed position in the image for a convolution to respond to it in the same way. Informally, if the input shifts, the output shifts with it. To be precise, a function f is equivariant to a function g if f(g(x)) = g(f(x)). Suppose g shifts each pixel of an image one pixel to the right, i.e. I'(x, y) = I(x-1, y). Applying the transformation g to the image and then convolving gives the same result as convolving I first and then applying the translation g to the output. For images, this simply means that if we move the input one pixel to the right, its representations also move one pixel to the right.
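Here is a minimal NumPy/SciPy illustration of my own (not from the article) that verifies f(g(x)) = g(f(x)) for a small random image and kernel; circular ("wrap") boundaries are used so the identity holds exactly at the edges:

```python
import numpy as np
from scipy.signal import convolve2d

rng = np.random.default_rng(0)
image = rng.random((8, 8))    # toy input image I
kernel = rng.random((3, 3))   # toy convolution filter

def f(x):
    # convolution with circular boundaries, so shifts commute exactly
    return convolve2d(x, kernel, mode="same", boundary="wrap")

def g(x):
    # shift every pixel one step to the right: I'(x, y) = I(x-1, y)
    return np.roll(x, shift=1, axis=1)

print(np.allclose(f(g(image)), g(f(image))))  # True: equivariant to translation
```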

CNNs achieve translational equivariance through weight sharing. Because the same weights are applied across the whole image, an object is detected by the same filters wherever it appears. This property is very useful for applications such as image classification and object detection, where an object may occur multiple times or may be in motion.

Convolutional Neural Networks are not naturally equivariant to some other transformations, such as changes in the scale or rotation of the image; other mechanisms are required to handle those, as the quick check below illustrates.
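Using the same toy setup as before, a self-contained check shows that rotation, unlike translation, does not commute with a generic convolution:

```python
import numpy as np
from scipy.signal import convolve2d

rng = np.random.default_rng(0)
image, kernel = rng.random((8, 8)), rng.random((3, 3))
f = lambda x: convolve2d(x, kernel, mode="same", boundary="wrap")

# Rotating the input before convolving differs from convolving and then
# rotating, unless the kernel happens to be rotationally symmetric.
print(np.allclose(f(np.rot90(image)), np.rot90(f(image))))  # False
```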

Figure 3: The translational equivariance property: as the input is shifted to the right, the representations shift with it.

Figure 4: Example of translational equivariance.

TRANSLATIONAL INVARIANCE:

Translational invariance is often confused with translational equivariance; many people, even experts, struggle to tell the two apart.

Invariance to translation means that if we translate the input, the CNN still assigns it to the same class: the output does not change when the object shifts. Pooling also gives the network some robustness to other small transformations, such as slight rotations or changes in scale.

Translational invariance is a result of the pooling operation. A traditional CNN layer has three stages. In the first stage, the layer performs a convolution on the input to produce linear activations. In the second stage, the resulting activations are passed through a nonlinear activation function such as sigmoid, tanh, or ReLU. In the third stage, a pooling operation modifies the output further.

In the pooling operation, we replace the output of the network at a certain location with a summary statistic of the nearby outputs, such as the maximum in the case of max pooling. Because the output is replaced by the local maximum, a slight shift of the input leaves most of the pooled values unchanged. Translational invariance is useful wherever the exact location of an object is not required: a face-detection model, for example, only needs to know whether eyes are present, not their exact positions. In segmentation tasks, by contrast, the exact position is required. The small sketch below shows the effect.
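Here is a small one-dimensional sketch in the spirit of the pooling example in the Deep Learning Book (the numbers are my own): max pooling over a window of three with stride one, before and after shifting the responses by one position.

```python
import numpy as np

def max_pool(x, width=3):
    # stride-1 max pooling: each output is the max over a sliding window
    return np.array([x[i:i + width].max() for i in range(len(x) - width + 1)])

responses = np.array([0.1, 1.0, 0.2, 0.1, 0.4, 0.7, 0.3, 0.2])
shifted = np.roll(responses, 1)   # shift every response one position right

before, after = max_pool(responses), max_pool(shifted)
print(before)                     # [1.  1.  0.4 0.7 0.7 0.7]
print(after)                      # [1.  1.  1.  0.4 0.7 0.7]
print((before == after).mean())   # most pooled outputs are unchanged
```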

The use of pooling can be viewed as adding a strong prior that the function the layer learns must be invariant to small translations. When this prior is correct, it can greatly improve the statistical efficiency of the network.

Figure 5: Despite various rotations, the input is still classified correctly due to the invariance property.

The property of translational invariance and translational equivariance is utilized in a technique called data augmentation which comes handy when we have less training data or want to make the model train on a richer dataset. In data augmentation, we apply different transformations like rotation, flipping, zooming, translating etc to each batch of data sampled randomly from the training set and feed it to the model to make it more robust to transformations and increase performance.
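One common way to set this up is with torchvision; the article does not prescribe a library, and the transform parameters here are illustrative:

```python
import torchvision.transforms as T

augment = T.Compose([
    T.RandomHorizontalFlip(p=0.5),                    # flipping
    T.RandomRotation(degrees=15),                     # small random rotations
    T.RandomAffine(degrees=0, translate=(0.1, 0.1)),  # random translations
    T.RandomResizedCrop(size=224, scale=(0.8, 1.0)),  # zooming via crop+resize
    T.ToTensor(),
])

# Applied on the fly as batches are sampled, e.g.:
# dataset = torchvision.datasets.ImageFolder("data/train", transform=augment)
```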

Figure 6: The data augmentation technique.

CONCLUSION:

Convolutional Neural Networks are solving challenges that were previously considered unsolvable, often beating human-level performance, as seen in the ImageNet challenge where ResNet surpassed human accuracy. The concepts that make CNNs so effective are not complex; they are intuitive, logical, and easy to understand.

I hope you liked the post. If you have any doubts, suggestions, or requests, please leave a comment below or get in touch with me on Twitter or LinkedIn.

References:

  1. Ian Goodfellow, Yoshua Bengio, and Aaron Courville, Deep Learning.
  2. Figure 2 (architecture of the human visual system) is taken from knowingneurons.com.
  3. Geoffrey Hinton, "Why equivariance is better than premature invariance" (Figure 4).
  4. Figure 6 is taken from itutorials.com.
