Translational Invariance Vs Translational Equivariance


Translational invariance and translational equivariance are frequently confused with each other, but they are different properties of a CNN. To understand the difference, read on.


Convolutional Neural Networks have been the go-to architecture for image- and video-based tasks like classification, localization, and segmentation. They have shown super-human performance on tasks that were previously considered very difficult to achieve with basic image-processing techniques, and they have made classification relatively easy to perform, without the hand-curated features that had to be fed to models before the CNN revolution.

Convolutional Neural Networks were inspired by the work of the Nobel Prize-winning scientists Dr Hubel and Dr Wiesel, who demonstrated how the visual cortex works. They inserted micro-electrodes into the visual cortex of a partially anaesthetized cat, so that it could not move, and swept a bright line across its retina. During this experiment, they noticed the following:

  1. The neurons fired when the line was in a particular place on the retina.
  2. The activity of the neurons changed depending on the orientation of the line.
  3. The neurons sometimes fired only when the line was moving in a particular direction.

This classic experiment showed how the visual cortex processes information hierarchically, extracting increasingly complex information. They showed that there is a topographical map in the visual cortex representing the visual field, in which nearby cells process information from nearby parts of the visual field. This gave rise to the concept of sparse interactions in CNNs, where the network focuses on local information rather than taking in the complete global information. This property helps CNNs achieve state-of-the-art performance on image-related tasks, because in images nearby pixels are more strongly correlated than distant ones.

Moreover, their work determined that neurons in the visual cortex are arranged in a precise architecture. Cells with similar functions are organized into columns, tiny computational machines that relay information to a higher region of the brain, where a visual image is formed. This is similar to the way a CNN is designed, where lower layers extract edges and other common features and higher layers extract more class-specific information. In all, their work revealed how visual cortical neurons encode image features, the fundamental properties of objects that help us build our perception of the world around us.

Figure 1: Hubel and Wiesel's experiment, visually explained.

Figure 2: Architecture of the human visual system.

Convolutional Neural Networks provide three basic advantages over traditional fully connected layers. First, they use sparse connections rather than full connections, which reduces the number of parameters and makes CNNs efficient for processing high-dimensional data. Second, they use weight sharing: the same weights are reused across the entire image, which reduces memory requirements and gives translational equivariance (explained in a moment). Third, CNNs use subsampling, or pooling, in which the most prominent activations are propagated to the next layer and the rest are dropped. This provides a fixed-size output matrix, which is typically required for classification, along with invariance to translation and, to a degree, rotation. The parameter savings from the first two properties are easy to put in numbers, as the sketch below shows.
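To make the savings concrete, here is a back-of-the-envelope comparison; the layer sizes are illustrative assumptions of mine, not taken from any particular architecture:

```python
# Back-of-the-envelope parameter counts: fully connected vs. convolutional.
# The layer sizes below are illustrative assumptions, not from a fixed model.
H, W, C = 224, 224, 3                      # input image: height, width, channels
fc_units = 1000                            # a dense layer with 1000 units
fc_params = H * W * C * fc_units           # one weight per input pixel per unit
conv_params = 3 * 3 * C * 64 + 64          # 3x3 kernels, 64 filters, plus biases
print(f"fully connected: {fc_params:,} parameters")    # 150,528,000
print(f"convolutional:   {conv_params:,} parameters")  # 1,792
```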

TRANSLATIONAL EQUIVARIANCE:

Translational equivariance, or just equivariance, is a very important property of convolutional neural networks: an object need not appear at one fixed position in the image for a convolution to respond to it in the same way. Informally, if the input shifts, the output shifts with it. To be precise, a function f is equivariant to a function g if f(g(x)) = g(f(x)). Suppose g shifts each pixel of an image one pixel to the right, i.e. I'(x, y) = I(x-1, y). Applying the transformation g to the image and then convolving gives the same result as convolving I first and then applying the translation g to the output. For images, this simply means that if we move the input one pixel to the right, its representations also move one pixel to the right.
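Here is a minimal NumPy/SciPy illustration of my own (not from the article) that verifies f(g(x)) = g(f(x)) for a small random image and kernel; circular ("wrap") boundaries are used so the identity holds exactly at the edges:

```python
import numpy as np
from scipy.signal import convolve2d

rng = np.random.default_rng(0)
image = rng.random((8, 8))    # toy input image I
kernel = rng.random((3, 3))   # toy convolution filter

def f(x):
    # convolution with circular boundaries, so shifts commute exactly
    return convolve2d(x, kernel, mode="same", boundary="wrap")

def g(x):
    # shift every pixel one step to the right: I'(x, y) = I(x-1, y)
    return np.roll(x, shift=1, axis=1)

print(np.allclose(f(g(image)), g(f(image))))  # True: equivariant to translation
```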

CNNs achieve translational equivariance through weight sharing. Because the same weights are applied across the whole image, an object is detected by the same filters wherever it appears. This property is very useful for applications such as image classification and object detection, where an object may occur multiple times or may be in motion.

Convolutional Neural Networks are not naturally equivariant to some other transformations, such as changes in the scale or rotation of the image; other mechanisms are required to handle those, as the quick check below illustrates.
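Using the same toy setup as before, a self-contained check shows that rotation, unlike translation, does not commute with a generic convolution:

```python
import numpy as np
from scipy.signal import convolve2d

rng = np.random.default_rng(0)
image, kernel = rng.random((8, 8)), rng.random((3, 3))
f = lambda x: convolve2d(x, kernel, mode="same", boundary="wrap")

# Rotating the input before convolving differs from convolving and then
# rotating, unless the kernel happens to be rotationally symmetric.
print(np.allclose(f(np.rot90(image)), np.rot90(f(image))))  # False
```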

Figure 3: The translational equivariance property: as the input is shifted to the right, the representations shift with it.

Figure 4: Example of translational equivariance.

TRANSLATIONAL INVARIANCE:

Translational invariance is often confused with translational equivariance; many people, even experts, struggle to tell the two apart.

Invariance to translation means that if we translate the input, the CNN still assigns it to the same class: the output does not change when the object shifts. Pooling also gives the network some robustness to other small transformations, such as slight rotations or changes in scale.

Translational invariance is a result of the pooling operation. A traditional CNN layer has three stages. In the first stage, the layer performs a convolution on the input to produce linear activations. In the second stage, the resulting activations are passed through a nonlinear activation function such as sigmoid, tanh, or ReLU. In the third stage, a pooling operation modifies the output further.

In the pooling operation, we replace the output of the network at a certain location with a summary statistic of the nearby outputs, such as the maximum in the case of max pooling. Because the output is replaced by the local maximum, a slight shift of the input leaves most of the pooled values unchanged. Translational invariance is useful wherever the exact location of an object is not required: a face-detection model, for example, only needs to know whether eyes are present, not their exact positions. In segmentation tasks, by contrast, the exact position is required. The small sketch below shows the effect.
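Here is a small one-dimensional sketch in the spirit of the pooling example in the Deep Learning Book (the numbers are my own): max pooling over a window of three with stride one, before and after shifting the responses by one position.

```python
import numpy as np

def max_pool(x, width=3):
    # stride-1 max pooling: each output is the max over a sliding window
    return np.array([x[i:i + width].max() for i in range(len(x) - width + 1)])

responses = np.array([0.1, 1.0, 0.2, 0.1, 0.4, 0.7, 0.3, 0.2])
shifted = np.roll(responses, 1)   # shift every response one position right

before, after = max_pool(responses), max_pool(shifted)
print(before)                     # [1.  1.  0.4 0.7 0.7 0.7]
print(after)                      # [1.  1.  1.  0.4 0.7 0.7]
print((before == after).mean())   # most pooled outputs are unchanged
```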

The use of pooling can be viewed as adding a strong prior that the function the layer learns must be invariant to small translations. When this prior is correct, it can greatly improve the statistical efficiency of the network.

Figure 5: Despite various rotations, the input is still classified correctly due to the invariance property.

The property of translational invariance and translational equivariance is utilized in a technique called data augmentation which comes handy when we have less training data or want to make the model train on a richer dataset. In data augmentation, we apply different transformations like rotation, flipping, zooming, translating etc to each batch of data sampled randomly from the training set and feed it to the model to make it more robust to transformations and increase performance.
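One common way to set this up is with torchvision; the article does not prescribe a library, and the transform parameters here are illustrative:

```python
import torchvision.transforms as T

augment = T.Compose([
    T.RandomHorizontalFlip(p=0.5),                    # flipping
    T.RandomRotation(degrees=15),                     # small random rotations
    T.RandomAffine(degrees=0, translate=(0.1, 0.1)),  # random translations
    T.RandomResizedCrop(size=224, scale=(0.8, 1.0)),  # zooming via crop+resize
    T.ToTensor(),
])

# Applied on the fly as batches are sampled, e.g.:
# dataset = torchvision.datasets.ImageFolder("data/train", transform=augment)
```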

Figure 6: The data augmentation technique.

CONCLUSION:

Convolutional Neural Networks are solving challenges that were previously considered unsolvable, often beating human-level performance, as seen in the ImageNet challenge where ResNet surpassed human accuracy. The concepts that make CNNs so effective are not complex; they are intuitive, logical, and easy to understand.

I hope you liked the post. If you have any doubts, suggestions, or requests, please leave a comment below or get in touch with me on Twitter or LinkedIn.

References:

  1. Ian Goodfellow, Yoshua Bengio, and Aaron Courville, Deep Learning.
  2. Figure 2 (architecture of the human visual system) is taken from knowingneurons.com.
  3. Geoffrey Hinton, "Why equivariance is better than premature invariance" (Figure 4).
  4. Figure 6 is taken from itutorials.com.
