Translational Invariance Vs Translational Equivariance



Translational invariance and translational equivariance are frequently confused as the same thing, but they are different properties of CNNs. To understand the difference, read on.


Convolutional Neural Networks have been the go-to architecture for image- and video-based tasks such as classification, localization, and segmentation. They have shown superhuman performance in tasks that were previously considered very difficult to achieve with basic image-processing techniques. They have made classification tasks relatively easy to perform, without the need to feed hand-curated features to the model, as was done before the CNN revolution.

Convolutional Neural Networks were inspired by the work of the Nobel Prize-winning scientists Dr. Hubel and Dr. Wiesel, who demonstrated how the visual cortex in the brain works. They inserted micro-electrodes into the visual cortex of a partially anaesthetized cat, so that it could not move, and moved a bright line across its retina. During this experiment, they noticed the following:

  1. The neurons fired when the line was in a particular place on the retina.
  2. The activity of the neurons changed depending on the orientation of the line.
  3. The neurons sometimes fired only when the line was moving in a particular direction.

This classic experiment showed how the visual cortex processes information in a hierarchical way, extracting increasingly complex information. They showed that there is a topographical map in the visual cortex representing the visual field, where nearby cells process information from nearby parts of the visual field. This gave rise to the concept of sparse interactions in CNNs, where the network focuses on local information rather than on the complete global information. This property helps CNNs achieve state-of-the-art performance on image-related tasks because, in images, nearby pixels are more strongly correlated than distant ones.

Moreover, their work determined that the neurons in the visual cortex are arranged in a precise architecture. Cells with similar functions are organized into columns, tiny computational machines that relay information to a higher region of the brain, where a visual image is formed. This is similar to the way a CNN is designed, where lower layers extract edges and other common features and higher layers extract more class-specific information. In all, their work revealed how visual cortical neurons encode image features, the fundamental properties of objects that help us build our perception of the world around us.


Figure 1: Hubel and Wiesel's experiment, visually explained.


Figure 2: Architecture of the human visual system.

Convolutional Neural Networks provide three basic advantages over traditional fully connected layers. First, they use sparse connections instead of full connections, which reduces the number of parameters and makes CNNs efficient for processing high-dimensional data. Second, weight sharing takes place: the same weights are shared across the entire image, which reduces memory requirements and gives translational equivariance (explained in a moment). Third, CNNs use the very important concept of subsampling, or pooling, in which the most prominent activations are propagated to the next layer and the rest are dropped. This provides a fixed-size output matrix, which is typically required for classification, as well as invariance to translation and, to some degree, rotation.
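As a rough, back-of-the-envelope illustration of the parameter savings from sparse connections and weight sharing (the layer sizes below are hypothetical, not taken from any particular architecture), compare a single 3x3 convolution with a fully connected layer mapping the same image to 64 outputs:

```python
# Hypothetical sizes: a 224x224 RGB image mapped to 64 output channels/units.
in_h, in_w, in_c, out_c = 224, 224, 3, 64

# 3x3 convolution: one small kernel per output channel, shared across all positions.
conv_params = 3 * 3 * in_c * out_c + out_c        # kernel weights + biases

# Fully connected layer: one weight per input pixel per output unit.
fc_params = in_h * in_w * in_c * out_c + out_c    # weights + biases

print(f"conv:  {conv_params:,} parameters")       # 1,792
print(f"dense: {fc_params:,} parameters")         # 9,633,856
```

The convolutional layer needs roughly 5,000 times fewer parameters here, which is exactly the efficiency that sparse connections and weight sharing buy.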

TRANSLATIONAL EQUIVARIANCE:

Translational equivariance, or just equivariance, is a very important property of convolutional neural networks: the position of an object in the image does not need to be fixed for it to be detected by the CNN. Loosely, this means that if the input shifts, the output shifts in the same way. To be precise, a function f is said to be equivariant to a function g if f(g(x)) = g(f(x)). Suppose g is a function that shifts each pixel of the image one pixel to the right, i.e. I'(x, y) = I(x-1, y). If we apply the transformation g to the image and then apply convolution, the result is the same as if we had applied convolution to I first and then applied the translation g to the output. When processing images, this simply means that if we move the input one pixel to the right, its representations also move one pixel to the right.
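This identity is easy to check numerically. Below is a minimal sketch, assuming circular shifts and wrap-around padding so that the equality holds exactly at the image borders (with ordinary zero padding it holds everywhere except near the edges):

```python
import numpy as np
from scipy.signal import correlate2d

rng = np.random.default_rng(0)
image = rng.random((8, 8))
kernel = rng.random((3, 3))

def conv(x):
    # Cross-correlation (what CNN libraries call "convolution"), with circular padding.
    return correlate2d(x, kernel, mode="same", boundary="wrap")

def shift_right(x, pixels=1):
    # Circular shift of the image content one pixel to the right.
    return np.roll(x, shift=pixels, axis=1)

# Equivariance: convolving the shifted image equals shifting the convolved image.
lhs = conv(shift_right(image))   # f(g(x))
rhs = shift_right(conv(image))   # g(f(x))
print(np.allclose(lhs, rhs))     # True
```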

The property of translational equivariance is achieved in CNNs through weight sharing. Because the same weights are shared across the image, an object will be detected wherever it occurs in the image. This property is very useful for applications such as image classification and object detection, where there may be multiple occurrences of the object or the object may be in motion.

Convolutional Neural Networks are not naturally equivariant to some other transformations such as changes in the scale or rotation of the image. Other mechanisms are required to handle such transformations.


Figure 3: Translational equivariance: as the inputs are shifted to the right, the representations are also shifted.


Figure 4: Example of translational equivariance.

TRANSLATIONAL INVARIANCE:

Translational invariance is often confused with translational equivariance, and many people, even experts, struggle to tell the two apart.

Invariance to translation means that if we translate the input, moving the object to a different position, the CNN will still be able to detect the class to which the input belongs. Pooling also gives a CNN a limited degree of robustness to other transformations, such as small rotations and changes in scale.

Translational invariance is a result of the pooling operation. In a traditional CNN architecture, there are three stages. In the first stage, the layer performs a convolution on the input to produce linear activations. In the second stage, the resulting activations are passed through a non-linear activation function such as sigmoid, tanh, or ReLU. In the third stage, we perform a pooling operation to modify the output further.

In the pooling operation, we replace the output of the network at a certain location with a summary statistic of the nearby outputs, such as the maximum in the case of max pooling. Because we replace the outputs with their local maxima, even if we shift the input slightly, the values of most of the pooled outputs do not change. Translational invariance is useful when the exact location of an object is not required. For example, if you are building a model to detect faces, all you need to detect is whether eyes are present; their exact position is not necessary. In segmentation tasks, by contrast, the exact position is required.
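Here is a toy sketch of this effect, using a hand-rolled, non-overlapping 2x2 max pooling over a hypothetical 4x4 feature map: a strong activation that shifts by one pixel but stays inside the same pooling window leaves the pooled output completely unchanged.

```python
import numpy as np

def max_pool_2x2(x):
    # Non-overlapping 2x2 max pooling (assumes even height and width).
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

feature_map = np.zeros((4, 4))
feature_map[0, 0] = 1.0   # a strong activation in the top-left pooling window

shifted = np.zeros((4, 4))
shifted[0, 1] = 1.0       # the same activation, shifted one pixel to the right

print(max_pool_2x2(feature_map))
print(max_pool_2x2(shifted))
# Both pooled maps are identical:
print(np.array_equal(max_pool_2x2(feature_map), max_pool_2x2(shifted)))  # True
```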

The use of pooling can be viewed as adding a strong prior that the function the layer learns must be invariant to translation. When the prior is correct, it can greatly improve the statistical efficiency of the network.


Figure 5: In spite of various rotations, the input is still classified correctly due to the property of translational invariance.

The properties of translational invariance and translational equivariance are exploited in a technique called data augmentation, which comes in handy when we have little training data or want to train the model on a richer dataset. In data augmentation, we apply different transformations, such as rotation, flipping, zooming, and translation, to each batch of data sampled randomly from the training set and feed it to the model, making it more robust to these transformations and improving performance.
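A minimal NumPy sketch of one such augmentation step (the batch shape, flip probability, and shift range below are arbitrary assumptions for illustration, not a fixed recipe):

```python
import numpy as np

rng = np.random.default_rng(0)

def augment_batch(batch):
    # batch: array of shape (N, H, W), e.g. grayscale images sampled from the training set.
    out = []
    for img in batch:
        if rng.random() < 0.5:
            img = img[:, ::-1]                    # random horizontal flip
        dy, dx = rng.integers(-2, 3, size=2)      # random shift of up to 2 pixels
        img = np.roll(img, shift=(dy, dx), axis=(0, 1))
        out.append(img)
    return np.stack(out)

batch = rng.random((16, 28, 28))   # a toy batch of 16 images
augmented = augment_batch(batch)
print(augmented.shape)             # (16, 28, 28)
```

Each time a batch is sampled, the model sees slightly different versions of the same images, which is exactly the robustness to transformations described above.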


Figure 6: Data augmentation techniques.

CONCLUSION:

Convolutional Neural Networks are solving challenges that were previously considered unsolvable and, much of the time, beating human-level performance, as seen in the ImageNet challenge, where ResNet performed better than humans. The concepts that make CNNs so effective are not complex; they are intuitive, logical, and easy to understand.

I hope you liked the post, and if you have any doubts, suggestions, or requests, please leave your comments below or get in touch with me on Twitter or LinkedIn.

References:

  1. The Deep Learning Book by Ian Goodfellow, Yoshua Bengio, and Aaron Courville.
  2. Figure 2 (architecture of the human visual system) is taken from knowingneurons.com.
  3. "Why equivariance is better than premature invariance", Geoffrey Hinton (Figure 4).
  4. Figure 6 from itutorials.com.
