Neural Networks Intuitions: 8. Translation Invariance in Object Detectors
May 3 · 4 min read
Hello everyone!
This article is going to be a short one. It focuses on a subtle but often overlooked concept in object detectors, especially in single-shot detectors: Translation Invariance.
Let’s understand what translation invariance is and what makes an image classifier/object detector translation invariant.
*Note: This article assumes you have background knowledge of how single and two-stage detectors work :-)
Translation Invariance:
Translation in computer vision means displacement in space, and invariance means the property of being unchanged.
Therefore, when we say an image classifier or an object detector is translation invariant, it means:
An image classifier can predict a class accurately regardless of where the class (more specifically, the pattern) is located along the image's spatial dimensions. Similarly, a detector can detect an object irrespective of where it is present in the image.
Let us look at an example for each of these problems to make things clear.
In this article, we will consider only Convolutional Neural Nets, whether classifiers or detectors, and see whether they are translation invariant or not!
Translation Invariance in Convolutional Classifiers:
Are CNNs translation invariant? If so, what makes them invariant to translation?
First, CNNs are not completely translation invariant, but only to some extent. Second, it is 'pooling' that makes them translation invariant, not the convolution operation (applying filters).
The above statement applies only to classifiers and not to object detectors.
If we read Hinton's paper on translation invariance in CNNs, he clearly states that the pooling layer was introduced to reduce computational complexity and that translation invariance was only a by-product of it.
One can make CNNs completely translation invariant by feeding them the right kind of data (for example, training images augmented with translated copies), although this may not be 100% feasible.
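To see the distinction concretely, here is a minimal sketch (assuming PyTorch is available): convolution alone only shifts its response along with the input pattern, while pooling over locations discards the position entirely.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# A 1x1x8x8 "image" containing a single bright pixel, plus a copy
# shifted one pixel to the right.
x = torch.zeros(1, 1, 8, 8)
x[0, 0, 3, 3] = 1.0
x_shift = torch.roll(x, shifts=1, dims=3)

# One random 3x3 filter, shared by both inputs.
w = torch.randn(1, 1, 3, 3)
y = F.conv2d(x, w, padding=1)
y_shift = F.conv2d(x_shift, w, padding=1)

# Convolution alone is equivariant, not invariant: the response map
# simply shifts along with the input pattern.
print(torch.allclose(y, y_shift))                                 # False
print(torch.allclose(y, torch.roll(y_shift, shifts=-1, dims=3)))  # True

# Global max pooling over locations discards the position, so the pooled
# feature (what a classifier head would see) is identical for both inputs.
print(torch.allclose(y.amax(dim=(2, 3)), y_shift.amax(dim=(2, 3))))  # True
```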
Note: I won’t be addressing the question of how pooling makes CNNs translation invariant. You can check it out in the links below :-)
Translation Invariance in Two-stage Detectors:
Two-stage object detectors have the following components:
- Region proposal stage
- Classification stage
The first stage predicts the locations of objects of interest (i.e., region proposals) and the second stage classifies those region proposals.
We can see that the first stage predicts foreground object locations, which means the problem is now reduced to image classification, performed by the second stage. This reduction makes a two-stage detector translation invariant without introducing any explicit changes to the neural network architecture.
This decoupling of the object's class prediction from its bounding box prediction is what makes a two-stage detector translation invariant!
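As a rough illustration of this two-stage flow, here is a schematic Python sketch; `backbone`, `propose_regions`, `crop_and_resize`, and `classify` are hypothetical placeholders, not a real library API.

```python
# A schematic sketch of the two-stage pipeline described above.
# All four callables are hypothetical placeholders.
def two_stage_detect(image, backbone, propose_regions, crop_and_resize, classify):
    features = backbone(image)           # shared conv features
    boxes = propose_regions(features)    # stage 1: WHERE are the objects?
    detections = []
    for box in boxes:
        crop = crop_and_resize(features, box)  # e.g., RoI pooling/align
        label, score = classify(crop)          # stage 2: WHAT is in the box?
        detections.append((box, label, score))
    return detections
```

Because stage 2 only ever sees a cropped, resized region, its classification job no longer depends on where in the image that region came from.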
Translation Invariance in Single-stage Detectors:
Now that we have looked into two-stage detectors, we know that a single-stage detector needs to couple box and class predictions. One way of doing that is to make dense predictions (anchors) on a feature map, i.e., at every grid cell or group of cells on the feature map.
Read the following article, where I explain Anchors in depth: Neural Networks Intuitions: 5. Anchors and Object Detection.
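To make "dense predictions at every grid cell" concrete, here is a minimal sketch (assuming PyTorch) that generates one anchor box per feature-map cell; `make_anchors` and its parameter values are illustrative, and real detectors use several scales and aspect ratios per cell.

```python
import torch

# Minimal dense anchor generation: one anchor of a single scale per
# feature-map cell. Illustrative only; real single-stage detectors
# attach several anchors of varying scale/aspect ratio to each cell.
def make_anchors(feat_h, feat_w, stride, size):
    ys = torch.arange(feat_h) * stride + stride / 2  # cell centers (image coords)
    xs = torch.arange(feat_w) * stride + stride / 2
    cy, cx = torch.meshgrid(ys, xs, indexing="ij")
    half = size / 2
    # (feat_h * feat_w, 4) boxes as (x1, y1, x2, y2)
    return torch.stack([cx - half, cy - half, cx + half, cy + half],
                       dim=-1).reshape(-1, 4)

anchors = make_anchors(feat_h=7, feat_w=7, stride=32, size=64)
print(anchors.shape)  # torch.Size([49, 4]): one box per grid cell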
Since these dense predictions are made by convolving filters over feature maps, the network can detect the same pattern when it occurs at a different location on the feature map.
For example, let us consider a neural network trained to detect dogs in an image, where the filters in the final conv layer are responsible for recognizing dog patterns.
Suppose we feed the network training data in which the dog always appears on the left side of the image, and then test it with an image where the dog appears on the right side.
One of the filters in the last layer learns the dog pattern, and since that same filter is convolved across the entire feature map and a prediction is made at every location, it recognizes the same dog pattern in the new location!
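Here is a toy sketch of that argument (assuming PyTorch, and using a hand-made 3x3 "dog" template in place of a learned filter): the same filter is slid over the whole image, so the strongest response simply follows the pattern.

```python
import torch
import torch.nn.functional as F

# A fixed 3x3 "dog" template used as a conv filter (a stand-in for a
# learned filter). It fires wherever the pattern appears.
template = torch.tensor([[0., 1., 0.],
                         [1., 1., 1.],
                         [0., 1., 0.]])
w = template.view(1, 1, 3, 3)

def paste(img, pattern, row, col):
    img[0, 0, row:row + 3, col:col + 3] = pattern
    return img

left = paste(torch.zeros(1, 1, 12, 12), template, 4, 1)   # "dog" on the left
right = paste(torch.zeros(1, 1, 12, 12), template, 4, 8)  # "dog" on the right

for img in (left, right):
    resp = F.conv2d(img, w)          # slide the same filter everywhere
    idx = resp.flatten().argmax()
    r, c = divmod(idx.item(), resp.shape[-1])
    print(f"strongest response at feature cell ({r}, {c})")
# The peak moves from column 1 to column 8: same filter, new location.
```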
Finally, to answer the question "Why do filters make detectors translation invariant but not classifiers?":
Filters in conv nets learn local features in an image rather than taking in the global context. Since object detection is about detecting local features (objects) in the image, rather than making a single prediction from the entire feature map (which is what happens in an image classifier), filters make detectors invariant to translation.
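As one last hedged sketch (again assuming PyTorch), compare a classifier-style fully connected head, which mixes all locations into a single global prediction, with a detector-style 1x1 conv head, which predicts at every location independently.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# The same feature map with the "object" features on the left,
# plus a copy shifted three cells to the right.
feat = torch.zeros(1, 8, 6, 6)
feat[0, :, 2, 1] = 1.0
feat_shift = torch.roll(feat, shifts=3, dims=3)

fc_head = nn.Linear(8 * 6 * 6, 1)           # classifier-style global head
conv_head = nn.Conv2d(8, 1, kernel_size=1)  # detector-style dense head

# The FC head flattens away spatial structure: its output generally
# differs when the same features appear at a different position.
print(fc_head(feat.flatten(1)).item(), fc_head(feat_shift.flatten(1)).item())

# The conv head produces the same score, just at the shifted cell.
out, out_shift = conv_head(feat), conv_head(feat_shift)
print(torch.allclose(out, torch.roll(out_shift, shifts=-3, dims=3)))  # True
```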
That's all in this eighth instalment of my series. I hope you folks were able to get a good grasp of translation invariance in general and of what makes a detector invariant to object translation in images. Please feel free to correct me if I am wrong :-)
Cheers!