Visual Introduction to Self Supervised Learning

Category: IT Technology · Published: 4 years ago

Yann LeCun, in his talk, introduced the “cake analogy” to illustrate the importance of self-supervised learning. Though the analogy is debated (ref: Deep Learning for Robotics (Slide 96), Pieter Abbeel), we have seen the impact of self-supervised learning in natural language processing, where recent developments (Word2Vec, GloVe, ELMo, BERT) have embraced self-supervision and achieved state-of-the-art results.

“If intelligence is a cake, the bulk of the cake is self-supervised learning, the icing on the cake is supervised learning, and the cherry on the cake is reinforcement learning (RL).”

Curious about how self-supervised learning has been applied in computer vision, I read up on the existing literature through a recent survey paper by Jing et al.

This post is my attempt to provide an intuitive visual summary of the patterns of problem formulation in self-supervised learning.

The Key Idea

To apply supervised learning, we need enough labeled data. To acquire it, human annotators manually label data (images/text), which is both time-consuming and expensive. There are also fields, such as medicine, where getting enough data is a challenge in itself.

This is where self-supervised learning comes into play. It poses the following question to solve this:

Can we design the task in such a way that we can generate virtually unlimited labels from our existing images and use that to learn the representations?

We replace the human annotation block by creatively exploiting some property of the data to set up a supervised task. For example, instead of labeling images as cat/dog, we could rotate them by 0/90/180/270 degrees and train a model to predict the rotation. We can generate virtually unlimited training data from the millions of images we have freely available.

Existing Creative Approaches

Below is a list of approaches that various researchers have proposed to exploit image and video properties and learn representations in a self-supervised manner.

Learning from Images

1. Image Colorization

Formulation:

What if we prepared pairs of (grayscale, color) images by converting the millions of images we have freely available to grayscale?

We could use an encoder-decoder architecture based on a fully convolutional neural network and compute the L2 loss between the predicted and actual color images.

To solve this task, the model has to learn about the different objects present in the image and their related parts so that it can paint those parts in the same color. Thus, the representations learned are useful for downstream tasks.
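As a rough illustration of the formulation above, here is a minimal sketch of how a (grayscale, color) pair and the L2 loss could be computed. The helper names (`to_grayscale`, `l2_loss`, `make_colorization_pair`) are hypothetical, images are plain nested lists of RGB tuples, and the luminance weights are one common choice rather than something prescribed by the papers.

```python
def to_grayscale(image):
    """Convert an RGB image (rows of (r, g, b) tuples) to one luminance value per pixel."""
    # Assumption: standard luma weights; any grayscale conversion would do.
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row] for row in image]


def l2_loss(predicted, target):
    """Mean squared error over all pixels and channels."""
    total, count = 0.0, 0
    for pred_row, target_row in zip(predicted, target):
        for pred_pixel, target_pixel in zip(pred_row, target_row):
            for p, t in zip(pred_pixel, target_pixel):
                total += (p - t) ** 2
                count += 1
    return total / count


def make_colorization_pair(image):
    """The input is the grayscale version; the label is the original color image."""
    return to_grayscale(image), image


image = [[(255, 0, 0), (0, 255, 0)],
         [(0, 0, 255), (255, 255, 255)]]
gray, target = make_colorization_pair(image)

# A trivial stand-in for a model: copy the gray value into all three channels.
baseline = [[(g, g, g) for g in row] for row in gray]
loss = l2_loss(baseline, target)
```

A real model would replace the `baseline` line with the output of the encoder-decoder network; the label still comes for free from the original image.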

Papers:

Colorful Image Colorization | Real-Time User-Guided Image Colorization with Learned Deep Priors | Let there be Color!: Joint End-to-end Learning of Global and Local Image Priors for Automatic Image Colorization with Simultaneous Classification

2. Image Superresolution

Formulation:

What if we prepared training pairs of (low-resolution, high-resolution) images by downsampling the millions of images we have freely available?

GAN-based models such as SRGAN are popular for this task. A generator takes a low-resolution image and outputs a high-resolution image using a fully convolutional network. The actual and generated images are compared using both mean-squared-error and content loss to imitate human-like quality comparison. A binary-classification discriminator takes an image and classifies whether it’s an actual high-resolution image (1) or a fake generated superresolution image (0). This interplay between the two models leads to the generator learning to produce images with fine details.

Both generator and discriminator learn semantic features that can be used for downstream tasks.
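The data preparation step in this formulation can be sketched in a few lines. This is only an illustration under simple assumptions: images are grayscale nested lists, downsampling is plain average pooling, and the helper names `downsample` and `make_superresolution_pair` are hypothetical.

```python
def downsample(image, factor=2):
    """Average-pool a grayscale image (list of rows) by an integer factor."""
    height, width = len(image), len(image[0])
    small = []
    for top in range(0, height - factor + 1, factor):
        row = []
        for left in range(0, width - factor + 1, factor):
            block = [image[top + dy][left + dx]
                     for dy in range(factor) for dx in range(factor)]
            row.append(sum(block) / len(block))
        small.append(row)
    return small


def make_superresolution_pair(image, factor=2):
    """The input is the downsampled image; the label is the original image."""
    return downsample(image, factor), image


high_res = [[0, 0, 8, 8],
            [0, 0, 8, 8],
            [4, 4, 2, 2],
            [4, 4, 2, 2]]
low_res, target = make_superresolution_pair(high_res)
```

The generator would then be trained to map `low_res` back to `target`, with the discriminator judging how real its output looks.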

Papers:

Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network

3. Image Inpainting

Formulation:

What if we prepared training pairs of (corrupted, fixed) images by randomly removing parts of images?

Similar to superresolution, we can leverage a GAN-based architecture where the generator learns to reconstruct the image while the discriminator separates real and generated images.
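To make the (corrupted, fixed) pair concrete, here is a minimal sketch that blanks out a random square region. The function name `make_inpainting_pair`, the square hole shape, and the fill value of 0 are assumptions for illustration, not details taken from the paper.

```python
import random


def make_inpainting_pair(image, hole_size, rng, fill_value=0):
    """Blank out a random square; return (corrupted, original, hole position)."""
    height, width = len(image), len(image[0])
    top = rng.randrange(height - hole_size + 1)
    left = rng.randrange(width - hole_size + 1)
    corrupted = [list(row) for row in image]  # copy so the original stays intact
    for y in range(top, top + hole_size):
        for x in range(left, left + hole_size):
            corrupted[y][x] = fill_value
    return corrupted, image, (top, left)


rng = random.Random(0)
# A toy 6x6 grayscale image whose pixel values are all non-zero.
image = [[x + 10 * y + 1 for x in range(6)] for y in range(6)]
corrupted, target, (top, left) = make_inpainting_pair(image, hole_size=3, rng=rng)
```

The generator receives `corrupted` and is trained to reproduce `target`, so the missing region acts as the label.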

For downstream tasks, Pathak et al. have shown that the semantic features learned by such a generator give a 10.2% improvement over random initialization on the PASCAL VOC 2012 semantic segmentation challenge, while giving less than 4% improvement on classification and object detection.

Papers:

Context encoders: Feature learning by inpainting

4. Image Jigsaw Puzzle

Formulation:

What if we prepared training pairs of (shuffled, ordered) puzzles by randomly shuffling patches of images?

Even with only 9 patches, there are 9! = 362880 possible puzzles. To overcome this, only a subset of the possible permutations is used, such as the 64 permutations with the highest Hamming distance.
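One way to picture the subset selection is a greedy search that keeps adding the permutation farthest (in Hamming distance) from those already chosen. The sketch below is an assumption-laden illustration, not the paper's exact procedure: it searches a random pool of candidates instead of all 362880 permutations to stay fast, and `select_permutations` is a hypothetical name.

```python
import math
import random


def hamming(p, q):
    """Number of positions at which two permutations differ."""
    return sum(a != b for a, b in zip(p, q))


def select_permutations(num_patches, subset_size, pool_size, rng):
    """Greedily pick permutations that are far apart in Hamming distance."""
    pool = set()
    while len(pool) < pool_size:
        pool.add(tuple(rng.sample(range(num_patches), num_patches)))
    pool = sorted(pool)  # fixed order so results depend only on the seed

    selected = [pool[0]]
    # min_dist[i] = distance from pool[i] to its closest selected permutation
    min_dist = [hamming(p, selected[0]) for p in pool]
    while len(selected) < subset_size:
        best = max(range(len(pool)), key=lambda i: min_dist[i])
        selected.append(pool[best])
        min_dist = [min(d, hamming(p, pool[best])) for d, p in zip(min_dist, pool)]
    return selected


total_puzzles = math.factorial(9)  # all orderings of 9 patches
permutation_set = select_permutations(9, 64, 500, random.Random(0))
```

Each index into `permutation_set` then serves as one class label for the puzzle task.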

Suppose we shuffle the patches of an image with one of these permutations, say permutation number 64 of the 64 available.

Now, to recover the original patches, Noroozi et al. proposed a neural network called the context-free network (CFN). Here, the individual patches are passed through the same siamese convolutional layers that have shared weights. Then, the features are combined in a fully-connected layer. In the output, the model has to predict which permutation was used from the 64 possible classes. If we know the permutation, we can solve the puzzle.
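The last point, that knowing the permutation is enough to solve the puzzle, can be shown with a small sketch. The patch names, the three-entry permutation set, and the helper names are hypothetical; a real setup would use image crops and the full set of 64 permutations.

```python
def shuffle_patches(patches, permutation):
    """Position i of the puzzle receives the patch originally at permutation[i]."""
    return [patches[source] for source in permutation]


def restore_patches(shuffled, permutation):
    """Undo shuffle_patches once the permutation is known."""
    restored = [None] * len(shuffled)
    for position, source in enumerate(permutation):
        restored[source] = shuffled[position]
    return restored


patches = ["p0", "p1", "p2", "p3", "p4", "p5", "p6", "p7", "p8"]

# Hypothetical tiny permutation set; the class label is the index into it.
permutation_set = [
    (0, 1, 2, 3, 4, 5, 6, 7, 8),
    (8, 7, 6, 5, 4, 3, 2, 1, 0),
    (2, 0, 1, 5, 3, 4, 8, 6, 7),
]
label = 2
shuffled = shuffle_patches(patches, permutation_set[label])

# If the network predicts the right label, the puzzle can be solved directly.
restored = restore_patches(shuffled, permutation_set[label])
```

The training pair is therefore (`shuffled`, `label`), and both come from the image itself.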

To solve the jigsaw puzzle, the model needs to learn how parts are assembled in an object, the relative positions of the different parts, and the shapes of objects. Thus, the representations are useful for downstream tasks in classification and detection.

Papers:

Unsupervised learning of visual representations by solving jigsaw puzzles

5. Context Prediction

Formulation:

What if we prepared training pairs of (image-patch, neighbor) by randomly taking an image patch and one of the neighbors around it from a large, unlabeled image collection?

To solve this pretext task, Doersch et al. used an architecture similar to the one used for the jigsaw puzzle task. We pass the patches through two siamese ConvNets to extract features, concatenate the features, and classify over 8 classes denoting the 8 possible neighbor positions.
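A minimal sketch of how such (patch, neighbor, position) samples could be generated is shown below. It makes simplifying assumptions: patches sit flush against each other on a grid, images are nested lists, and the names `NEIGHBOR_OFFSETS`, `crop`, and `make_context_pair` are hypothetical.

```python
import random

# (row, column) offsets of the 8 neighbors around a patch, in reading order.
# The index into this list is the class label.
NEIGHBOR_OFFSETS = [(-1, -1), (-1, 0), (-1, 1),
                    (0, -1),           (0, 1),
                    (1, -1),  (1, 0),  (1, 1)]


def crop(image, top, left, size):
    """Cut a size x size patch out of a nested-list image."""
    return [row[left:left + size] for row in image[top:top + size]]


def make_context_pair(image, patch_size, rng):
    """Pick a patch and one of its 8 neighbors; the label is the neighbor position."""
    height, width = len(image), len(image[0])
    # Keep the center patch far enough from the border for all neighbors to fit.
    top = rng.randrange(patch_size, height - 2 * patch_size + 1)
    left = rng.randrange(patch_size, width - 2 * patch_size + 1)
    label = rng.randrange(len(NEIGHBOR_OFFSETS))
    dy, dx = NEIGHBOR_OFFSETS[label]
    center = crop(image, top, left, patch_size)
    neighbor = crop(image, top + dy * patch_size, left + dx * patch_size, patch_size)
    return center, neighbor, label


rng = random.Random(0)
image = [[y * 12 + x for x in range(12)] for y in range(12)]  # toy 12x12 image
center, neighbor, label = make_context_pair(image, patch_size=3, rng=rng)
```

The two patches go through the weight-shared ConvNets, and `label` is the target of the 8-way classifier.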

Papers:

Unsupervised Visual Representation Learning by Context Prediction

6. Geometric Transformation Recognition

Formulation:

What if we prepared training pairs of (rotated-image, rotation-angle) by randomly rotating images by 0, 90, 180, or 270 degrees from a large, unlabeled image collection?

To solve this pretext task, Gidaris et al. propose an architecture where a rotated image is passed through a ConvNet and the network has to classify it into 4 classes (0/90/180/270 degrees).

Though a very simple idea, the model has to understand location, types and pose of objects in an image to solve this task and as such, the representations learned are useful for downstream tasks.
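Generating the rotation labels takes very little code, which is part of the appeal. The sketch below is a minimal illustration on nested-list images; the helper names `rotate90` and `make_rotation_pair` are hypothetical.

```python
import random


def rotate90(image):
    """Rotate a nested-list image clockwise by 90 degrees."""
    return [list(row) for row in zip(*image[::-1])]


def make_rotation_pair(image, rng):
    """Rotate by a random multiple of 90 degrees; the label is that multiple."""
    label = rng.randrange(4)  # 0 -> 0, 1 -> 90, 2 -> 180, 3 -> 270 degrees
    rotated = [list(row) for row in image]
    for _ in range(label):
        rotated = rotate90(rotated)
    return rotated, label


rng = random.Random(0)
image = [[1, 2],
         [3, 4]]
rotated, label = make_rotation_pair(image, rng)
```

Every unlabeled image yields four possible training samples, one per rotation class.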

Papers:

Unsupervised Representation Learning by Predicting Image Rotations

7. Image Clustering

Formulation:

What if we prepared training pairs of (image, cluster-number) by performing clustering on a large, unlabeled image collection?

To solve this pretext task, Caron et al. propose an architecture called deep clustering. Here, the images are first clustered and the clusters are used as classes. The task of the ConvNet is to predict the cluster label for an input image.
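The idea of turning cluster assignments into class labels can be sketched with a tiny k-means routine. Everything here is an assumption for illustration: the 2-D `features` stand in for image feature vectors, centroids are naively initialized from the first k points, and `kmeans_labels` is a hypothetical helper.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_labels(features, k, iterations=10):
    """Cluster feature vectors with plain k-means and return one label per vector."""
    centroids = [list(f) for f in features[:k]]  # naive init: first k points
    labels = [0] * len(features)
    for _ in range(iterations):
        # Assignment step: each vector joins its nearest centroid.
        labels = [min(range(k), key=lambda c: squared_distance(f, centroids[c]))
                  for f in features]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [f for f, lab in zip(features, labels) if lab == c]
            if members:
                centroids[c] = [sum(values) / len(members) for values in zip(*members)]
    return labels


# Toy features forming two well-separated groups.
features = [(0.0, 0.1), (5.0, 5.1), (0.2, 0.0), (5.2, 4.9), (0.1, 0.2), (4.9, 5.0)]
pseudo_labels = kmeans_labels(features, k=2)
```

The resulting `pseudo_labels` play the role of class labels when training the ConvNet.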

Papers:

Deep clustering for unsupervised learning of visual features

8. Synthetic Imagery

Formulation:

What if we prepared training pairs of (image, properties) by generating synthetic images using game engines and adapting them to real images?

To solve this pretext task, Ren et al. propose an architecture where weight-shared ConvNets are trained on both synthetic and real images, and then a discriminator learns to classify whether the ConvNet features fed to it come from a synthetic image or a real image. Due to the adversarial nature of the setup, the shared representations between real and synthetic images get better.

Learning from Videos

1. Frame Order Verification

Formulation:

What if we prepared training pairs of (video frames, correct/incorrect order) by shuffling frames from videos of objects in motion?

To solve this pretext task, Misra et al. propose an architecture where video frames are passed through weight-shared ConvNets and the model has to figure out whether the frames are in the correct order or not. In doing so, the model learns not just spatial features but also takes temporal features into account.
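A simplified sketch of how (frames, correct/incorrect) samples might be built is given below. It assumes the frames in a clip are distinct, treats any reordering as "incorrect", and uses a hypothetical helper name `make_order_sample`; the paper's actual sampling strategy is more careful than this.

```python
import random


def make_order_sample(frames, rng, clip_length=3):
    """Return (clip, label): label 1 for original order, 0 for a shuffled clip."""
    start = rng.randrange(len(frames) - clip_length + 1)
    clip = frames[start:start + clip_length]
    if rng.random() < 0.5:
        return clip, 1
    shuffled = clip[:]
    # Assumption: frames are distinct, so a different ordering always exists.
    while shuffled == clip:
        rng.shuffle(shuffled)
    return shuffled, 0


rng = random.Random(0)
frames = list(range(10))  # stand-ins for video frames, indexed by time
samples = [make_order_sample(frames, rng) for _ in range(50)]
```

Each clip is fed frame by frame through the weight-shared ConvNets, and the binary label is the training target.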

Papers:

Shuffle and Learn: Unsupervised Learning using Temporal Order Verification

References

Jing et al. “Self-Supervised Visual Feature Learning with Deep Neural Networks: A Survey.”
