A reading guide about Deep Learning with CNNs



Part II: Image segmentation

Welcome back to Part II of this series. If you missed the first part, have a look here: Part I: Image recognition and convolutional backbones.

In this part, you will find a guide through the literature on image segmentation with convolutional neural networks (CNNs) up to 2019. It adds non-scientific sources to this open-access review paper to further build an intuitive understanding of the evolution of CNNs.

As in part one, you can find the tables of the sources in this GitHub repository:

Now, let’s dive into the next chapter of our adventure of deep learning with CNNs.

A rough overview of image segmentation with CNNs

In image segmentation, a single class is predicted for each pixel, like this:


Example of image segmentation. Modified from: Hoeser and Kuenzer 2020, p. 8 [1]

When CNNs, which we discussed in Part I, became more popular, they were first used for so-called patch-based image segmentation. Here, a CNN moves over the input image in a sliding-window fashion and predicts the class of the center pixel of the patch (a small part of the whole image) or of the complete patch.
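To make the sliding-window idea concrete, here is a minimal sketch, not taken from any of the cited papers: `patch_classifier` is a hypothetical placeholder for any CNN that maps a patch to a class id, and the image is assumed to be an (H, W, C) NumPy array.

```python
import numpy as np

def patch_based_segmentation(image, patch_classifier, patch_size=65):
    """Sketch of patch-based segmentation: slide a window over the image
    and predict a class for each center pixel from its surrounding patch.
    `patch_classifier` is a hypothetical CNN wrapper (patch -> class id)."""
    h, w = image.shape[:2]
    pad = patch_size // 2
    # Reflect-pad so that border pixels also get a full patch
    padded = np.pad(image, ((pad, pad), (pad, pad), (0, 0)), mode="reflect")
    segmentation = np.zeros((h, w), dtype=np.int64)
    for y in range(h):
        for x in range(w):
            patch = padded[y:y + patch_size, x:x + patch_size]
            segmentation[y, x] = patch_classifier(patch)
    return segmentation
```

The obvious drawback, and one reason FCNs took over, is that the network is run once per pixel, which repeats a huge amount of computation on overlapping patches.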

With the work of Long et al. 2014 [2], so-called fully convolutional networks (FCNs) were introduced, and image segmentation with CNNs became much more sophisticated. Overall, the processing in FCNs looks like this: first, features are extracted from the input image by a convolutional backbone (the encoder, see Part I). Thereby, the resolution shrinks while the feature depth grows. The extracted feature maps carry high semantic meaning but no precise localization. Since we need pixel-wise predictions for image segmentation, these feature maps are then upsampled back to the input resolution (the decoder). The difference to the input image is that each pixel now holds a discrete class label, so the image is segmented into semantically meaningful classes.
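The following is a minimal PyTorch sketch of this encode-then-upsample pattern, assuming a toy three-layer backbone rather than the VGG-based encoder used by Long et al.; it only illustrates the shape flow (resolution down, feature depth up, then class scores upsampled back).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MinimalFCN(nn.Module):
    """Toy FCN sketch: encoder shrinks resolution while growing feature depth,
    a 1x1 convolution predicts class scores, and the scores are upsampled
    back to the input resolution for pixel-wise prediction."""
    def __init__(self, num_classes=21):
        super().__init__()
        self.encoder = nn.Sequential(                        # convolutional backbone
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(128, 256, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.classifier = nn.Conv2d(256, num_classes, kernel_size=1)

    def forward(self, x):
        feats = self.encoder(x)                  # high semantics, low resolution
        scores = self.classifier(feats)          # per-location class scores
        # upsample back to the input size (here: simple bilinear interpolation)
        return F.interpolate(scores, size=x.shape[2:], mode="bilinear",
                             align_corners=False)

# usage: MinimalFCN()(torch.randn(1, 3, 224, 224)).shape -> (1, 21, 224, 224)
```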

Two major concepts exist for how the upsampling in the decoder can be done:

  • Naive decoder (this term was used e.g. in Chen et al. 2018 [3]): the upsampling is done by applying e.g. bilinear interpolation
  • Encoder-decoder: the upsampling is done by trainable deconvolution operations and/or by merging features from the encoder part, which carry higher localization information, during upsampling; see these examples (and the code sketch below the figure):


Source: Hoeser and Kuenzer 2020 p. 17 [1]
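To contrast the two decoder concepts from the list above in code, here is a small sketch (PyTorch, my own illustration rather than any specific paper's implementation): the naive decoder is just a fixed interpolation, while the encoder-decoder step learns its upsampling and merges in an encoder feature map of matching spatial size.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Naive decoder: fixed bilinear interpolation, no trainable parameters.
def naive_upsample(feature_map, out_size):
    return F.interpolate(feature_map, size=out_size, mode="bilinear",
                         align_corners=False)

class DecoderStep(nn.Module):
    """One encoder-decoder upsampling step: a trainable deconvolution plus a
    merge with an encoder feature map that carries better localization.
    Assumes `encoder_feats` has twice the spatial size of `x`."""
    def __init__(self, in_ch, skip_ch, out_ch):
        super().__init__()
        self.deconv = nn.ConvTranspose2d(in_ch, out_ch, kernel_size=2, stride=2)
        self.fuse = nn.Conv2d(out_ch + skip_ch, out_ch, kernel_size=3, padding=1)

    def forward(self, x, encoder_feats):
        x = self.deconv(x)                        # learned 2x upsampling
        x = torch.cat([x, encoder_feats], dim=1)  # merge localization information
        return F.relu(self.fuse(x))
```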

In order to dive into image segmentation with deep learning, the sources in the table below are good starting points. Be aware that, next to CNNs, there are other deep learning model types which perform image segmentation, like generative adversarial networks (GANs) or long short-term memory (LSTM) approaches, but this guide focuses on CNNs. Also, models of the R-CNN family are sometimes discussed from an image segmentation perspective. This guide will cover them when we reach object detection in the next part, so do not be confused when you read about them somewhere else (like in review papers) and they are not mentioned here yet.

The evolution of FCNs for image segmentation


Performance of different FCN-inspired architectures on the PASCAL-VOC 2012 benchmark dataset. *These models were tested on other datasets. Source: Hoeser and Kuenzer 2020 p. 17 [1]

The evolution of the DeepLab family is characteristic of the evolution of FCN-inspired models for image segmentation. DeepLab variants can be found among both naive-decoder and encoder-decoder models. Hence, this guide follows that family by first looking at naive-decoder models and then turning towards encoder-decoder models.

Naive-decoder models

The most important insights from naive-decoder models are mainly the establishment of so-called atrous convolutions and the exploitation of long-range image context for pixel-level prediction. Atrous convolutions are a variant of normal convolutions which allow an increasing receptive field without a loss of image resolution. The famous Atrous Spatial Pyramid Pooling module (ASPP module) in DeepLab-V2 [4] and later versions combines both: atrous convolutions and long-range image context exploitation. When reading the following literature, focus on the development of these features: atrous convolutions, the ASPP module and long-range image context exploitation/parsing.
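As a rough illustration of the ASPP idea, here is a simplified PyTorch sketch, not the DeepLab-V2 implementation: parallel 3x3 convolutions with different dilation (atrous) rates sample increasingly long-range context while keeping the feature-map resolution unchanged, and their responses are fused into one output.

```python
import torch
import torch.nn as nn

class SimpleASPP(nn.Module):
    """Simplified ASPP-style module: parallel atrous (dilated) convolutions
    with different rates capture short- and long-range context at the same
    resolution; the rates loosely follow DeepLab-V2-like values."""
    def __init__(self, in_ch=256, out_ch=256, rates=(6, 12, 18, 24)):
        super().__init__()
        # padding == dilation keeps the spatial size of the feature map
        self.branches = nn.ModuleList([
            nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=r, dilation=r)
            for r in rates
        ])

    def forward(self, x):
        # fuse the multi-rate responses (here by summation) into one feature map
        return sum(branch(x) for branch in self.branches)
```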

Encoder-decoder models

Probably the most famous encoder-decoder today is the U-Net [5], a CNN which was developed for analyzing medical images. Its clear structure invited many researchers to experiment with and adapt it, and it is famous for its skip connections, which allow the sharing of features between the encoder and decoder paths. Encoder-decoder models focus on enhancing the semantically rich feature maps during upsampling in the decoder with more locally precise feature maps from the encoder.
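A toy one-level sketch of this U-Net-style skip connection is shown below (my own simplification, not the architecture from Ronneberger et al. [5], and assuming a single-channel input with even height and width): the decoder receives the encoder features via concatenation before predicting per-pixel classes.

```python
import torch
import torch.nn as nn

class TinyUNet(nn.Module):
    """Toy one-level U-Net-style sketch: encoder block, downsampled
    bottleneck, and a decoder block that receives the encoder features
    through a skip connection (channel-wise concatenation)."""
    def __init__(self, num_classes=2):
        super().__init__()
        self.enc = nn.Sequential(nn.Conv2d(1, 64, 3, padding=1), nn.ReLU())
        self.pool = nn.MaxPool2d(2)
        self.bottleneck = nn.Sequential(nn.Conv2d(64, 128, 3, padding=1), nn.ReLU())
        self.up = nn.ConvTranspose2d(128, 64, kernel_size=2, stride=2)
        self.dec = nn.Sequential(nn.Conv2d(128, 64, 3, padding=1), nn.ReLU())
        self.head = nn.Conv2d(64, num_classes, kernel_size=1)

    def forward(self, x):
        skip = self.enc(x)                    # locally precise encoder features
        x = self.bottleneck(self.pool(skip))  # semantically richer, lower resolution
        x = self.up(x)                        # learned upsampling back to encoder size
        x = torch.cat([x, skip], dim=1)       # skip connection: share encoder features
        return self.head(self.dec(x))

# usage: TinyUNet()(torch.randn(1, 1, 128, 128)).shape -> (1, 2, 128, 128)
```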

With the literature at hand, you will be able to reflect on modern image segmentation papers and implementations with CNNs. Let’s meet again in Part III, where we will discuss object detection.

References

[1] Hoeser, T; Kuenzer, C. Object Detection and Image Segmentation with Deep Learning on Earth Observation Data: A Review-Part I: Evolution and Recent Trends. Remote Sensing 2020, 12(10), 1667. DOI: 10.3390/rs12101667.

[2] Long, J.; Shelhamer, E.; Darrell, T. Fully Convolutional Networks for Semantic Segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2014, 39, 640–651.

[3] Chen, L.C.; Zhu, Y.; Papandreou, G.; Schroff, F.; Adam, H. Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation. In Computer Vision–ECCV 2018; Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y., Eds.; Springer International Publishing: Cham, Switzerland, 2018; pp. 833–851.

[4] Chen, L.C.; Papandreou, G.; Kokkinos, I.; Murphy, K.; Yuille, A.L. DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs. IEEE Trans. Pattern Anal. Mach. Intell. 2016, 40, 834–848.

[5] Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional Networks for Biomedical Image Segmentation. In Medical Image Computing and Computer-Assisted Intervention–MICCAI 2015; Navab, N., Hornegger, J., Wells, W.M., Frangi, A.F., Eds.; Springer International Publishing: Cham, Switzerland, 2015; pp. 234–241.

