Convolutional Neural Networks

Introduction & Convolutions

Victor Roman

Feb 27 · 10 min read

The goal of this article is to explore the following concepts:

Introduction to Convolutional Neural Networks: use cases and examples
Convolutions: examples in Python
CNNs
Locally connected layers

Introduction to Convolutional Neural Networks

As explained here, a neural network is a universal function approximator. This means that, in essence, neural networks solve problems by trying to find the best possible approximation to a function that allows us to solve our problem.

To do this we have a series of parameters (the weights and the bias) that we update using gradient descent, with the gradients computed by the backpropagation algorithm.

Thanks to our labels, we can calculate the error in each iteration and modify the weights to reduce it progressively.
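Here is a minimal sketch of this loop. It is not the author's code. It fits a single linear neuron, with one weight `w` and one bias `b`, to hypothetical data generated from y = 2x + 1, using gradient descent on the mean squared error. The learning rate and the number of steps are assumed values. With one neuron the gradients can be written by hand; in a deep network, backpropagation computes the same gradients layer by layer.

```python
# Hypothetical toy data generated from y = 2x + 1
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2 * x + 1 for x in xs]

w, b = 0.0, 0.0        # parameters: one weight and one bias
learning_rate = 0.05   # step size (an assumed value)

for step in range(2000):
    n = len(xs)
    # Gradients of the mean squared error with respect to w and b
    grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
    grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
    # Step against the gradient so the error shrinks progressively
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

print(round(w, 3), round(b, 3))  # close to 2 and 1
```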

And what’s a convolutional neural network? Or more importantly, what problems does it solve?

In short, convolutional neural networks are especially good at solving problems whose data can be expressed in image form.

For example, think about what happens when you tag someone in your Facebook pictures. Have you noticed that it suggests the person’s profile? That’s a convnet!

Or perhaps you have heard of autonomous cars, which can “read” traffic signs, recognize other cars and even detect if a person is crossing the street. Those capabilities are based on convnets too!

CNNs are the state of the art for solving many medical imaging problems. And these are just a few examples; there are many more.

The reason they have become so popular in recent years is that they can find the right features on their own and then use them to classify images correctly. And they do it very efficiently.

But what exactly is a CNN?

A CNN is a neural network that introduces new types of layers. The most important of these is the convolutional layer.

And what is convolution?

Convolution

Strictly speaking, convolution is a mathematical operation that combines two signals into a third one. It is used mainly in signal processing.

In digital signal processing, convolution is used to know what will happen to a signal after “passing” through a certain device.

For example, to know how our voice changes after passing through the microphone of our mobile phone, we could calculate the convolution of our voice with the microphone’s impulse response (its output when the input is a single, very short pulse).
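In the discrete case, the convolution of a signal x with an impulse response h is y[n] = Σ_k x[k]·h[n - k]. The sketch below implements that sum directly and assumes NumPy is available. The "device" is hypothetical: it averages two consecutive samples. Feeding it a unit impulse returns h itself, and the last line checks the result against `np.convolve`.

```python
import numpy as np

def convolve1d(x, h):
    """Full discrete convolution: y[n] = sum over k of x[k] * h[n - k]."""
    y = np.zeros(len(x) + len(h) - 1)
    for n in range(len(y)):
        for k in range(len(x)):
            if 0 <= n - k < len(h):
                y[n] += x[k] * h[n - k]
    return y

# Hypothetical device whose impulse response averages two consecutive samples
h = np.array([0.5, 0.5])

# Passing a unit impulse through the device gives back its impulse response
impulse = np.array([1.0, 0.0, 0.0])
print(convolve1d(impulse, h))   # values 0.5, 0.5, 0, 0

# Any other input signal is transformed by convolving it with h
x = np.array([1.0, 2.0, 3.0])
print(convolve1d(x, h))         # values 0.5, 1.5, 2.5, 1.5
print(np.allclose(convolve1d(x, h), np.convolve(x, h)))  # True
```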

Convolutional neural networks have become famous for their ability to detect patterns that they then classify. Those pattern detectors are convolutions.

Let’s see how a computer understands an image:

As you can see, a color image is represented as a 3-dimensional matrix: Width x Height x Channels.

There are several ways to represent images, but the most common is the RGB color space. This means that a computer eventually sees 3 matrices of Width x Height. The first one tells you the amount of red in the image, the second one the amount of green, and the third one the amount of blue.

If the image were in grayscale, the computer would see it as a single two-dimensional Width x Height matrix.

Finally, the values that the elements of the matrix can take depend on the type of the variable used. The most common ones are:

If we use 8-bit integers: they can go from 0 to 255
If we use floats: usually normalized to go from 0 to 1
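Here is a small sketch of these representations in NumPy, using a hypothetical 2x3 RGB image. NumPy, PIL and Matplotlib store images in Height x Width x Channels order. The grayscale conversion is a plain average of the channels. That is an assumption for illustration; standard conversions weight the channels differently.

```python
import numpy as np

# Hypothetical 2x3 RGB image stored as 8-bit integers (Height x Width x Channels)
img_uint8 = np.zeros((2, 3, 3), dtype=np.uint8)
img_uint8[0, 0] = [255, 0, 0]   # top-left pixel: pure red
img_uint8[1, 2] = [0, 0, 255]   # bottom-right pixel: pure blue

# The three matrices the computer "sees"
red, green, blue = img_uint8[:, :, 0], img_uint8[:, :, 1], img_uint8[:, :, 2]
print(img_uint8.shape)  # (2, 3, 3)
print(red)              # the "amount of red" matrix

# Floats in [0, 1]: divide by the largest 8-bit value
img_float = img_uint8.astype(np.float32) / 255.0
print(img_float.min(), img_float.max())

# Grayscale: a single 2D matrix (here a simple channel average)
gray = img_float.mean(axis=2)
print(gray.shape)       # (2, 3)
```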

Knowing that the image is a matrix, convolution defines a filter, or kernel, that is multiplied with the image matrix. Let’s take a look at the next image:

You define a kernel of 3x3 pixels and multiply it with the input image. What happens? The kernel is much smaller than the image, so to cover the whole image we first place the kernel on the first 3x3 pixels. Then we move it one pixel to the right, then another, then another… At each position, we multiply each element of the kernel by the corresponding pixel of the image and add up all the products. The result of this operation is stored in the output image, as you can see.
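The sliding-window procedure can be written as a short loop. This is a minimal sketch with stride 1 and no padding. The 4x4 input and the all-ones 3x3 kernel are hypothetical values chosen so the sums are easy to check.

Strictly speaking, sliding the kernel without flipping it is called cross-correlation. Mathematical convolution, as computed by `scipy.signal.convolve2d`, flips the kernel first. Deep learning libraries generally implement the unflipped version under the name "convolution". For a symmetric kernel like this one, the two give the same result.

```python
import numpy as np

def slide_kernel(image, kernel):
    """Slide `kernel` over `image` (stride 1, no padding) and, at each
    position, sum the element-wise products of kernel and image patch."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    output = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i:i + kh, j:j + kw]
            output[i, j] = np.sum(patch * kernel)
    return output

image = np.arange(16).reshape(4, 4)  # hypothetical 4x4 input
kernel = np.ones((3, 3))             # hypothetical 3x3 kernel
print(slide_kernel(image, kernel))   # values 45, 54 / 81, 90
```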

Here you can see it more clearly:

Examples in Python

Let’s see some examples to see what happens when we do these multiplications and additions and how they help to detect patterns and make predictions.

import numpy as np
from scipy import signal
from scipy import datasets  # SciPy >= 1.10 (needs the pooch package); older versions had scipy.misc.ascent()
import matplotlib.pyplot as plt

ascent = datasets.ascent()  # 512x512 grayscale sample image

# Vertical edge detector: responds to left-to-right intensity changes
kernel = np.array([[-1, 0, +1],
                   [-1, 0, +1],
                   [-1, 0, +1]])
grad = signal.convolve2d(ascent, kernel, boundary='symm', mode='same')

# function to show two pictures side by side
def plot_two(img_orig, img_conv):
  fig, (ax_orig, ax_mag) = plt.subplots(1, 2, figsize=(20, 50))
  ax_orig.imshow(img_orig, cmap='gray')
  ax_orig.set_title('Original')
  ax_orig.set_axis_off()
  ax_mag.imshow(img_conv, cmap='gray')
  ax_mag.set_title('Gradient')
  ax_mag.set_axis_off()
  plt.show()

plot_two(ascent, grad)

This is a vertical edge detector: it highlights vertical lines and edges. Let’s define and use a horizontal one.

# Horizontal edge detector: responds to top-to-bottom intensity changes
kernel = np.array([[-1, -1, -1],
                   [ 0,  0,  0],
                   [+1, +1, +1]])
grad_h = signal.convolve2d(ascent, kernel, boundary='symm', mode='same')
plot_two(ascent, grad_h)

Let’s look at some of the most commonly used kernels in traditional image processing.

First, let’s display an unmodified picture:

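# load and show the original picture
from urllib.request import Request, urlopen
from io import BytesIO
from PIL import Image

url_img = 'https://upload.wikimedia.org/wikipedia/commons/5/50/Vd-Orig.png'
# Wikimedia asks clients to send a descriptive User-Agent; this value is a placeholder
req = Request(url_img, headers={'User-Agent': 'cnn-tutorial-example/0.1'})
file = BytesIO(urlopen(req).read())
img = np.asarray(Image.open(file).convert('RGB'), dtype='uint8')  # force 3 channels
plt.imshow(img)
plt.axis('off')

def convolve3d(img, kernel):
  # Convolve each color channel separately with the same 2D kernel
  img_out = np.zeros(img.shape)
  for i in range(img.shape[-1]):
    img_out[:,:,i] = signal.convolve2d(img[:,:,i].astype(float), kernel, boundary='symm', mode='same')
  # Clip to the valid 8-bit range before casting; otherwise out-of-range values wrap around
  return np.clip(img_out, 0, 255).astype('uint8')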

Identity Kernel

# Let's try with the Identity Kernel
kernel = [[0, 0, 0],
         [0, 1, 0],
         [0, 0, 0]]
img_ki = convolve3d(img, kernel)
plot_two(img, img_ki)

Other commonly used kernels (and their outputs) are:
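The author's figure with these kernels is not reproduced here. The sketch below uses a few commonly cited 3x3 kernels instead; treating them as the ones in the figure is an assumption. It applies them to a hypothetical grayscale image: a bright square on a dark background. It uses only NumPy, with edge padding so the output keeps the input size.

```python
import numpy as np

# Commonly cited 3x3 kernels (assumed; not necessarily the author's exact list)
kernels = {
    "identity": np.array([[0, 0, 0], [0, 1, 0], [0, 0, 0]], dtype=float),
    "sharpen": np.array([[0, -1, 0], [-1, 5, -1], [0, -1, 0]], dtype=float),
    "box_blur": np.ones((3, 3)) / 9.0,
    "gaussian_blur": np.array([[1, 2, 1], [2, 4, 2], [1, 2, 1]]) / 16.0,
    "edge_detection": np.array([[-1, -1, -1], [-1, 8, -1], [-1, -1, -1]], dtype=float),
}

def apply_kernel(gray, kernel):
    """Apply a 3x3 kernel with edge padding so the output keeps the input size."""
    padded = np.pad(gray, 1, mode="edge")
    out = np.zeros_like(gray, dtype=float)
    for i in range(gray.shape[0]):
        for j in range(gray.shape[1]):
            out[i, j] = np.sum(padded[i:i + 3, j:j + 3] * kernel)
    return out

# Hypothetical test image: a bright 2x2 square on a dark 6x6 background
gray = np.zeros((6, 6))
gray[2:4, 2:4] = 1.0

for name, k in kernels.items():
    result = apply_kernel(gray, k)
    print(f"{name:15s} sum={k.sum():5.2f} range=[{result.min():.2f}, {result.max():.2f}]")
```

All five kernels look the same after a 180° rotation, so flipping them (true convolution) does not change the result. The blur kernels sum to 1, which preserves overall brightness. The edge-detection kernel sums to 0, so flat regions map to 0.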

All of this is very informative… But how is convolution able to detect patterns?

Pattern Detection Example

Let’s say that we have this filter:

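And the following image:

What happens if the filter lies on the rat’s back?

The result will be:

30·0 + 30·50 + 30·20 + 30·50 + 30·50 + 30·50 = 6600

This is a very high number, which indicates that we have found a curve: this region of the image matches the shape encoded by the filter.

What happens if the filter lies on the rat’s head?

The result will be:

30·0 + 30·0 + 30·0 + 30·0 + 30·0 + 30·0 = 0

This time the result is 0, because all the pixels under the filter’s nonzero weights are 0: there is no curve with that shape in this region.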
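The response at a single position is just the sum of element-wise products between the filter and the image patch under it. The article's filter and image are not reproduced here. The 5x5 arrays below are hypothetical values chosen so that the two sums match the arithmetic above. The filter has six weights of 30 along a curve. The "back" patch has the values 0, 50, 20, 50, 50, 50 under those weights, and the "head" patch has zeros there.

```python
import numpy as np

# Hypothetical curve detector: six weights of 30 along a diagonal curve
curve_filter = np.array([
    [0,  0,  0,  0,  30],
    [0,  0,  0,  30, 0],
    [0,  0,  30, 0,  0],
    [0,  30, 0,  0,  0],
    [30, 30, 0,  0,  0],
])

# Hypothetical patch from the rat's back: bright pixels follow the curve
back_patch = np.array([
    [0,  0,  0,  0,  0],
    [0,  0,  0,  50, 0],
    [0,  0,  20, 0,  0],
    [0,  50, 0,  0,  0],
    [50, 50, 0,  0,  0],
])

# Hypothetical patch from the rat's head: bright pixels are elsewhere
head_patch = np.array([
    [50, 50, 50, 0, 0],
    [50, 50, 0,  0, 0],
    [50, 0,  0,  0, 0],
    [0,  0,  0,  0, 0],
    [0,  0,  0,  0, 0],
])

print(np.sum(curve_filter * back_patch))  # 6600: strong response, curve found
print(np.sum(curve_filter * head_patch))  # 0: no response
```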

CNNs

Now that we have introduced the concept of convolution, let’s study what convolutional neural networks are and how they work.

In these images, we can see the typical architecture of a convolutional neural network. The input is simply a W x H x 3 matrix (3 channels because it is RGB). Then the “convolutional blocks” start.

These blocks are usually composed of:

Convolutional layers
Pooling layers, which downsample (decimate) the output of the convolutional layer

We already know how convolution works: we define a kernel or filter that serves to highlight certain structures in the image.

But how do I define a filter that allows me to find out that the input image has a black cat in it?

That’s the magic of CNNs! We don’t have to define any filters: the network learns them automatically thanks to backpropagation!

A CNN has two stages: a feature extractor and a classifier.

The feature extraction stage goes from simple, general patterns or structures to more complex, specific ones:

The first convolutional layers detect lines in different orientations
The next ones detect shapes and colors
The deeper ones detect more complex patterns

So in the end, what we have is a network that learns on its own. We don’t have to decide which features to use for classification, because the network chooses them itself.

And how does it learn? The same way as a traditional neural network: backpropagation and gradient descent.

The second stage, the classifier, is made up of dense layers, which are the layers used in traditional neural networks.

So finally a CNN could be understood as a set of convolutional stages coupled to a traditional neural network, which is the one that classifies the patterns extracted by the convolutions and returns some probabilities for each class.
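The two stages can be sketched as a single forward pass in plain NumPy. This is a minimal, hypothetical illustration, not the author's network. The 8x8 input, the 4 filters, the 10 classes and the random untrained weights are all assumptions. So is the ReLU activation after the convolution, which the article does not specify. Training by backpropagation is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv_layer(x, filters):
    """Valid sliding-window convolution (no kernel flip), stride 1.
    x: (H, W, C_in); filters: (n_filters, k, k, C_in) -> (H-k+1, W-k+1, n_filters)."""
    n_f, k = filters.shape[0], filters.shape[1]
    out_h, out_w = x.shape[0] - k + 1, x.shape[1] - k + 1
    out = np.zeros((out_h, out_w, n_f))
    for f in range(n_f):
        for i in range(out_h):
            for j in range(out_w):
                out[i, j, f] = np.sum(x[i:i + k, j:j + k, :] * filters[f])
    return out

def max_pool(x, size=2):
    """Non-overlapping max pooling applied to each activation map separately."""
    h, w, c = x.shape
    x = x[:h - h % size, :w - w % size, :]
    return x.reshape(h // size, size, w // size, size, c).max(axis=(1, 3))

def softmax(z):
    e = np.exp(z - z.max())  # subtract the max for numerical stability
    return e / e.sum()

# Hypothetical 8x8 RGB input and untrained (random) parameters
image = rng.random((8, 8, 3))
filters = rng.standard_normal((4, 3, 3, 3)) * 0.1  # 4 filters of size 3x3x3
n_classes = 10

# Stage 1: feature extractor (convolution -> assumed ReLU -> pooling)
features = conv_layer(image, filters)  # (6, 6, 4)
features = np.maximum(features, 0)     # ReLU (an assumed activation)
features = max_pool(features)          # (3, 3, 4)

# Stage 2: classifier (one dense layer on the flattened features)
flat = features.reshape(-1)            # 36 values
W = rng.standard_normal((n_classes, flat.size)) * 0.1
b = np.zeros(n_classes)
probabilities = softmax(W @ flat + b)  # one probability per class

print(features.shape, probabilities.shape, probabilities.sum())
```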

Types of layers in a CNN

Convolutional

These layers are in charge of applying the convolution to our input images to find the patterns that will later allow us to classify them. Their main parameters are:

The number of filters/kernels to apply to the image: the number of matrices with which the input images will be convolved
The size of these filters: almost always square, 3x3, 5x5, etc.

Here you can see the general scheme: a given input image is convolved with each filter, and the output of each filter is a 2D activation map. If the input image is RGB, it has 3 channels, so each filter also has 3 channels. We convolve each channel of the image with the corresponding channel of the filter and then add the results, reducing the 3 channels to only 1:

As the input has 3 channels, R, G and B, our input image is defined as 3 two-dimensional arrays, one for each channel.

So the convolutional layer applies the convolution separately to each channel, gets the result for each channel, and then adds them up. The result is a single 2D matrix called an activation map.
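Here is a minimal NumPy sketch of that per-channel computation. The 5x5 RGB input, the random 3x3x3 filter and the bias value are hypothetical. The bias is an assumption, following the usual practice of adding one bias per filter. The last check confirms that summing the three per-channel results is the same as a single 3D dot product at each position.

```python
import numpy as np

def correlate2d_valid(channel, kernel):
    """Slide a 2D kernel over one channel (stride 1, no padding)."""
    kh, kw = kernel.shape
    out = np.zeros((channel.shape[0] - kh + 1, channel.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(channel[i:i + kh, j:j + kw] * kernel)
    return out

rng = np.random.default_rng(42)
rgb = rng.random((5, 5, 3))            # hypothetical 5x5 RGB input
filt = rng.standard_normal((3, 3, 3))  # one filter: one 3x3 slice per channel
bias = 0.1                             # hypothetical bias

# Convolve each channel with its own slice, then add the three results
per_channel = [correlate2d_valid(rgb[:, :, c], filt[:, :, c]) for c in range(3)]
activation_map = sum(per_channel) + bias
print(activation_map.shape)  # (3, 3): one 2D activation map

# Equivalent: a single 3D dot product at each position
direct = np.array([[np.sum(rgb[i:i + 3, j:j + 3, :] * filt) + bias
                    for j in range(3)] for i in range(3)])
print(np.allclose(activation_map, direct))  # True
```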

You can see this in more detail in this interactive demo:

http://cs231n.github.io/assets/conv-demo/index.html

Besides the number of filters and the size, convolutional layers have another important parameter that we should take into account: the stride.

This would be an example of a convolution with a stride of 1:

And this would be an example of a convolution with a stride of 2:

You can tell that the difference is the length of the step that the kernel takes in each iteration.
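The stride also determines the size of the output. A common formula is (W - F + 2P) / S + 1, where W is the input width, F the filter size, S the stride and P the zero padding on each side. Padding is an extra parameter not discussed here; P = 0 means no padding. The helper below is a minimal sketch. Like most libraries, it rounds down when the filter does not fit the input evenly.

```python
def conv_output_size(input_size, kernel_size, stride=1, padding=0):
    """Spatial output size of a convolution: (W - F + 2P) // S + 1."""
    return (input_size - kernel_size + 2 * padding) // stride + 1

print(conv_output_size(7, 3, stride=1))              # 5
print(conv_output_size(7, 3, stride=2))              # 3
print(conv_output_size(32, 3, stride=1, padding=1))  # 32 (same size as the input)
```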

Receptive Field

In the case of convolutional layers, each output neuron is connected to only a local region of the input image.

This region can be understood as “what the neuron sees”. With dense layers the opposite happens: each neuron is connected to all the elements of the previous layer. The neurons still work the same way; the only difference is that at their input they “see” the whole image instead of a region of it.

As explained in this great article:

The receptive field determines what area of the original input to the entire network the output gets to see.
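For a stack of layers, the receptive field can be computed with a standard recurrence. Start with r = 1 and a jump (the distance in input pixels between adjacent units) of j = 1. For each layer with kernel size k and stride s, update r ← r + (k - 1)·j, then j ← j·s. The sketch below treats pooling as a layer with its own kernel size and stride. It assumes no dilation, and the layer stacks are hypothetical examples.

```python
def receptive_field(layers):
    """layers: list of (kernel_size, stride) pairs, first layer first.
    Returns how many input pixels (per side) one unit in the last layer sees."""
    rf, jump = 1, 1  # an input pixel sees itself; neighbours are 1 pixel apart
    for kernel_size, stride in layers:
        rf += (kernel_size - 1) * jump
        jump *= stride
    return rf

print(receptive_field([(3, 1)]))                  # 3
print(receptive_field([(3, 1), (3, 1), (3, 1)]))  # 7: three stacked 3x3 convolutions
print(receptive_field([(3, 1), (2, 2), (3, 1)]))  # 8: 3x3 conv, 2x2 pool, 3x3 conv
```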

Pooling

Pooling layers are used to reduce the spatial size of our activation maps, which reduces the memory and computation the network needs. Without them, many networks would not fit on many GPUs. The two most common types of pooling are:

max pooling: calculates the maximum of the elements
average pooling: calculates the average of the elements

Keep in mind that pooling is applied independently to each activation map of our volume. The depth dimension does not take part in the calculations and stays the same.

Let’s see an example of max pooling with different strides:
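Here is a minimal NumPy sketch of both pooling types on a single, hypothetical 4x4 activation map. For a volume with several maps, the same function would be applied to each map separately, which is why the depth does not change. With a 2x2 window, a stride of 2 gives non-overlapping windows. A stride of 1 gives overlapping windows and a larger output.

```python
import numpy as np

def pool2d(x, size=2, stride=2, mode="max"):
    """Pool a single 2D activation map with a size x size window."""
    out_h = (x.shape[0] - size) // stride + 1
    out_w = (x.shape[1] - size) // stride + 1
    out = np.zeros((out_h, out_w))
    reduce = np.max if mode == "max" else np.mean  # max or average pooling
    for i in range(out_h):
        for j in range(out_w):
            window = x[i * stride:i * stride + size, j * stride:j * stride + size]
            out[i, j] = reduce(window)
    return out

x = np.array([[1, 3, 2, 4],
              [5, 6, 1, 2],
              [7, 2, 9, 0],
              [1, 4, 3, 8]], dtype=float)

print(pool2d(x, size=2, stride=2, mode="max"))        # values 6, 4 / 7, 9
print(pool2d(x, size=2, stride=2, mode="avg"))        # values 3.75, 2.25 / 3.5, 5
print(pool2d(x, size=2, stride=1, mode="max").shape)  # (3, 3)
```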

Locally-connected Layers

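Imagine we have a 32x32 input image, and our network has 5 convolutional layers, each with 5 filters of size 3x3. Each filter has only 3x3 weights, and the same weights are shared by every position in the image. This is because the filter runs through (slides over) the whole image.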

This is based on the assumption that if a certain filter is good at detecting something in the position (x, y) of the image, it should also be good for the position (x2,y2).

This assumption is usually valid, because we normally do not know where our features are going to be located in the image. But suppose we have a dataset in which faces always appear centered in the image. Then we might want the filters for the eye areas to be different from those for the nose or the mouth, right?

In this case, if we know where our features are going to be located, it makes more sense to have a filter for each area.

Before, we had to learn 5 filters of 3x3 per layer, which gives a total of 5⋅3⋅3 = 45 parameters (ignoring biases). Now we would have to learn a separate set of filters for each of the 32x32 positions: 32⋅32⋅5⋅3⋅3 = 46080 parameters.

That’s a huge difference. So unless we know that the patterns will be different and always in the same position, it is better to use convolutional layers rather than locally connected ones.
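The two parameter counts can be reproduced with a few lines of arithmetic. This sketch makes the same simplifying assumptions as the text: a single-channel input, an output the same size as the input, and no biases.

```python
image_size = 32  # 32x32 input
n_filters = 5    # filters per layer
kernel_size = 3  # 3x3 filters

# Convolutional layer: each filter's 3x3 weights are shared by every position
conv_params = n_filters * kernel_size * kernel_size

# Locally connected layer: separate 3x3 filters at each of the 32x32 positions
local_params = image_size * image_size * n_filters * kernel_size * kernel_size

print(conv_params)                  # 45
print(local_params)                 # 46080
print(local_params // conv_params)  # 1024 = 32 * 32 times more parameters
```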

By the way, look at the image below: the layers with the most parameters are the dense ones! That makes sense, because in a dense layer every neuron is connected to every neuron in the next layer.

Final Words

As always, I hope you enjoyed the post, and that you gained an intuition about convolutional neural networks!

If you liked this post, you can take a look at my other posts on Data Science and Machine Learning here.

If you want to learn more about Machine Learning, Data Science and Artificial Intelligence, follow me on Medium and stay tuned for my next posts!
