Getting to know Activation Functions in Neural Networks.

What are activation functions in Neural Networks and why should you know about them?

Photo by Marius Masalar on Unsplash

If you are someone who has experience implementing neural networks, you might have encountered the term ‘activation functions’. Does the name ring any bells? No? How about ‘relu’, ‘softmax’ or ‘sigmoid’? Well, those are a few of the most widely used activation functions in today’s context. When I started working with neural networks I had no idea what an activation function really does. But there came a point where I could not go ahead with the implementation of my neural network without a sound knowledge of activation functions. I did a little bit of digging, and here’s what I found…

What are activation functions?

To put it simply, activation functions are mathematical functions that determine the output of a neural network. They decide whether to activate or deactivate each neuron in order to produce the desired output, hence the name activation functions. Now, let’s get into the math…

Figure 1

In a neural network, numerical input values (x) are fed into neurons. Each input to a neuron has a weight (w); the neuron multiplies the inputs by their weights, sums them up and passes the result on to the neurons in the next layer. Activation functions come into play as mathematical gates in this process, as depicted in Figure 1, and decide whether the output of a given neuron is on or off.
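
As a rough sketch of that computation (the names here are illustrative, and a bias term b is included since real networks typically have one):

```python
import numpy as np

def neuron_output(x, w, b, activation):
    """One neuron: a weighted sum of the inputs, gated by an activation."""
    z = np.dot(w, x) + b   # weighted sum of the inputs, plus a bias
    return activation(z)   # the activation decides what gets passed on

# e.g. a neuron with two inputs and a sigmoid gate
out = neuron_output(np.array([0.5, -1.2]),   # inputs x
                    np.array([0.8, 0.3]),    # weights w
                    0.1,                     # bias b
                    lambda z: 1 / (1 + np.exp(-z)))
print(out)  # ~0.54
```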

Activation functions can be divided into three main categories: Binary Step Functions, Linear Activation Functions and Non-Linear Activation Functions. Non-linear activation functions, in turn, come in several varieties. Let’s take a deeper look…

1. Binary Step Function

Binary Step Activation Function

The binary step function is a threshold-based activation function: at or above a certain threshold the neuron is activated, and below that threshold it is deactivated. In the graph above, the threshold is zero. As the name suggests, this activation function can be used for binary classification; however, it cannot be used in situations where you have multiple classes to deal with.
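
A minimal NumPy sketch, with the threshold at zero as in the graph:

```python
import numpy as np

def binary_step(x, threshold=0.0):
    """1 at or above the threshold (activated), 0 below it (deactivated)."""
    return np.where(x >= threshold, 1.0, 0.0)

print(binary_step(np.array([-2.0, 0.0, 3.5])))  # [0. 1. 1.]
```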

2. Linear Activation Function

Linear Activation Function

Here, the output is directly proportional to the weighted sum of the neuron's inputs. Unlike the binary step function, a linear activation function can deal with multiple classes. However, it has its own drawbacks. With a linear activation function, the gradient used in back-propagation is constant, which is not good for learning. Another huge drawback is that no matter how deep the neural network is (how many layers it consists of), the last layer will always be a linear function of the first layer, as the sketch below demonstrates. This limits the network's ability to deal with complex problems.
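
That last point is easy to verify numerically: two stacked layers with a linear (identity) activation collapse into a single linear layer. The matrix shapes below are arbitrary, and bias terms are omitted for brevity:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=3)          # input vector
W1 = rng.normal(size=(4, 3))    # first layer weights
W2 = rng.normal(size=(2, 4))    # second layer weights

deep = W2 @ (W1 @ x)            # two "layers" with linear activation
shallow = (W2 @ W1) @ x         # one layer with weights W2 @ W1
print(np.allclose(deep, shallow))  # True: the extra depth buys nothing
```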

3. Non-Linear Activation Functions

Deep learning practitioners today work with high-dimensional data such as images, audio and video. Given the drawbacks mentioned above, linear activation functions are not practical for the complex applications we use neural networks for. That is why non-linear functions are the ones widely used at present. Let’s take a look at a few of the popular non-linear activation functions.

  • Sigmoid function.

Sigmoid function

The sigmoid function (also known as the logistic function) takes a probabilistic approach, with outputs ranging between 0 and 1, normalizing the output of each neuron. However, for very high or very low inputs the sigmoid function makes almost no change in its output, which ultimately results in the neural network refusing to learn further; this problem is known as the vanishing gradient.
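
A minimal sketch of the function and its gradient shows the problem; the gradient identity sigmoid'(x) = sigmoid(x)(1 − sigmoid(x)) is standard:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)   # peaks at 0.25 when x = 0

# For large |x| the gradient all but vanishes:
print(sigmoid_grad(np.array([0.0, 5.0, 10.0])))
# [2.5e-01 6.6e-03 4.5e-05]
```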

  • tanh function

tanh function

The tanh function (also known as the hyperbolic tangent) is much like the sigmoid function but slightly better, since its output ranges between -1 and 1, allowing negative outputs. However, tanh comes with the same vanishing gradient problem as the sigmoid function.
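
NumPy ships tanh directly; the sketch below also checks the standard identity tanh(x) = 2·sigmoid(2x) − 1, which is why the two functions share the same S-shape (and the same vanishing gradient):

```python
import numpy as np

x = np.array([-5.0, 0.0, 5.0])
print(np.tanh(x))                            # [-0.9999  0.  0.9999]
print(2.0 / (1.0 + np.exp(-2.0 * x)) - 1.0)  # same values
```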

  • ReLU (Rectified Linear Unit) function

ReLU function

With this function, outputs for positive inputs range from 0 to infinity, but when the input is zero or negative the function outputs zero, which blocks back-propagation through that neuron. This is known as the dying ReLU problem.
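
The function itself is a one-liner (a minimal NumPy sketch):

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

# The gradient is zero for every non-positive input, so a neuron that
# only ever sees such inputs stops updating -- the dying ReLU problem.
print(relu(np.array([-3.0, 0.0, 2.5])))  # [0.  0.  2.5]
```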

  • Leaky ReLU

Leaky ReLU function

Leaky ReLU prevents the dying ReLU problem and keeps back-propagation alive for negative inputs. One flaw of Leaky ReLU is that the slope for negative inputs is predetermined rather than learned by the neural network.
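
A minimal sketch, using a slope of 0.01 for negative inputs (a common default, though the exact value varies between implementations):

```python
import numpy as np

def leaky_relu(x, alpha=0.01):
    """A small fixed slope keeps the gradient alive for negative inputs."""
    return np.where(x > 0, x, alpha * x)

print(leaky_relu(np.array([-3.0, 0.0, 2.5])))  # [-0.03  0.  2.5]
```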

There are quite a few other non-linear activation functions, such as softmax and Parametric ReLU, which are not discussed in this article. Now comes the million-dollar question: which activation function is the best? Well, my answer would be it depends… It depends on the problem you are applying the neural network to. For instance, if you are applying a neural network to a binary classification problem, sigmoid will work well, but for some other problem it might not, and that is why it is important to learn the pros and cons of activation functions, so that you can choose the best activation function for the project you are working on.

How to include activation functions in your code?

Maybe years ago, implementing the math behind all these functions would have been quite difficult, but now, with the advancement of open-source libraries such as TensorFlow and PyTorch, it has become much easier! Let’s see how activation functions can be included in your code using TensorFlow.

Activation functions in TensorFlow
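
A minimal sketch of the idea with the Keras API (the layer sizes, input shape and the particular activations chosen here are illustrative assumptions):

```python
import tensorflow as tf

# Activations can be passed to a layer by name...
model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation='relu', input_shape=(10,)),
    tf.keras.layers.Dense(32, activation='tanh'),
    tf.keras.layers.Dense(1, activation='sigmoid'),
])

# ...or applied directly as functions from tf.nn.
x = tf.constant([[-1.0, 0.0, 1.0]])
print(tf.nn.relu(x))
print(tf.nn.sigmoid(x))
print(tf.nn.leaky_relu(x, alpha=0.01))
```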

Seems quite simple, right? As easy as it is with TensorFlow, it is important to have an actual understanding of these activation functions, because the learning process of your neural network highly depends on them.

Thank you for reading and hope this article was of help.
