Getting to know Activation Functions in Neural Networks.
What are activation functions in Neural Networks and why should you know about them?
If you are someone who has experience implementing Neural Networks, you might have encountered the term 'activation functions'. Does the name ring any bells? No? How about 'relu', 'softmax' or 'sigmoid'? Those are a few of the most widely used activation functions today. When I started working with Neural Networks, I had no idea what an activation function really does. But there came a point where I could not go ahead with the implementation of my neural network without a sound knowledge of activation functions. I did a little bit of digging and here's what I found…
What are activation functions?
To put it simply, activation functions are mathematical functions that determine the output of a neural network. They decide whether each neuron should be activated or deactivated in order to produce the desired output, hence the name activation functions. Now, let's get into the math…
In a neural network, numerical input values (x) are fed into neurons. Every neuron has a weight (w) that is multiplied by its inputs to produce a value, which is in turn fed to the neurons in the next layer. Activation functions come into play as mathematical gates in this process and decide whether the output of a particular neuron is passed on or switched off.
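To make this concrete, here is a minimal sketch in plain Python/NumPy of what a single neuron computes (the names and numbers are illustrative; the bias term b is the usual extra parameter, even though only weights are mentioned above):

```python
import numpy as np

def neuron_output(x, w, b, activation):
    """Weighted sum of the inputs, passed through an activation 'gate'."""
    z = np.dot(w, x) + b          # weighted sum: w · x + b
    return activation(z)          # the activation decides what gets passed on

# Example with a sigmoid gate
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
print(neuron_output(np.array([0.5, -1.2]), np.array([0.8, 0.3]), 0.1, sigmoid))
```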
Activation functions can be divided into three main categories: the Binary Step Function, the Linear Activation Function and Non-Linear Activation Functions. Non-linear activation functions themselves come in several varieties. Let's take a deeper look…
1. Binary Step Function
The binary step function is a threshold-based activation function: above a certain threshold the neuron is activated, and below it the neuron is deactivated. A common choice of threshold is zero. As the name suggests, this activation function can be used for binary classification, but it cannot be used when you have multiple classes to deal with.
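A minimal sketch of the binary step function (NumPy, with the threshold at zero):

```python
import numpy as np

def binary_step(z, threshold=0.0):
    """1 when the input reaches the threshold, 0 otherwise."""
    return np.where(z >= threshold, 1, 0)

print(binary_step(np.array([-2.0, 0.0, 3.5])))  # -> [0 1 1]
```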
2. Linear Activation Function
Here, the output is directly proportional to the weighted sum of the neuron's inputs. Unlike the binary step function, a linear activation function can deal with multiple classes. However, it has its own drawbacks. Because the derivative of a linear function is constant, the updates made during back-propagation do not depend on the input, which is not good for learning. Another huge drawback is that no matter how deep the neural network is (how many layers it consists of), the last layer will still be a linear function of the input, as the sketch below illustrates. This limits the network's ability to deal with complex problems.
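A quick numerical sketch of that last point (the matrices here are random, purely for illustration): stacking two linear layers is equivalent to a single linear layer.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=3)
W1 = rng.normal(size=(4, 3))        # first linear "layer"
W2 = rng.normal(size=(2, 4))        # second linear "layer"

deep = W2 @ (W1 @ x)                # two stacked linear layers
shallow = (W2 @ W1) @ x             # one equivalent linear layer
print(np.allclose(deep, shallow))   # True: the extra depth adds nothing
```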
3. Non-Linear Activation Functions
Deep learning practitioners today work with high-dimensional data such as images, audio and video. Given the drawbacks mentioned above, it is not practical to use linear activation functions in the complex applications we use neural networks for. That is why non-linear functions are the ones widely used at present. We'll take a look at a few of the popular non-linear activation functions; a small code sketch of each follows the list.
- Sigmoid function.
The sigmoid function (also known as the logistic function) takes a probabilistic approach: its output ranges between 0 and 1, normalizing the output of each neuron. However, for very high or very low inputs the sigmoid barely changes its output, which ultimately leads to the network refusing to learn further. This problem is known as the vanishing gradient problem.
- tanh function
The tanh function (also known as the hyperbolic tangent) is much like the sigmoid function but slightly better, since its output ranges between -1 and 1, allowing negative outputs. However, tanh suffers from the same vanishing gradient problem as the sigmoid.
- ReLU (Rectified Linear Unit) function
In this function, the output for positive inputs ranges from 0 to infinity, but when the input is zero or negative the function outputs zero, and its gradient is zero as well, which blocks back-propagation for those neurons. This problem is known as the dying ReLU problem.
- Leaky ReLU
Leaky ReLU prevents the dying ReLU problem by giving negative inputs a small non-zero slope, which keeps back-propagation working. One flaw of Leaky ReLU is that this slope is predetermined rather than learned by the network.
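For reference, here is a small NumPy sketch of the four non-linear functions above (the 0.01 slope used for Leaky ReLU is a common default, not a fixed rule):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))        # squashes output into (0, 1)

def tanh(z):
    return np.tanh(z)                       # squashes output into (-1, 1)

def relu(z):
    return np.maximum(0.0, z)               # zero for negative inputs

def leaky_relu(z, alpha=0.01):
    return np.where(z > 0, z, alpha * z)    # small slope keeps gradients alive

z = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
for f in (sigmoid, tanh, relu, leaky_relu):
    print(f.__name__, f(z))
```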
There are quite a few other non-linear activation functions, such as softmax and Parametric ReLU, which are not discussed in this article. Now comes the million-dollar question: which activation function is the best? Well, my answer would be it depends… It depends on the problem you are applying the neural network to. For instance, if you are applying a neural network to a classification problem, sigmoid may work well, but for some other problem it might not, and that is why it is important to learn the pros and cons of activation functions so that you can choose the best one for the project you are working on.
How to include activation functions in your code?
Maybe years ago implementing the math behind all these functions would have been quite difficult, but now, with the advancement of open-source libraries such as TensorFlow and PyTorch, it has become much easier! Let's see a code snippet where activation functions are included using TensorFlow.
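Below is a minimal illustrative Keras model (the layer sizes and input shape are placeholders, not taken from the original article); each layer simply takes its activation function by name:

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(20,)),               # 20 input features (placeholder)
    tf.keras.layers.Dense(64, activation='relu'),     # hidden layer with ReLU
    tf.keras.layers.Dense(32, activation='tanh'),     # hidden layer with tanh
    tf.keras.layers.Dense(10, activation='softmax'),  # 10-class output with softmax
])

model.compile(optimizer='adam', loss='categorical_crossentropy')
model.summary()
```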
Seems quite simple, right? As easy as it is with TensorFlow, it is important to have an actual understanding of these activation functions, because the learning process of your neural network depends heavily on them.
Thank you for reading and hope this article was of help.