Getting to know Activation Functions in Neural Networks.

What are activation functions in Neural Networks and why should you know about them?

Photo by Marius Masalar on Unsplash

If you are someone who has experience implementing neural networks, you might have encountered the term ‘activation functions’. Does the name ring any bells? No? How about ‘relu’, ‘softmax’ or ‘sigmoid’? Well, those are a few of the most widely used activation functions in today’s context. When I started working with neural networks I had no idea what an activation function really does. But there came a point where I could not go ahead with the implementation of my neural network without a sound knowledge of activation functions. I did a little bit of digging, and here’s what I found…

What are activation functions?

To put it simply, activation functions are mathematical functions that determine the output of a neural network. They decide whether to activate or deactivate each neuron in order to produce the desired output, hence the name activation functions. Now, let’s get into the math…

Figure 1

In a neural network, numerical input values (x) are fed into neurons. Each input to a neuron has a weight (w); the neuron multiplies the inputs by their weights, sums them up and passes the result on to the neurons in the next layer. Activation functions come into play as mathematical gates in this process, as depicted in Figure 1, and decide whether the output of a given neuron is on or off.
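
As a rough sketch of that computation (the names here are illustrative, and a bias term b is included since real networks typically have one):

```python
import numpy as np

def neuron_output(x, w, b, activation):
    """One neuron: a weighted sum of the inputs, gated by an activation."""
    z = np.dot(w, x) + b   # weighted sum of the inputs, plus a bias
    return activation(z)   # the activation decides what gets passed on

# e.g. a neuron with two inputs and a sigmoid gate
out = neuron_output(np.array([0.5, -1.2]),   # inputs x
                    np.array([0.8, 0.3]),    # weights w
                    0.1,                     # bias b
                    lambda z: 1 / (1 + np.exp(-z)))
print(out)  # ~0.54
```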

Activation functions can be divided into three main categories: Binary Step Functions, Linear Activation Functions and Non-Linear Activation Functions. Non-linear activation functions, in turn, come in several varieties. Let’s take a deeper look…

1. Binary Step Function

Binary Step Activation Function

The binary step function is a threshold-based activation function: at or above a certain threshold the neuron is activated, and below that threshold it is deactivated. In the graph above, the threshold is zero. As the name suggests, this activation function can be used for binary classification; however, it cannot be used in situations where you have multiple classes to deal with.
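
A minimal NumPy sketch, with the threshold at zero as in the graph:

```python
import numpy as np

def binary_step(x, threshold=0.0):
    """1 at or above the threshold (activated), 0 below it (deactivated)."""
    return np.where(x >= threshold, 1.0, 0.0)

print(binary_step(np.array([-2.0, 0.0, 3.5])))  # [0. 1. 1.]
```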

2. Linear Activation Function

Linear Activation Function

Here, the output is directly proportional to the weighted sum of the neuron's inputs. Unlike the binary step function, a linear activation function can deal with multiple classes. However, it has its own drawbacks. With a linear activation function, the gradient used in back-propagation is constant, which is not good for learning. Another huge drawback is that no matter how deep the neural network is (how many layers it consists of), the last layer will always be a linear function of the first layer, as the sketch below demonstrates. This limits the network's ability to deal with complex problems.
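
That last point is easy to verify numerically: two stacked layers with a linear (identity) activation collapse into a single linear layer. The matrix shapes below are arbitrary, and bias terms are omitted for brevity:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=3)          # input vector
W1 = rng.normal(size=(4, 3))    # first layer weights
W2 = rng.normal(size=(2, 4))    # second layer weights

deep = W2 @ (W1 @ x)            # two "layers" with linear activation
shallow = (W2 @ W1) @ x         # one layer with weights W2 @ W1
print(np.allclose(deep, shallow))  # True: the extra depth buys nothing
```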

3. Non-Linear Activation Functions

Deep learning practitioners today work with high-dimensional data such as images, audio and video. Given the drawbacks mentioned above, linear activation functions are not practical for the complex applications we use neural networks for. That is why non-linear functions are the ones widely used at present. Let’s take a look at a few of the popular non-linear activation functions.

  • Sigmoid function.

Sigmoid function

The sigmoid function (also known as the logistic function) takes a probabilistic approach, with outputs ranging between 0 and 1, normalizing the output of each neuron. However, for very high or very low inputs the sigmoid function makes almost no change in its output, which ultimately results in the neural network refusing to learn further; this problem is known as the vanishing gradient.
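
A minimal sketch of the function and its gradient shows the problem; the gradient identity sigmoid'(x) = sigmoid(x)(1 − sigmoid(x)) is standard:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)   # peaks at 0.25 when x = 0

# For large |x| the gradient all but vanishes:
print(sigmoid_grad(np.array([0.0, 5.0, 10.0])))
# [2.5e-01 6.6e-03 4.5e-05]
```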

  • tanh function

tanh function

The tanh function (also known as the hyperbolic tangent) is much like the sigmoid function but slightly better, since its output ranges between -1 and 1, allowing negative outputs. However, tanh comes with the same vanishing gradient problem as the sigmoid function.
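
NumPy ships tanh directly; the sketch below also checks the standard identity tanh(x) = 2·sigmoid(2x) − 1, which is why the two functions share the same S-shape (and the same vanishing gradient):

```python
import numpy as np

x = np.array([-5.0, 0.0, 5.0])
print(np.tanh(x))                            # [-0.9999  0.  0.9999]
print(2.0 / (1.0 + np.exp(-2.0 * x)) - 1.0)  # same values
```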

  • ReLU (Rectified Linear Unit) function

ReLU function

With this function, outputs for positive inputs range from 0 to infinity, but when the input is zero or negative the function outputs zero, which blocks back-propagation through that neuron. This is known as the dying ReLU problem.
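
The function itself is a one-liner (a minimal NumPy sketch):

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

# The gradient is zero for every non-positive input, so a neuron that
# only ever sees such inputs stops updating -- the dying ReLU problem.
print(relu(np.array([-3.0, 0.0, 2.5])))  # [0.  0.  2.5]
```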

  • Leaky ReLU

Leaky ReLU function

Leaky ReLU prevents the dying ReLU problem and keeps back-propagation alive for negative inputs. One flaw of Leaky ReLU is that the slope for negative inputs is predetermined rather than learned by the neural network.
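
A minimal sketch, using a slope of 0.01 for negative inputs (a common default, though the exact value varies between implementations):

```python
import numpy as np

def leaky_relu(x, alpha=0.01):
    """A small fixed slope keeps the gradient alive for negative inputs."""
    return np.where(x > 0, x, alpha * x)

print(leaky_relu(np.array([-3.0, 0.0, 2.5])))  # [-0.03  0.  2.5]
```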

There are quite a few other non-linear activation functions, such as softmax and Parametric ReLU, which are not discussed in this article. Now comes the million-dollar question: which activation function is the best? Well, my answer would be it depends… It depends on the problem you are applying the neural network to. For instance, if you are applying a neural network to a binary classification problem, sigmoid will work well, but for some other problem it might not, and that is why it is important to learn the pros and cons of activation functions, so that you can choose the best activation function for the project you are working on.

How to include activation functions in your code?

Maybe years ago, implementing the math behind all these functions would have been quite difficult, but now, with the advancement of open-source libraries such as TensorFlow and PyTorch, it has become much easier! Let’s see how activation functions can be included in your code using TensorFlow.

Activation functions in TensorFlow
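
A minimal sketch of the idea with the Keras API (the layer sizes, input shape and the particular activations chosen here are illustrative assumptions):

```python
import tensorflow as tf

# Activations can be passed to a layer by name...
model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation='relu', input_shape=(10,)),
    tf.keras.layers.Dense(32, activation='tanh'),
    tf.keras.layers.Dense(1, activation='sigmoid'),
])

# ...or applied directly as functions from tf.nn.
x = tf.constant([[-1.0, 0.0, 1.0]])
print(tf.nn.relu(x))
print(tf.nn.sigmoid(x))
print(tf.nn.leaky_relu(x, alpha=0.01))
```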

Seems quite simple, right? As easy as it is with TensorFlow, it is important to have an actual understanding of these activation functions, because the learning process of your neural network highly depends on them.

Thank you for reading and hope this article was of help.
