Classical Neural Network: What really are Nodes and Layers?




What does a node and a layer mathematically represent? Easy to understand introduction to concepts behind the scene.

We have spoken previously about activation functions, and as promised we will explain their link with the layers and the nodes in a neural network architecture.

Note that this is an explanation for classical Neural Networks, not specialized ones. This knowledge will nonetheless be of use when studying specific neural networks.

Alright, all being said, let’s get started.

First, we will take as an example the following very simple neural network (NN) architecture (fig. 1).

fig.1 (rights: source )
  • Input Layer: Node1 → X (the raw input; no activation is applied here)
  • Hidden Layer: Node1 → N1 and Node2 → N2 (from top to bottom) | Activation: sigmoid
  • Output Layer: Node1 → M | Activation: sigmoid
# This code is the Keras implementation of the NN described above
from keras.models import Sequential
from keras.layers import Dense

def simple_nn():
    model = Sequential()
    model.add(Dense(2, input_dim=1, activation='sigmoid'))  # hidden layer: 2 nodes
    model.add(Dense(1, activation='sigmoid'))               # output layer: 1 node
    model.compile(loss='mean_squared_error', optimizer='sgd')
    return model

Which function does this Neural Network represent?

Given the above notations, we get the following function (fig.2):

fig.2 (rights: own image)
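The function in fig. 2 can also be written out as a short sketch. The weight names (w1, b1, w3, …) below are labels I chose for illustration, not necessarily the figure's own notation:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical parameter names: w1, w2 and b1, b2 feed the two hidden
# nodes; w3, w4 and b3 feed the output node -- 7 parameters in total.
def forward(x, w1, b1, w2, b2, w3, w4, b3):
    n1 = sigmoid(w1 * x + b1)               # hidden node N1
    n2 = sigmoid(w2 * x + b2)               # hidden node N2
    return sigmoid(w3 * n1 + w4 * n2 + b3)  # output node M

y = forward(0.5, 1.0, 0.0, -1.0, 0.0, 1.0, 1.0, 0.0)
```

Whatever the input, the final sigmoid keeps the output strictly between 0 and 1, which is the first point noted below.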

Here multiple things are to be noticed:

  • The output of the neural network will always lie in (0, 1). As we mentioned in the activation functions article, the output layer activation function is very important and pretty much defines the type of model you want to achieve (i.e. classification/regression etc.)
  • With only one hidden layer composed of two nodes, we end up with a weight vector of dimensionality 7. That puts in perspective how difficult training becomes as the number of nodes increases.
  • Except for the activation functions, all operations are linear combinations. Again, it is the activation functions that introduce non-linearity.
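As a quick check of the second point, the parameter count follows directly from the layer sizes:

```python
# Parameter count for the 1 -> 2 -> 1 architecture:
hidden_params = 1 * 2 + 2   # one weight per hidden node, plus two biases
output_params = 2 * 1 + 1   # two weights into the output node, plus its bias
total = hidden_params + output_params   # 7 parameters altogether
```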

Why exactly are we using linear combinations and these types of activation functions?

First of all, even though Deep Learning (the study of NNs with many layers) is a field of study in itself, it has the same goal as classical Machine Learning: approximating a specific underlying model/distribution from data points (most of the time). The goal of a NN is therefore also to approximate a distribution, i.e. a function. But how? This is where some basic knowledge of analysis comes in. Brace yourself!

For simplicity's sake (you can contact me if you are interested in the more general explanation), we will switch to the following architecture. Here is its function (fig. 3):

fig.3 (rights: own image)

Let's start with a continuous function from the reals to the reals, and set ourselves the goal of approximating it. A natural way to begin is to plot the function our NN represents (fig. 3). Since we do not need any specific data to explain the idea, we will not train the NN; we will simply assign weights arbitrarily (fig. 4).

fig.4 (rights: own image)

Here is the plot (fig.5):

fig.5 (rights: own image)

Surprise! What kind of shape is this? A rectangle!

So our neural network is actually mimicking a rectangle (more or less). For some this might not seem special, but those who have heard of Riemann integration and step functions will more or less see where I am heading: an approximation of the continuous function by step-like functions built from the neural network (each piece is not exactly a step function, but summing them does the job).
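A minimal sketch of the rectangle idea: two steep sigmoids, one subtracted from the other (in the NN, the subtraction comes from a negative output weight), carve out an approximate indicator of an interval. The endpoints a, b and the steepness k below are arbitrary choices for illustration:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# A steep step up at x = a minus a steep step down at x = b:
# close to 1 inside [a, b], close to 0 outside.
def rectangle(x, a=1.0, b=2.0, k=20.0):
    return sigmoid(k * (x - a)) - sigmoid(k * (x - b))

inside = rectangle(1.5)    # deep inside [1, 2] -> close to 1
outside = rectangle(3.0)   # far outside [1, 2] -> close to 0
```

Increasing k sharpens the edges, which is exactly the first bullet point below.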

A few more things to note:

  • The sharpness of the rectangle's edges is controlled by the scalar multiplying X, and the position of the steep transition by the scalar added to the product (the bias).
  • Two nodes are enough to make one rectangle, so to approach the continuous function we simply add more nodes! (With an odd number of nodes, we get a set of rectangles plus one leftover step.)
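Putting the two points above together: a sum of such two-node rectangles, each with height equal to the target function at the interval's midpoint, approximates a continuous function. Here is a sketch using sin as the target; the partition size and steepness are arbitrary choices for illustration:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def step_approx(x, f, n_rects=50, lo=0.0, hi=np.pi, k=200.0):
    # Partition [lo, hi] into n_rects intervals; each pair of nodes
    # contributes one rectangle whose height is f at the midpoint.
    edges = np.linspace(lo, hi, n_rects + 1)
    total = np.zeros_like(x, dtype=float)
    for a, b in zip(edges[:-1], edges[1:]):
        height = f((a + b) / 2.0)
        total += height * (sigmoid(k * (x - a)) - sigmoid(k * (x - b)))
    return total

xs = np.linspace(0.2, np.pi - 0.2, 100)
err = np.max(np.abs(step_approx(xs, np.sin) - np.sin(xs)))
```

Adding more rectangles (i.e. more hidden nodes) shrinks the error, which is the intuition behind the approximation argument.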

Even though it is harder to picture, the approximation of continuous functions in higher dimensions works pretty much the same way (except for one step where we rescale values).

Now that we have seen such approximations, we can be confident in the power of NNs to approximate a distribution (even a discontinuous one). But this explanation still lacks something: we arbitrarily assigned accurate weights to our NN, whereas on real datasets we cannot do so, because the underlying distribution is unknown. This is where optimization techniques such as the famous SGD (Stochastic Gradient Descent) or batch GD come in. Assuming these optimizers reach a close-enough solution, they can outperform the hand-crafted rectangle weights. The rectangle construction therefore acts as a kind of lower bound on achievable accuracy: the optimized weights need not converge to that construction, but will generally do at least as well.
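To make the optimization step concrete, here is a minimal plain-NumPy SGD loop for the same 1 → 2 → 1 architecture. The target is produced by a "teacher" net of the same shape with arbitrarily chosen weights, so a good fit is known to exist, and the gradient formulas are standard backpropagation for squared error with sigmoid activations; this is a sketch, not the article's original training setup:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict(X, w1, b1, w2, b2):
    # Forward pass of the 1 -> 2 -> 1 net for a batch of inputs
    H = sigmoid(np.outer(X, w1) + b1)   # (n, 2) hidden activations
    return sigmoid(H @ w2 + b2)         # (n,) outputs

# Target generated by a fixed "teacher" net (weights chosen arbitrarily)
X = np.linspace(-3.0, 3.0, 64)
Y = predict(X, np.array([1.5, -1.0]), np.array([0.5, -0.5]),
            np.array([2.0, -2.0]), 0.3)

# Student: the 7 parameters counted earlier, randomly initialised
w1, b1 = rng.normal(size=2), np.zeros(2)
w2, b2 = rng.normal(size=2), 0.0

mse_before = np.mean((predict(X, w1, b1, w2, b2) - Y) ** 2)

lr = 0.5
for _ in range(5000):
    i = rng.integers(len(X))            # stochastic: one sample per step
    x, y = X[i], Y[i]
    h = sigmoid(w1 * x + b1)
    out = sigmoid(w2 @ h + b2)
    d_out = 2.0 * (out - y) * out * (1.0 - out)  # dLoss/d(output pre-activation)
    d_h = d_out * w2 * h * (1.0 - h)             # dLoss/d(hidden pre-activations)
    w2 = w2 - lr * d_out * h
    b2 = b2 - lr * d_out
    w1 = w1 - lr * d_h * x
    b1 = b1 - lr * d_h

mse_after = np.mean((predict(X, w1, b1, w2, b2) - Y) ** 2)
```

The loop updates the seven weights one sample at a time, and the mean squared error after training is lower than at initialisation, illustrating that the optimizer finds weights without being told the underlying function.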

