Batch Normalization In Neural Networks (Code)

Category: IT Technology · Published: 4 years ago


Code

The first step is to import the tools and libraries that will be used to implement, or support the implementation of, the neural network. The tools used are as follows:

  • TensorFlow : An open-source platform for the implementation, training, and deployment of machine learning models.
  • Keras : An open-source library used for the implementation of neural network architectures that run on both CPUs and GPUs.
import tensorflow as tf
from tensorflow import keras

The dataset we'll be using is the relatively simple Fashion-MNIST dataset.

The Fashion-MNIST dataset contains 70,000 grayscale images of clothing, each 28 x 28 pixels and categorized into ten classes. More specifically, it includes 60,000 training examples and 10,000 test examples.

Preparing the dataset involves normalizing the training and test images by dividing each pixel value by 255.0, which places every pixel value in the range 0 to 1.

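A validation set is also created at this stage by holding out the first 5,000 training examples, which leaves 55,000 images for training. Keras evaluates the network on this validation set at the end of each epoch, so we can monitor its performance as training progresses. Removing these examples from the training set keeps the validation score an honest estimate of performance on unseen data.

(train_images, train_labels), (test_images, test_labels) = keras.datasets.fashion_mnist.load_data()
# Scale pixel values from [0, 255] to [0, 1]
train_images = train_images / 255.0
test_images = test_images / 255.0
# Hold out the first 5,000 training examples for validation and train on the rest
validation_images, train_images = train_images[:5000], train_images[5000:]
validation_labels, train_labels = train_labels[:5000], train_labels[5000:]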

Keras provides the tools required to implement the classification model, including a Sequential API for stacking the layers of a neural network one after another.

Below is some information on the layers that make up our neural network.

  • Flatten: Flattens each 28 x 28 input image into a one-dimensional array of 784 values.
  • Dense: A dense (fully connected) layer contains an arbitrary, user-specified number of units (neurons), each connected to every output of the previous layer. Each neuron is a perceptron.
  • A perceptron is a fundamental component of an artificial neural network; it was invented by Frank Rosenblatt in 1958. A perceptron is based on the threshold logic unit (TLU).
  • Batch Normalization: A Batch Normalization (BN) layer transforms its incoming inputs in two stages. First, it standardizes each input feature by subtracting the mean and dividing by the standard deviation computed over the current mini-batch; then it rescales and shifts the standardized values using learned parameters.
  • Activation Layer: Applies a specified function to its inputs and introduces non-linearity into the network. The model implemented in this article uses the Rectified Linear Unit (ReLU) and softmax activation functions.
  • ReLU transforms a neuron's value according to y = max(0, x): negative values are clamped to 0, and positive values pass through unchanged. The result is used as the activation of the current layer and as the input to the next.
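The layer operations described in the list above can be sketched in NumPy. This is a minimal illustration of the training-mode computation, not Keras's implementation: the function names are hypothetical, the mini-batch is random data, and `eps=1e-3` is assumed to mirror the Keras `BatchNormalization` default.

```python
import numpy as np

def batch_norm_train(x, gamma, beta, eps=1e-3):
    """Training-mode batch normalization over a (batch, features) array."""
    mean = x.mean(axis=0)                     # per-feature mean over the mini-batch
    var = x.var(axis=0)                       # per-feature (biased) variance over the mini-batch
    x_hat = (x - mean) / np.sqrt(var + eps)   # standardize: zero mean, roughly unit variance
    return gamma * x_hat + beta               # rescale by gamma, shift by beta

def relu(x):
    """y = max(0, x), applied element-wise."""
    return np.maximum(0.0, x)

def softmax(z):
    """Row-wise softmax; subtracting the row max keeps exp() numerically stable."""
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

# Hypothetical mini-batch: 4 examples, 3 features (e.g. Dense outputs before BN)
rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=2.0, size=(4, 3))
gamma, beta = np.ones(3), np.zeros(3)   # Keras initializes gamma to 1 and beta to 0

z = batch_norm_train(x, gamma, beta)
a = relu(z)
p = softmax(a)
print(z.mean(axis=0))  # close to 0 for every feature
print(a.min() >= 0)    # True
print(p.sum(axis=1))   # each row sums to 1 (up to rounding)
```

Because gamma and beta are learned, the network can undo the standardization wherever that helps; with gamma=1 and beta=0 the layer outputs purely standardized values.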
# Placing batch normalization layer before the activation layers
model = keras.models.Sequential([
    keras.layers.Flatten(input_shape=[28,28]),
    keras.layers.Dense(300, use_bias=False),
    keras.layers.BatchNormalization(),
    keras.layers.Activation(keras.activations.relu),
    keras.layers.Dense(200, use_bias=False),
    keras.layers.BatchNormalization(),
    keras.layers.Activation(keras.activations.relu),
    keras.layers.Dense(100, use_bias=False),
    keras.layers.BatchNormalization(),
    keras.layers.Activation(keras.activations.relu),
    keras.layers.Dense(10, activation=keras.activations.softmax)
])

Let's take a look at the internal components of a BN layer.

Accessing the layer at index 2 (index 0 is the Flatten layer and index 1 is the first Dense layer) exposes the variables of the first BN layer:

model.layers[2].variables

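I won't go into too much detail here, but take note of the variables named 'gamma' and 'beta': their values rescale and offset the normalized activations within the layer. The 'moving_mean' and 'moving_variance' variables hold running estimates of the batch statistics, which the layer uses instead of per-batch statistics at inference time.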

for variable in model.layers[2].variables:
    print(variable.name)

# Output with tf.keras in TensorFlow 2.x (newer Keras versions may print shorter names)
>> batch_normalization/gamma:0
>> batch_normalization/beta:0
>> batch_normalization/moving_mean:0
>> batch_normalization/moving_variance:0
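The role of 'moving_mean' and 'moving_variance' can be sketched as follows. This assumes the Keras defaults `momentum=0.99` and `epsilon=1e-3`, uses random data, and the function names are hypothetical; it illustrates the exponential-moving-average update rule rather than reproducing Keras internals.

```python
import numpy as np

MOMENTUM, EPS = 0.99, 1e-3   # assumed Keras BatchNormalization defaults

def update_moving_stats(moving_mean, moving_var, batch):
    """Exponential moving averages of the per-feature batch mean and (biased) variance."""
    moving_mean = MOMENTUM * moving_mean + (1 - MOMENTUM) * batch.mean(axis=0)
    moving_var = MOMENTUM * moving_var + (1 - MOMENTUM) * batch.var(axis=0)
    return moving_mean, moving_var

def batch_norm_inference(x, gamma, beta, moving_mean, moving_var):
    """Inference mode: normalize with the stored moving statistics, not the batch's own."""
    return gamma * (x - moving_mean) / np.sqrt(moving_var + EPS) + beta

rng = np.random.default_rng(1)
n_features = 3
moving_mean, moving_var = np.zeros(n_features), np.ones(n_features)  # initial values
for _ in range(2000):                      # simulate 2,000 training steps
    batch = rng.normal(loc=4.0, scale=3.0, size=(32, n_features))
    moving_mean, moving_var = update_moving_stats(moving_mean, moving_var, batch)

print(moving_mean)  # approaches 4.0
print(moving_var)   # approaches about 8.7 (9 * 31/32, because the batch variance is biased)

# A single example can now be normalized without needing a batch
single = np.array([[4.0, 7.0, 1.0]])
print(batch_norm_inference(single, np.ones(3), np.zeros(3), moving_mean, moving_var))
```

Using stored statistics at inference makes the output for a given example independent of whatever other examples happen to share its batch.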

This article goes into more detail regarding the operations within BN layers.

Within the dense layers, the bias term is disabled (use_bias=False). A bias adds the same constant to a neuron's output for every example in the batch, so the mean subtraction performed by the following BN layer cancels it exactly; the BN layer's own 'beta' offset takes over its role.
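A quick numerical check of this argument, assuming training-mode BN that normalizes with batch statistics (gamma=1 and beta=0 for simplicity). The inputs, weights, bias, and helper function are made up for illustration.

```python
import numpy as np

def standardize(h, eps=1e-3):
    """The normalization step of training-mode BN (gamma=1, beta=0)."""
    return (h - h.mean(axis=0)) / np.sqrt(h.var(axis=0) + eps)

rng = np.random.default_rng(42)
x = rng.normal(size=(32, 784))            # hypothetical mini-batch of flattened images
W = rng.normal(size=(784, 300)) * 0.05    # hypothetical Dense kernel
b = rng.normal(size=300)                  # hypothetical Dense bias

without_bias = standardize(x @ W)
with_bias = standardize(x @ W + b)

# The per-unit bias shifts the batch mean by exactly b, so subtracting the mean removes it
print(np.allclose(with_bias, without_bias))  # True
```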

Andrej Karpathy, at the time Director of AI at Tesla, once tweeted a list of commonly made neural network mistakes; forgetting to set bias to false when using BN was on that list.

In the next snippet of code, we specify the optimization algorithm used to train the network, along with the loss function and hyperparameters such as the learning rate and momentum. The number of epochs is set later, when calling 'fit'.

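lr_schedule = keras.optimizers.schedules.InverseTimeDecay(0.01, decay_steps=1, decay_rate=1e-6)  # lr = 0.01 / (1 + 1e-6 * step), replacing the removed `decay` argument
sgd = keras.optimizers.SGD(learning_rate=lr_schedule, momentum=0.9, nesterov=True)
model.compile(loss="sparse_categorical_crossentropy", optimizer=sgd, metrics=["accuracy"])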
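To get a feel for how much a decay of 1e-6 actually changes the learning rate, the inverse-time formula lr = 0.01 / (1 + 1e-6 * step) can be evaluated directly. This sketch assumes Keras's default batch size of 32 and a 55,000-image training set (the 60,000 training images minus the 5,000 held out for validation); the helper name is hypothetical.

```python
import math

INITIAL_LR, DECAY = 0.01, 1e-6

def inverse_time_lr(step):
    """Learning rate after `step` optimizer updates under inverse-time decay."""
    return INITIAL_LR / (1 + DECAY * step)

steps_per_epoch = math.ceil(55_000 / 32)   # assumes batch_size=32 and 55,000 training images
total_steps = 60 * steps_per_epoch         # 60 epochs, as in the fit() call

print(steps_per_epoch)                         # 1719
print(round(inverse_time_lr(total_steps), 6))  # 0.009065
```

Under these assumptions the learning rate drops by only about 9% over the whole run, so the decay is a gentle adjustment rather than an aggressive schedule.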

Now we train the network using the model's 'fit' method. We will skip some of the details of how the neural network model is trained; for a more detailed explanation of training and implementing neural networks, refer to the link below.

model.fit(train_images, train_labels, epochs=60, validation_data=(validation_images, validation_labels))
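Since BN computes its statistics per mini-batch, it helps to see how 'fit' slices the data: by default it shuffles the training data each epoch and uses batches of 32. The generator below is a hypothetical, simplified stand-in for that loop, not Keras's actual data pipeline, and it assumes a 55,000-image training set.

```python
import random

def minibatch_indices(n_examples, batch_size=32, seed=None):
    """Yield shuffled index lists, one per mini-batch; the last batch may be smaller."""
    indices = list(range(n_examples))
    random.Random(seed).shuffle(indices)
    for start in range(0, n_examples, batch_size):
        yield indices[start:start + batch_size]

# Assumes 55,000 training images (60,000 minus the 5,000 validation images)
batches = list(minibatch_indices(55_000, seed=0))
print(len(batches))       # 1719 mini-batches per epoch
print(len(batches[-1]))   # 24: the final batch's BN statistics come from fewer examples
```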

The model's performance is evaluated on the test set that was set aside earlier.

Based on the accuracy the model achieves on the test set, you can decide whether to fine-tune the network's hyperparameters or move forward to production.

model.evaluate(test_images, test_labels)
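'evaluate' returns the loss followed by the compiled metrics, here sparse categorical cross-entropy and accuracy. Both can be sketched from softmax outputs in NumPy; the probabilities and labels below are made up for illustration, and the functions are hypothetical stand-ins rather than Keras's implementation.

```python
import numpy as np

def sparse_categorical_crossentropy(labels, probs):
    """Mean negative log-probability assigned to each example's true (integer) class."""
    return -np.mean(np.log(probs[np.arange(len(labels)), labels]))

def accuracy(labels, probs):
    """Fraction of examples whose highest-probability class matches the label."""
    return np.mean(np.argmax(probs, axis=1) == labels)

# Hypothetical softmax outputs for 3 test images over 3 classes (rows sum to 1)
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.8, 0.1],
                  [0.3, 0.3, 0.4]])
labels = np.array([0, 1, 0])

print(sparse_categorical_crossentropy(labels, probs))  # about 0.595
print(accuracy(labels, probs))                         # 2 of 3 correct, about 0.667
```

The "sparse" variant takes integer class labels directly, which is why the Fashion-MNIST labels can be passed to 'fit' and 'evaluate' without one-hot encoding.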

During training, you might notice that each epoch takes longer than it would for a network without batch normalization layers. This is because batch normalization adds extra computation to every forward and backward pass, along with extra parameters for the model to learn.
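The extra parameters can be counted by hand. Each BN layer stores four values per unit: 'gamma' and 'beta' (trainable) plus 'moving_mean' and 'moving_variance' (updated during training, but not by gradient descent). The sketch below tallies them for the first model defined earlier, using that model's layer sizes; the totals are what we would expect model.summary() to report.

```python
# (units_in, units_out, has_bias, followed_by_bn) for each Dense layer in the first model
dense_layers = [
    (784, 300, False, True),
    (300, 200, False, True),
    (200, 100, False, True),
    (100, 10, True, False),   # output layer keeps its bias and has no BN after it
]

dense_params = bn_trainable = bn_non_trainable = 0
for n_in, n_out, has_bias, followed_by_bn in dense_layers:
    dense_params += n_in * n_out + (n_out if has_bias else 0)
    if followed_by_bn:
        bn_trainable += 2 * n_out       # gamma and beta
        bn_non_trainable += 2 * n_out   # moving_mean and moving_variance

total = dense_params + bn_trainable + bn_non_trainable
print(dense_params, bn_trainable, bn_non_trainable, total)  # 316210 1200 1200 318610
```

In this shallow model BN adds well under 1% to the parameter count, so most of the extra per-epoch time comes from the additional computation rather than from the parameters themselves.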

However, the increase in time per epoch is offset by the fact that Batch Normalization typically reduces the number of epochs the model needs to converge.

The model implemented in this article is too shallow for us to see the full benefits of batch normalization. Batch normalization is typically found in deeper convolutional neural networks such as Xception, ResNet50 and Inception V3.

Extra

  • The neural network implemented above places the Batch Normalization layers just before the activation layers, but it is entirely possible to place BN layers after the activation layers instead.
# Placing batch normalization layer after the activation layers
# The hidden Dense layers keep their bias here: ReLU sits between each Dense
# layer and its BN layer, so BN's mean subtraction no longer cancels the bias
model = keras.models.Sequential([
    keras.layers.Flatten(input_shape=[28,28]),
    keras.layers.Dense(300),
    keras.layers.Activation(keras.activations.relu),
    keras.layers.BatchNormalization(),
    keras.layers.Dense(200),
    keras.layers.Activation(keras.activations.relu),
    keras.layers.BatchNormalization(),
    keras.layers.Dense(100),
    keras.layers.Activation(keras.activations.relu),
    keras.layers.BatchNormalization(),
    keras.layers.Dense(10, activation=keras.activations.softmax)
])
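A small NumPy sketch can show one practical difference between the two placements, assuming training-mode BN with gamma=1 and beta=0 and made-up pre-activation values: BN before ReLU yields non-negative activations, while BN after ReLU yields activations centered on zero, including negative values. Which ordering trains better is an empirical question that this sketch does not answer.

```python
import numpy as np

def batch_norm(h, eps=1e-3):
    """Training-mode BN with gamma=1, beta=0: standardize each feature over the batch."""
    return (h - h.mean(axis=0)) / np.sqrt(h.var(axis=0) + eps)

def relu(h):
    """y = max(0, x), applied element-wise."""
    return np.maximum(0.0, h)

rng = np.random.default_rng(7)
h = rng.normal(loc=1.0, scale=2.0, size=(64, 5))   # hypothetical Dense outputs

bn_then_relu = relu(batch_norm(h))   # ordering used in the main model
relu_then_bn = batch_norm(relu(h))   # ordering used in the "Extra" model

print(bn_then_relu.min() >= 0)                  # True: ReLU output is never negative
print(np.round(relu_then_bn.mean(axis=0), 6))   # about 0 per feature: BN re-centers the ReLU output
```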
