Curse of Batch Normalization


What are some drawbacks of using batch normalization?


Batch Normalization is indeed one of the major breakthroughs in the field of deep learning, and it has been one of the hot topics of discussion among researchers in the past few years. It is a widely adopted technique that enables faster and more stable training, and it has become one of the most influential methods. However, despite its versatility, there are still some points holding this method back, which we are going to discuss in this article; they show that there is still room for improvement in normalization methods.

Why do we use Batch Normalization?

Before discussing anything else, we should first know what batch normalization is, how it works, and what its use cases are.

What Batch Normalization is

During training, the output distribution of each intermediate activation layer shifts at every iteration as we update the weights of the preceding layers. This phenomenon is referred to as internal covariate shift (ICS). So a natural thing to do, if I want to prevent this from happening, is to fix all the distributions. In simple words, if my distributions are shifting around, I'll just clamp them and not let them shift, which helps gradient optimization, prevents vanishing gradients, and helps my neural network train faster. Reducing this internal covariate shift was the key principle driving the development of batch normalization.
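To make this concrete, here is a toy sketch (a hypothetical illustration, not code from the original article) showing how the output statistics of a layer drift as its weights change:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(256, 100))          # a fixed batch of inputs
W = rng.normal(size=(100, 100)) * 0.1    # weights of some intermediate layer

for step in range(3):
    h = x @ W                            # pre-activations seen by the next layer
    print(f"step {step}: mean={h.mean():+.3f}, std={h.std():.3f}")
    W += rng.normal(size=W.shape) * 0.05 # stand-in for a gradient update
```

Even with the input batch held fixed, every weight update changes the statistics the following layer sees; this is the shift that batch normalization tries to clamp.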

How it works

Batch Normalization normalizes the output of the previous layer by subtracting the empirical batch mean and then dividing by the empirical batch standard deviation. This pushes the data toward a standard Gaussian-like distribution (zero mean, unit variance).

$$\hat{x}_i = \frac{x_i - \mu_B}{\sqrt{\sigma_B^2 + \epsilon}}$$

where $\mu_B$ and $\sigma_B^2$ are the batch mean and batch variance, respectively, and $\epsilon$ is a small constant added for numerical stability.

$$y_i = \gamma \hat{x}_i + \beta$$

Then we learn a new mean and variance in terms of two learnable parameters, γ (scale) and β (shift). So in short, you can think of batch normalization as something that helps you control the first and second moments of the distribution of each batch.
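Putting the two equations together, here is a minimal sketch of the training-time batch-normalization forward pass (the function name, the $\epsilon$ constant, and the γ = 1, β = 0 initialization are conventional choices, not taken from the article):

```python
import numpy as np

def batch_norm_forward(x, gamma, beta, eps=1e-5):
    """Training-time batch norm for x of shape (batch, features)."""
    mu = x.mean(axis=0)                    # empirical batch mean, per feature
    var = x.var(axis=0)                    # empirical batch variance, per feature
    x_hat = (x - mu) / np.sqrt(var + eps)  # normalize: zero mean, unit variance
    return gamma * x_hat + beta            # scale and shift (learnable)

x = np.random.randn(32, 4) * 3.0 + 5.0     # a batch with shifted statistics
gamma, beta = np.ones(4), np.zeros(4)      # conventional initialization
y = batch_norm_forward(x, gamma, beta)
print(y.mean(axis=0).round(3), y.std(axis=0).round(3))  # ~0 and ~1
```

With γ = 1 and β = 0 the output is simply standardized; during training the network is free to learn other values and so recover whatever first and second moments work best.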

Figure: feature distributions output by an intermediate convolutional layer of a VGG-16 network, (1) before any normalization and (2) after applying batch normalization.

Benefits

I'll list some of the benefits of using batch normalization, but I won't go into much detail, as there are tons of articles already covering that; a minimal usage sketch follows the list.

  • Faster convergence.
  • Decreases the importance of initial weights.
  • Robust to hyperparameters.
  • Requires less data for generalization.
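For example, a typical way to use batch normalization in PyTorch is to place a `BatchNorm2d` layer between a convolution and its activation (the layer sizes here are illustrative, not from the original article):

```python
import torch
import torch.nn as nn

# A small convolutional block with batch normalization inserted between the
# convolution and the nonlinearity, as is common practice.
block = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1, bias=False),  # bias is redundant before BN
    nn.BatchNorm2d(16),  # per-channel statistics over batch and spatial dims
    nn.ReLU(),
)

block.train()                  # training mode: BN uses batch statistics
x = torch.randn(8, 3, 32, 32)  # a batch of 8 RGB 32x32 images
y = block(x)
print(y.shape)                 # torch.Size([8, 16, 32, 32])
```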

Figure: (1) faster convergence, (2) robustness to hyperparameters.
