Clearly explained: What, why and how of feature scaling- normalization & standardization

Category: IT Technology · Published: 4 years ago


Why Normalization?

You might be surprised at the choice of the cover image for this post, but this is how we can understand Normalization! This mighty concept helps when our data has a variety of features with different measurement scales, which leaves us in the lurch when we try to derive insights from such data or fit a model to it.

Much like we can’t compare the different fruits shown in the above picture on a common scale, we can’t work efficiently with data that has too many scales.

For example, see the image below and observe the scales of Salary vs. Work experience vs. Band level. Because Salary has a much larger range, it can take precedence over the other two attributes while training the model, regardless of whether it actually holds more weight in predicting the dependent variable.

Thus, in the data pre-processing stage of data mining and model development (statistical or machine learning), it’s good practice to normalize all the variables to bring them down to a similar scale if they are of different ranges.
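To make the point about scale dominance concrete, here is a minimal sketch. The two employee records are made-up numbers for illustration only (they are not taken from the article's image). It computes the Euclidean distance between the two records and shows how much of the squared distance each attribute contributes.

```python
import math

# Hypothetical records: (salary, work experience in years, band level)
employee_a = (50_000, 2, 1)
employee_b = (52_000, 10, 4)

def squared_contributions(p, q):
    """Per-feature squared differences that add up to the squared Euclidean distance."""
    return [(a - b) ** 2 for a, b in zip(p, q)]

contrib = squared_contributions(employee_a, employee_b)
distance = math.sqrt(sum(contrib))
shares = [c / sum(contrib) for c in contrib]

print(f"distance = {distance:.2f}")
for name, share in zip(("salary", "experience", "band"), shares):
    print(f"{name:>10}: {share:.4%} of the squared distance")
```

Even though the two employees differ a lot in experience and band, almost all of the distance comes from the salary gap, simply because salary is measured on a much larger scale.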

Normalization is not required for every dataset. You have to sift through your data, make sure it actually requires normalization, and only then incorporate this step in your procedure. Also, you should apply Normalization if you are not sure whether the data distribution is Gaussian/Normal/bell-curve in nature. Normalization will help in reducing the impact of non-Gaussian attributes on your model.

What is Normalization?

We’ll talk about two scenarios here:

1. Your data doesn’t follow a Normal/Gaussian distribution (prefer this in case of doubt as well)

Data normalization, in this case, is the process of rescaling one or more attributes to the range of 0 to 1. This means that the largest value for each attribute is 1 and the smallest value is 0.

It is also known as Min-Max scaling.
Formula of Min-Max scaling (Source: Wikipedia):

$$x' = \frac{x - \min(x)}{\max(x) - \min(x)}$$
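Min-Max scaling can be sketched in a few lines of plain Python. The function name `min_max_scale` and the sample salaries are assumptions made for this illustration. In practice you would likely use a library scaler, but the arithmetic is the same.

```python
def min_max_scale(values):
    """Rescale a list of numbers to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical, so the formula would divide by zero.
        # Returning zeros is one possible convention (an assumption here).
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

salaries = [30_000, 45_000, 60_000, 120_000]  # illustrative data
scaled = min_max_scale(salaries)
print(scaled)
```

The smallest salary maps to 0, the largest to 1, and everything else lands proportionally in between.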

2. Your data follows a Gaussian distribution

In this case, Normalization can be done with the formula described below, where mu is the mean and sigma is the standard deviation of your sample/population.

Normalizing with the standard score given below is commonly known as Standardization, or Z-score normalization.
Formula of Standardization/Z-Score (Source: Wikipedia):

$$z = \frac{x - \mu}{\sigma}$$
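A minimal sketch of Standardization using only Python's standard library follows. The function name `standardize` and the sample data are illustrative assumptions. It uses the population standard deviation; whether you want the population or the sample version depends on your data.

```python
import statistics

def standardize(values):
    """Return the z-score of each value: (x - mean) / standard deviation."""
    mu = statistics.fmean(values)
    # pstdev is the population standard deviation; swap in statistics.stdev
    # if the data should be treated as a sample.
    sigma = statistics.pstdev(values)
    if sigma == 0:
        raise ValueError("all values are identical; z-scores are undefined")
    return [(v - mu) / sigma for v in values]

experience_years = [1, 3, 5, 7, 9]  # illustrative data
z = standardize(experience_years)
print([round(s, 3) for s in z])
print("mean:", round(statistics.fmean(z), 10), "sd:", round(statistics.pstdev(z), 10))
```

After the transformation the values have a mean of 0 and a standard deviation of 1, whatever scale they started on.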

More about Z-Score

The z score tells us how many standard deviations away from the mean your score is.

For example:

  • A Z-score of 1.5 implies the value is 1.5 standard deviations above the mean.
  • A Z-score of -0.8 indicates the value is 0.8 standard deviations below the mean.

As explained above, the z-score tells us where the score lies on a normal distribution curve. A z-score of zero tells us the value is exactly the mean/average, while a score of +3 tells us that the value is much higher than average (probably an outlier).

If you refer to my article on Normal distributions, you’ll quickly understand that the Z-score converts our distribution to a Standard Normal Distribution with a mean of 0 and a standard deviation of 1.

Interpretation of Z-Score

Let’s quickly understand how to interpret a value of Z-score in terms of AUC (Area under the curve).


According to the Empirical rule, discussed in detail in the article on Normal distributions linked above and referenced at the end of this post too:

  • 68% of the data lies between +1SD and -1SD
  • 95% of the data lies between +2SD and -2SD
  • 99.7% of the data lies between +3SD and -3SD
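The percentages in the Empirical rule are rounded values, and they can be checked numerically. The sketch below uses `statistics.NormalDist` from Python's standard library (available since Python 3.8). The helper name `share_within` is an assumption made for this illustration.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Fraction of a normal distribution lying within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within ±{k} SD: {share_within(k):.2%}")
```

This should print roughly 68.27%, 95.45% and 99.73%, which the rule rounds to 68%, 95% and 99.7%.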

Now, if we want to look at a customized range and calculate how much data that segment covers, Z-scores come to our rescue. Let’s see how.

For example, if we want to know what percentage of the data is covered (the probability of occurrence of a data point) between the negative extreme on the left and -1SD, we have to refer to the Z-score table linked below:

Z-Score Table

Now we look up the value -1.00, and the table gives 0.1587, i.e. about 15.87%, as the answer to our question.

Similarly, if we had been looking for -1.25, we would have got 10.56% (find -1.2 in the Z column and read across to the 0.05 column to make -1.25).


Common Z-score values and their results from the Z-score table, which indicate how much area is covered between the negative extreme and the Z-score point, i.e. the area to the left of a Z-score point:


We can use these values to calculate the area for customized ranges as well. For example, if we want the AUC between Z-score values of -3 and -2.5, it will be (0.62 - 0.13)% = 0.49% ≈ 0.5%. This comes in very handy for problems that do not have straightforward Z-score values to interpret.
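The table lookups and the range calculation above can also be reproduced in code instead of reading a printed Z-table. This is a minimal sketch using `statistics.NormalDist` from the Python standard library. The helper names `area_left_of` and `area_between` are assumptions made for this illustration.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1

def area_left_of(z):
    """Area under the standard normal curve to the left of z, as listed in a Z-table."""
    return standard_normal.cdf(z)

def area_between(z_low, z_high):
    """Area under the standard normal curve between two Z-scores."""
    return area_left_of(z_high) - area_left_of(z_low)

# Left-tail areas for a few Z-scores, including the lookups used in the text.
for z in (-3.0, -2.5, -1.25, -1.0, 0.0, 2.0):
    print(f"z = {z:+.2f}  area to the left = {area_left_of(z):.4f}")

# Area between Z = -3 and Z = -2.5.
print(f"between -3 and -2.5: {area_between(-3.0, -2.5):.2%}")
```

Subtracting two left-tail areas is all a custom range needs, so the same two helpers cover any interval you care about.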

Real Life Interpretation example

Let’s say we have IQ score data for a sample that we have normalized using the Z-score. To put things into perspective, if a person’s IQ Z-score is 2, we see that +2 corresponds to 97.72% on the Z-score table. This implies that his/her IQ is higher than that of 97.72% of people, or lower than that of only 2.28% of people, so the person you picked is really smart!
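As a rough sketch of the same interpretation in code, the snippet below converts a raw IQ score to a Z-score and then to a percentile. The mean of 100 and standard deviation of 15 are assumptions chosen for illustration (the article does not state the parameters of its sample), and `iq_percentile` is a hypothetical helper name.

```python
from statistics import NormalDist

# Assumption for illustration: IQ scores with mean 100 and standard deviation 15.
IQ_MEAN, IQ_SD = 100, 15

def iq_percentile(iq):
    """Return (z-score, fraction of people scoring below this IQ)."""
    z = (iq - IQ_MEAN) / IQ_SD
    return z, NormalDist().cdf(z)

z, below = iq_percentile(130)
print(f"z = {z:.1f}: higher than {below:.2%} of people, lower than {1 - below:.2%}")
```

Under these assumptions an IQ of 130 sits exactly two standard deviations above the mean, which matches the Z-score of 2 discussed above.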

This can be applied to almost every use case (weights, heights, salaries, immunity levels, and what not!)

In case of Confusion between Normalization and Standardization

If you have a use case in which you can’t readily decide which one will be good for your model, run two iterations, one with Normalization (Min-Max scaling) and another with Standardization (Z-score). Then either plot the results using a box-plot visualization to compare which technique performs better for you or, better yet, fit your model to both versions and judge using the model validation metrics.
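The "fit both versions and compare validation metrics" idea can be sketched as follows. Everything here is an illustrative assumption: the toy dataset is contrived, the model is a plain 1-nearest-neighbour classifier, and the metric is leave-one-out accuracy. For simplicity the scalers are fitted on all rows before validation, which a careful real-world pipeline would avoid.

```python
import math
import statistics

# Contrived toy data: (salary, years of experience) -> label.
# The label depends on experience only; salary is a large-scale distraction.
X = [(30_000, 1), (60_000, 2), (90_000, 3), (31_000, 8), (61_000, 9), (91_000, 10)]
y = [0, 0, 0, 1, 1, 1]

def min_max(column):
    lo, hi = min(column), max(column)
    return [(v - lo) / (hi - lo) for v in column]

def z_score(column):
    mu, sigma = statistics.fmean(column), statistics.pstdev(column)
    return [(v - mu) / sigma for v in column]

def scale_features(rows, scaler):
    """Apply a per-column scaler and return the rows in their original layout."""
    columns = [scaler(list(col)) for col in zip(*rows)]
    return list(zip(*columns))

def loo_accuracy_1nn(rows, labels):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier."""
    correct = 0
    for i, point in enumerate(rows):
        others = [j for j in range(len(rows)) if j != i]
        nearest = min(others, key=lambda j: math.dist(point, rows[j]))
        correct += labels[nearest] == labels[i]
    return correct / len(rows)

results = {
    "raw": loo_accuracy_1nn(X, y),
    "min-max": loo_accuracy_1nn(scale_features(X, min_max), y),
    "z-score": loo_accuracy_1nn(scale_features(X, z_score), y),
}
for name, acc in results.items():
    print(f"{name:>8}: leave-one-out accuracy = {acc:.2f}")
```

On this toy data the unscaled features score 0.00 while both scaled versions score 1.00, so the two techniques tie. On real data they often differ, and that difference in the validation metric is what you would base the choice on.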

Should we apply Normalization while using Machine learning algorithms?

Contrary to the popular belief that ML algorithms do not require Normalization, you should first take a good look at the technique that your algorithm is using to make a sound decision that favors the model you are developing.

If you are using a Decision Tree, or for that matter any tree-based algorithm, you can proceed WITHOUT Normalization. The fundamental concept of a tree revolves around making a decision at a node based on a SINGLE feature at a time, so the difference in scales between features will not impact a tree-based algorithm.

On the other hand, Linear Regression, Logistic Regression, Neural networks, SVM, K-NN, K-Means and any other distance-based or gradient-descent-based algorithm are sensitive to the range of scales of your features, and applying Normalization will typically improve the accuracy of these ML algorithms.
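The claim about trees can be illustrated with a single-feature split, often called a decision stump. This is a simplified sketch: the helper `best_split` picks the threshold with the fewest misclassifications, whereas real tree implementations usually use impurity measures such as Gini or entropy. The salary values and labels are made up for illustration.

```python
def best_split(values, labels):
    """Threshold on one feature with the fewest errors when predicting
    1 for values above the threshold and 0 otherwise."""
    ordered = sorted(set(values))
    candidates = [(a + b) / 2 for a, b in zip(ordered, ordered[1:])]

    def errors(threshold):
        return sum((v > threshold) != bool(l) for v, l in zip(values, labels))

    return min(candidates, key=errors)

def partition(values, threshold):
    """Which side of the split each value falls on."""
    return [v > threshold for v in values]

salary = [30_000, 45_000, 52_000, 80_000, 95_000, 120_000]  # illustrative data
label = [0, 0, 0, 1, 1, 1]

# Min-Max scale the same feature.
lo, hi = min(salary), max(salary)
salary_scaled = [(v - lo) / (hi - lo) for v in salary]

raw_split = best_split(salary, label)
scaled_split = best_split(salary_scaled, label)

print("raw threshold:", raw_split, "scaled threshold:", round(scaled_split, 4))
print("same partition:", partition(salary, raw_split) == partition(salary_scaled, scaled_split))
```

The threshold value changes with the scale, but the rows that end up on each side of the split do not. That is why rescaling a feature leaves a tree's decisions unchanged.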

That’s all about Feature Scaling:)

Happy Learning, happy growing:)

The article on normal distributions that I referred to above in this post:

Watch this space for more on Machine learning, data analytics, and statistics!

