A Mathematical Explanation of Naive Bayes in 5 Minutes
A thorough explanation of Naive Bayes with an example
Naive Bayes. What may seem like a very confusing algorithm is actually one of the simplest algorithms once understood. Part of why it's so simple to understand and implement is because of the assumptions that it inherently makes. However, despite the strong assumptions it holds, Naive Bayes is not a poor algorithm. In fact, it is widely used in the data science world and has a lot of real-life applications.
In this article, we’ll look at what Naive Bayes is, how it works with an example to make it easy to understand, the different types of Naive Bayes, the pros and cons, and some real-life applications of it.
Preliminary Knowledge
To understand Naive Bayes and get as much value out of this article as possible, you should have a basic understanding of the following concepts:
- Conditional probability: a measure of the probability of an event A occurring given that another event B has occurred. For example, "what is the probability that it will rain given that it is cloudy?" is a question of conditional probability.
- Joint probability: a measure of the likelihood of two or more events occurring at the same time.
- Proportionality: the relationship between two quantities that are multiplicatively connected by a constant, or in simpler terms, whose ratio is a constant.
- Bayes' Theorem: according to Wikipedia, Bayes' Theorem describes the probability of an event (the posterior) based on prior knowledge of conditions that might be related to the event.
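In symbols, conditional and joint probability are related as follows:

$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}, \qquad P(A \cap B) = P(A \mid B)\,P(B)$$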
What is Naive Bayes?
Naive Bayes is a machine learning algorithm, but more specifically, it is a classification technique. This means that Naive Bayes is used when the output variable is discrete. The underlying mechanics of the algorithm are driven by Bayes' Theorem, which you'll see in the next section.
How Naive Bayes Works
First, I’m going to walk through the theory behind Naive Bayes, and then solidify these concepts with an example to make it easier to understand.
The Naive Bayes Classifier is inspired by Bayes' Theorem, which states the following equation:
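$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$$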
This equation can be rewritten using X (input variables) and y (output variable) to make it easier to understand. In plain English, this equation is solving for the probability of y given input features X.
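$$P(y \mid X) = \frac{P(X \mid y)\,P(y)}{P(X)}$$

where $X = (x_1, x_2, \ldots, x_n)$ represents the input features.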
Because of the naive assumption (hence the name) that variables are independent given the class, we can rewrite P(X|y) as follows:
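$$P(X \mid y) = P(x_1 \mid y) \times P(x_2 \mid y) \times \cdots \times P(x_n \mid y) = \prod_{i=1}^{n} P(x_i \mid y)$$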
Also, since we are solving for y, P(X) is a constant across all classes, which means that we can remove it from the equation and introduce a proportionality. This leads us to the following equation:
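$$P(y \mid X) \propto P(y) \prod_{i=1}^{n} P(x_i \mid y)$$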
Now that we've arrived at this equation, the goal of Naive Bayes is to choose the class y with the maximum probability. Argmax is simply an operation that finds the argument that gives the maximum value from a target function. In this case, we want to find the class y that maximizes P(y|X):
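$$\hat{y} = \underset{y}{\operatorname{argmax}}\; P(y) \prod_{i=1}^{n} P(x_i \mid y)$$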
Now let’s go through an example so that you can make more sense out of this algorithm.
Example of Naive Bayes
Suppose you tracked the weather conditions for 14 days and based on the weather conditions, you decided whether to play golf or not play golf.
First, we need to convert this into a frequency table so that we can get the values of P(X|y) and P(y). Recall that we are solving for P(y|X).
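The table below uses the classic 14-day play-golf counts (9 "yes" days and 5 "no" days), which this example appears to follow:

| Feature | Value | Play = Yes | Play = No |
| --- | --- | --- | --- |
| Outlook | Sunny | 2 | 3 |
| Outlook | Overcast | 4 | 0 |
| Outlook | Rainy | 3 | 2 |
| Temperature | Hot | 2 | 2 |
| Temperature | Mild | 4 | 2 |
| Temperature | Cool | 3 | 1 |
| Humidity | High | 3 | 4 |
| Humidity | Normal | 6 | 1 |
| Windy | False | 6 | 2 |
| Windy | True | 3 | 3 |
| Total days | | 9 | 5 |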
Second, we want to convert the frequencies into ratios or conditional probabilities:
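| Feature | Value | P(value \| yes) | P(value \| no) |
| --- | --- | --- | --- |
| Outlook | Sunny | 2/9 | 3/5 |
| Outlook | Overcast | 4/9 | 0/5 |
| Outlook | Rainy | 3/9 | 2/5 |
| Temperature | Hot | 2/9 | 2/5 |
| Temperature | Mild | 4/9 | 2/5 |
| Temperature | Cool | 3/9 | 1/5 |
| Humidity | High | 3/9 | 4/5 |
| Humidity | Normal | 6/9 | 1/5 |
| Windy | False | 6/9 | 2/5 |
| Windy | True | 3/9 | 3/5 |
| Prior | | P(yes) = 9/14 | P(no) = 5/14 |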
Finally, we can use the proportionality equation to predict y, given X.
Imagine that X = {outlook: sunny, temperature: mild, humidity: normal, windy: false}.
First, we'll calculate the probability that you will play golf given X, P(yes|X), followed by the probability that you won't play golf given X, P(no|X).
Using the chart above, we can get the following information:
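- P(sunny | yes) = 2/9 and P(sunny | no) = 3/5
- P(mild | yes) = 4/9 and P(mild | no) = 2/5
- P(normal | yes) = 6/9 and P(normal | no) = 1/5
- P(windy = false | yes) = 6/9 and P(windy = false | no) = 2/5
- P(yes) = 9/14 and P(no) = 5/14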
Now we can simply input this information into the following formula:
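$$P(\text{yes} \mid X) \propto P(\text{sunny} \mid \text{yes}) \cdot P(\text{mild} \mid \text{yes}) \cdot P(\text{normal} \mid \text{yes}) \cdot P(\text{false} \mid \text{yes}) \cdot P(\text{yes}) = \frac{2}{9} \cdot \frac{4}{9} \cdot \frac{6}{9} \cdot \frac{6}{9} \cdot \frac{9}{14} \approx 0.0282$$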
Similarly, you would complete the same sequence of steps for P(no|X).
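Doing so with the "no" values gives:

$$P(\text{no} \mid X) \propto \frac{3}{5} \cdot \frac{2}{5} \cdot \frac{1}{5} \cdot \frac{2}{5} \cdot \frac{5}{14} \approx 0.0069$$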
Since P(yes|X) > P(no|X), you can predict that this person would play golf given that the outlook is sunny, the temperature is mild, the humidity is normal, and it's not windy.
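If you prefer code to arithmetic, here is a minimal Python sketch of the same calculation, assuming the classic play-golf counts used above (this is an illustration, not the article's own code):

```python
# Likelihood tables P(feature value | class), read off the ratio table
# above (classic play-golf counts; an assumption for illustration).
likelihoods = {
    "yes": {"sunny": 2/9, "mild": 4/9, "normal": 6/9, "windy=false": 6/9},
    "no":  {"sunny": 3/5, "mild": 2/5, "normal": 1/5, "windy=false": 2/5},
}

# Class priors P(y): 9 "yes" days and 5 "no" days out of 14.
priors = {"yes": 9 / 14, "no": 5 / 14}

# P(y|X) is proportional to P(y) times the product of P(x_i|y).
scores = {}
for label, prior in priors.items():
    score = prior
    for p in likelihoods[label].values():
        score *= p
    scores[label] = score

print(scores)                       # roughly {'yes': 0.0282, 'no': 0.0069}
print(max(scores, key=scores.get))  # 'yes' -> predict "play golf"
```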
TLDR
To synthesize what we just did…
- First, we created a frequency table and then a ratio table so that we could get the values for P(y) and P(X|y)
- Then for a given set of input features X, we computed the proportionality of P(y|X) for each class y. In our example, we had two classes, yes and no.
- Lastly, we took the highest value of P(y|X) of all classes to predict which outcome was the most likely.
Types of Naive Bayes
There are three main types of Naive Bayes that are used in practice:
Multinomial
Multinomial Naive Bayes assumes that each P(x_i|y) follows a multinomial distribution. It is mainly used in document classification problems and looks at the frequency of words, similar to the example above.
Bernoulli
Bernoulli Naive Bayes is similar to Multinomial Naive Bayes, except that the predictors are boolean (True/False), like the “Windy” variable in the example above.
Gaussian
Gaussian Naive Bayes assumes that continuous values are sampled from a Gaussian (normal) distribution, and models each likelihood as follows:
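$$P(x_i \mid y) = \frac{1}{\sqrt{2\pi\sigma_y^2}}\,\exp\!\left(-\frac{(x_i - \mu_y)^2}{2\sigma_y^2}\right)$$

where $\mu_y$ and $\sigma_y^2$ are the mean and variance of the feature within class y.

To see how the three variants differ in practice, here is a minimal scikit-learn sketch (the toy data is made up for illustration and is not from the article):

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB, GaussianNB, MultinomialNB

y = np.array([0, 1, 0, 1])  # two classes

# Gaussian: continuous features (e.g., temperature in degrees)
X_continuous = np.array([[21.0], [30.5], [19.2], [28.7]])
print(GaussianNB().fit(X_continuous, y).predict([[29.0]]))

# Multinomial: count features (e.g., word counts in a document)
X_counts = np.array([[2, 0, 1], [0, 3, 0], [1, 0, 2], [0, 2, 1]])
print(MultinomialNB().fit(X_counts, y).predict([[0, 1, 0]]))

# Bernoulli: boolean features (e.g., windy / not windy)
X_boolean = np.array([[1, 0], [0, 1], [1, 1], [0, 0]])
print(BernoulliNB().fit(X_boolean, y).predict([[0, 1]]))
```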
Pros and Cons of Naive Bayes
Pros
- As shown above, it is quite intuitive once you understand the concept
- It’s easy to implement and performs well in multiclass prediction
- It works well with categorical input variables
Cons
- You can encounter the zero-frequency problem when there’s a category in the test set that’s not in the training set (although there are workarounds for this)
- The probability estimates that it produces are not the most trustworthy
- Naive Bayes holds strong assumptions, as discussed above.
Naive Bayes Applications
Below are some popular applications that Naive Bayes is used for:
- Real-time prediction: Because Naive Bayes is fast and it's based on Bayesian statistics, it works well for making predictions in real time. In fact, a lot of popular real-time or online models are based on Bayesian statistics.
- Multiclass prediction: As previously stated, Naive Bayes works well when there are more than two classes for the output variable.
- Text classification: Text classification also includes sub-applications like spam filtering and sentiment analysis. Since Naive Bayes works best with discrete variables, it tends to work well in these applications (see the short sketch after this list).
- Recommendation systems: Naive Bayes is commonly used alongside other algorithms like collaborative filtering to build recommendation systems, like Netflix's "Recommended for You" section, Amazon's recommended products, or Spotify's recommended songs.
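As a quick illustration of the text-classification use case, here is a minimal spam-filtering sketch in scikit-learn (the example messages are made up for illustration):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Toy training data: labeled example messages (made up for illustration).
messages = [
    "win a free prize now",
    "meeting at noon tomorrow",
    "free cash click here",
    "lunch with the team",
]
labels = ["spam", "ham", "spam", "ham"]

# Turn raw text into word counts, then fit Multinomial Naive Bayes.
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(messages, labels)

print(model.predict(["free prize inside"]))  # likely ['spam']
```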
Thanks for Reading!
Terence Shin
Founder of ShinTwin | Let's connect on LinkedIn | Project Portfolio is here.