A Mathematical Explanation of Naive Bayes in 5 Minutes
A thorough explanation of Naive Bayes with an example
Naive Bayes. What may seem like a very confusing algorithm is actually one of the simplest algorithms once understood. Part of why it's so simple to understand and implement is because of the assumptions that it inherently makes. However, despite the strong assumptions it holds, Naive Bayes is not a poor algorithm. In fact, it is widely used in the data science world and has a lot of real-life applications.
In this article, we’ll look at what Naive Bayes is, how it works with an example to make it easy to understand, the different types of Naive Bayes, the pros and cons, and some real-life applications of it.
Preliminary Knowledge
To understand Naive Bayes and get as much value out of this article as possible, you should have a basic understanding of the following concepts:
- Conditional probability: a measure of the probability of an event A occurring given that another event B has occurred. For example, "what is the probability that it will rain given that it is cloudy?" is a question of conditional probability.
- Joint probability: a measure of the likelihood of two or more events occurring at the same time.
- Proportionality: the relationship between two quantities that are multiplicatively connected by a constant, or in simpler terms, whose ratio is a constant.
- Bayes' Theorem: according to Wikipedia, Bayes' Theorem describes the probability of an event (the posterior) based on prior knowledge of conditions that might be related to the event.
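In symbols, conditional and joint probability are related as follows:

$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}, \qquad P(A \cap B) = P(A \mid B)\,P(B)$$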
What is Naive Bayes?
Naive Bayes is a machine learning algorithm, but more specifically, it is a classification technique. This means that Naive Bayes is used when the output variable is discrete. The underlying mechanics of the algorithm are driven by Bayes' Theorem, which you'll see in the next section.
How Naive Bayes Works
First, I’m going to walk through the theory behind Naive Bayes, and then solidify these concepts with an example to make it easier to understand.
The Naive Bayes Classifier is inspired by Bayes' Theorem, which states the following equation:
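$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$$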
This equation can be rewritten using X (input variables) and y (output variable) to make it easier to understand. In plain English, this equation is solving for the probability of y given input features X.
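$$P(y \mid X) = \frac{P(X \mid y)\,P(y)}{P(X)}$$

where $X = (x_1, x_2, \ldots, x_n)$ represents the input features.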
Because of the naive assumption (hence the name) that variables are independent given the class, we can rewrite P(X|y) as follows:
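$$P(X \mid y) = P(x_1 \mid y) \times P(x_2 \mid y) \times \cdots \times P(x_n \mid y) = \prod_{i=1}^{n} P(x_i \mid y)$$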
Also, since we are solving for y, P(X) is a constant across all classes, which means that we can remove it from the equation and introduce a proportionality. This leads us to the following equation:
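$$P(y \mid X) \propto P(y) \prod_{i=1}^{n} P(x_i \mid y)$$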
Now that we've arrived at this equation, the goal of Naive Bayes is to choose the class y with the maximum probability. Argmax is simply an operation that finds the argument that gives the maximum value from a target function. In this case, we want to find the class y that maximizes P(y|X):
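$$\hat{y} = \underset{y}{\operatorname{argmax}}\; P(y) \prod_{i=1}^{n} P(x_i \mid y)$$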
Now let’s go through an example so that you can make more sense out of this algorithm.
Example of Naive Bayes
Suppose you tracked the weather conditions for 14 days and based on the weather conditions, you decided whether to play golf or not play golf.
First, we need to convert this into a frequency table so that we can get the values of P(X|y) and P(y). Recall that we are solving for P(y|X).
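The table below uses the classic 14-day play-golf counts (9 "yes" days and 5 "no" days), which this example appears to follow:

| Feature | Value | Play = Yes | Play = No |
| --- | --- | --- | --- |
| Outlook | Sunny | 2 | 3 |
| Outlook | Overcast | 4 | 0 |
| Outlook | Rainy | 3 | 2 |
| Temperature | Hot | 2 | 2 |
| Temperature | Mild | 4 | 2 |
| Temperature | Cool | 3 | 1 |
| Humidity | High | 3 | 4 |
| Humidity | Normal | 6 | 1 |
| Windy | False | 6 | 2 |
| Windy | True | 3 | 3 |
| Total days | | 9 | 5 |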
Second, we want to convert the frequencies into ratios or conditional probabilities:
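| Feature | Value | P(value \| yes) | P(value \| no) |
| --- | --- | --- | --- |
| Outlook | Sunny | 2/9 | 3/5 |
| Outlook | Overcast | 4/9 | 0/5 |
| Outlook | Rainy | 3/9 | 2/5 |
| Temperature | Hot | 2/9 | 2/5 |
| Temperature | Mild | 4/9 | 2/5 |
| Temperature | Cool | 3/9 | 1/5 |
| Humidity | High | 3/9 | 4/5 |
| Humidity | Normal | 6/9 | 1/5 |
| Windy | False | 6/9 | 2/5 |
| Windy | True | 3/9 | 3/5 |
| Prior | | P(yes) = 9/14 | P(no) = 5/14 |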
Finally, we can use the proportionality equation to predict y, given X.
Imagine that X = {outlook: sunny, temperature: mild, humidity: normal, windy: false}.
First, we'll calculate the probability that you will play golf given X, P(yes|X), followed by the probability that you won't play golf given X, P(no|X).
Using the chart above, we can get the following information:
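- P(sunny | yes) = 2/9 and P(sunny | no) = 3/5
- P(mild | yes) = 4/9 and P(mild | no) = 2/5
- P(normal | yes) = 6/9 and P(normal | no) = 1/5
- P(windy = false | yes) = 6/9 and P(windy = false | no) = 2/5
- P(yes) = 9/14 and P(no) = 5/14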
Now we can simply input this information into the following formula:
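$$P(\text{yes} \mid X) \propto P(\text{sunny} \mid \text{yes}) \cdot P(\text{mild} \mid \text{yes}) \cdot P(\text{normal} \mid \text{yes}) \cdot P(\text{false} \mid \text{yes}) \cdot P(\text{yes}) = \frac{2}{9} \cdot \frac{4}{9} \cdot \frac{6}{9} \cdot \frac{6}{9} \cdot \frac{9}{14} \approx 0.0282$$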
Similarly, you would complete the same sequence of steps for P(no|X).
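Doing so with the "no" values gives:

$$P(\text{no} \mid X) \propto \frac{3}{5} \cdot \frac{2}{5} \cdot \frac{1}{5} \cdot \frac{2}{5} \cdot \frac{5}{14} \approx 0.0069$$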
Since P(yes|X) > P(no|X), you can predict that this person would play golf given that the outlook is sunny, the temperature is mild, the humidity is normal, and it's not windy.
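If you prefer code to arithmetic, here is a minimal Python sketch of the same calculation, assuming the classic play-golf counts used above (this is an illustration, not the article's own code):

```python
# Likelihood tables P(feature value | class), read off the ratio table
# above (classic play-golf counts; an assumption for illustration).
likelihoods = {
    "yes": {"sunny": 2/9, "mild": 4/9, "normal": 6/9, "windy=false": 6/9},
    "no":  {"sunny": 3/5, "mild": 2/5, "normal": 1/5, "windy=false": 2/5},
}

# Class priors P(y): 9 "yes" days and 5 "no" days out of 14.
priors = {"yes": 9 / 14, "no": 5 / 14}

# P(y|X) is proportional to P(y) times the product of P(x_i|y).
scores = {}
for label, prior in priors.items():
    score = prior
    for p in likelihoods[label].values():
        score *= p
    scores[label] = score

print(scores)                       # roughly {'yes': 0.0282, 'no': 0.0069}
print(max(scores, key=scores.get))  # 'yes' -> predict "play golf"
```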
TLDR
To synthesize what we just did…
- First, we created a frequency table and then a ratio table so that we could get the values for P(y) and P(X|y)
- Then for a given set of input features X, we computed the proportionality of P(y|X) for each class y. In our example, we had two classes, yes and no.
- Lastly, we took the highest value of P(y|X) of all classes to predict which outcome was the most likely.
Types of Naive Bayes
There are three main types of Naive Bayes that are used in practice:
Multinomial
Multinomial Naive Bayes assumes that each P(x_i|y) follows a multinomial distribution. It is mainly used in document classification problems and looks at the frequency of words, similar to the example above.
Bernoulli
Bernoulli Naive Bayes is similar to Multinomial Naive Bayes, except that the predictors are boolean (True/False), like the “Windy” variable in the example above.
Gaussian
Gaussian Naive Bayes assumes that continuous values are sampled from a Gaussian (normal) distribution, and models each likelihood as follows:
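$$P(x_i \mid y) = \frac{1}{\sqrt{2\pi\sigma_y^2}}\,\exp\!\left(-\frac{(x_i - \mu_y)^2}{2\sigma_y^2}\right)$$

where $\mu_y$ and $\sigma_y^2$ are the mean and variance of the feature within class y.

To see how the three variants differ in practice, here is a minimal scikit-learn sketch (the toy data is made up for illustration and is not from the article):

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB, GaussianNB, MultinomialNB

y = np.array([0, 1, 0, 1])  # two classes

# Gaussian: continuous features (e.g., temperature in degrees)
X_continuous = np.array([[21.0], [30.5], [19.2], [28.7]])
print(GaussianNB().fit(X_continuous, y).predict([[29.0]]))

# Multinomial: count features (e.g., word counts in a document)
X_counts = np.array([[2, 0, 1], [0, 3, 0], [1, 0, 2], [0, 2, 1]])
print(MultinomialNB().fit(X_counts, y).predict([[0, 1, 0]]))

# Bernoulli: boolean features (e.g., windy / not windy)
X_boolean = np.array([[1, 0], [0, 1], [1, 1], [0, 0]])
print(BernoulliNB().fit(X_boolean, y).predict([[0, 1]]))
```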
Pros and Cons of Naive Bayes
Pros
- As shown above, it is quite intuitive once you understand the concept
- It’s easy to implement and performs well in multiclass prediction
- It works well with categorical input variables
Cons
- You can encounter the zero-frequency problem when there’s a category in the test set that’s not in the training set (although there are workarounds for this)
- The probability estimates that it produces are not the most trustworthy
- Naive Bayes holds strong assumptions, as discussed above.
Naive Bayes Applications
Below are some popular applications that Naive Bayes is used for:
- Real-time prediction: Because Naive Bayes is fast and it's based on Bayesian statistics, it works well for making predictions in real time. In fact, a lot of popular real-time or online models are based on Bayesian statistics.
- Multiclass prediction: As previously stated, Naive Bayes works well when there are more than two classes for the output variable.
- Text classification: Text classification also includes sub-applications like spam filtering and sentiment analysis. Since Naive Bayes works best with discrete variables, it tends to work well in these applications (see the short sketch after this list).
- Recommendation systems: Naive Bayes is commonly used alongside other algorithms like collaborative filtering to build recommendation systems, like Netflix's "Recommended for You" section, Amazon's recommended products, or Spotify's recommended songs.
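As a quick illustration of the text-classification use case, here is a minimal spam-filtering sketch in scikit-learn (the example messages are made up for illustration):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Toy training data: labeled example messages (made up for illustration).
messages = [
    "win a free prize now",
    "meeting at noon tomorrow",
    "free cash click here",
    "lunch with the team",
]
labels = ["spam", "ham", "spam", "ham"]

# Turn raw text into word counts, then fit Multinomial Naive Bayes.
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(messages, labels)

print(model.predict(["free prize inside"]))  # likely ['spam']
```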
Thanks for Reading!
Terence Shin
Founder of ShinTwin | Let's connect on LinkedIn | Project Portfolio is here.