Up your Statistics Game
Aug 2
The Negative Binomial distribution is a discrete probability distribution that you should have in your toolkit for count data. For example, you might have data on the number of pages someone visited before making a purchase or the number of complaints or escalations associated with each customer service representative. Given this data, you might want to model the process and, later, see if some covariates affect the parameters. And in many contexts, you might find that a negative binomial distribution is a good fit.
In this article we’ll introduce the distribution and compute its probability mass function (PMF). We’ll derive its basic properties (mean and variance) using the binomial theorem. The usual treatments either just give you a formula or use more complicated tools to derive the results. Finally, we’ll turn to the distribution’s interpretations.
The Negative Binomial Distribution
Suppose you are going to flip a biased coin that has probability p of coming up heads, which we will call a “success.” You will keep flipping the coin until r successes occur. Let k be the number of failures along the way (so k+r coin flips happen in total).
In the context of our examples, we could imagine:
- A user might browse your website. On each page they have a probability of p = 1% of seeing an item they want to buy. We imagine that once they have put r = 3 items in their basket, they are ready to check out. k is the number of pages they browse without buying anything. Of course, we will want to fit the model to find the true values of r and p, as well as whether and how they vary between users.
- A customer service representative receives complaints from time to time. Each time they receive a complaint, there is a probability p that they will be reprimanded. After being reprimanded r times, they stop getting complaints because they change their behavior. k is the number of complaints that do not lead to a reprimand before they change their behavior.
Whether you actually think this is true is, as always, up to your prior beliefs and how well the model fits the data. Also note that the number of failures, k, and the total number of flips, k+r, differ only by r, so it is easy to translate between the two.
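To make the defining process concrete, here is a small simulation. It is only a sketch. The function name `failures_before_r_successes` is our own, and the parameter values (r = 3, p = 1%, taken from the shopping example) are purely illustrative.

```python
import random


def failures_before_r_successes(r, p, rng):
    """Flip a coin with P(success) = p until r successes occur; return the number of failures k."""
    successes = failures = 0
    while successes < r:
        if rng.random() < p:
            successes += 1
        else:
            failures += 1
    return failures


rng = random.Random(0)  # fixed seed so the run is reproducible
# Illustrative values from the shopping example: r = 3 purchases, p = 1% per page.
samples = [failures_before_r_successes(3, 0.01, rng) for _ in range(2_000)]
print(min(samples), max(samples))   # individual users vary a lot
print(sum(samples) / len(samples))  # sample mean of k; should land near 297 = r(1 - p)/p here
```

The value 297 anticipates the expectation r(1–p)/p derived later in the article.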
It is relatively straightforward to write down the probability mass function using some combinatorics. The probability that the r-th success happens on the (k+r)-th coin flip is:
- The probability that there are r–1 successes on the first k+r–1 flips, times
- The probability of success on the (k+r)-th flip.
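There are (k+r–1) choose k orderings of r–1 successes and k failures on the first k+r–1 flips (the number of ways to arrange k A’s and r–1 B’s in a line). Each ordering has the same probability, p^(r–1)(1–p)^k, and the final success contributes one more factor of p. Writing X for the number of failures, this gives the PMF:

$$P(X = k) = \binom{k+r-1}{k} p^r (1-p)^k, \qquad k = 0, 1, 2, \ldots$$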
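As a sanity check on the counting argument, this sketch compares the PMF formula with a brute-force sum over every possible flip sequence. The helper names `nb_pmf` and `nb_pmf_brute_force` are hypothetical. Brute force is only practical for small k and r.

```python
from itertools import product
from math import comb, isclose


def nb_pmf(k, r, p):
    """P(k failures before the r-th success) when each flip succeeds with probability p."""
    return comb(k + r - 1, k) * p**r * (1 - p) ** k


def nb_pmf_brute_force(k, r, p):
    """Add up the probability of every length-(k + r) flip sequence (1 = success)
    whose last flip is the r-th success."""
    total = 0.0
    for seq in product((0, 1), repeat=k + r):
        if seq[-1] == 1 and sum(seq) == r:
            total += p**r * (1 - p) ** k  # every such sequence has the same probability
    return total


for k in range(6):
    formula, brute = nb_pmf(k, 3, 0.3), nb_pmf_brute_force(k, 3, 0.3)
    assert isclose(formula, brute), (k, formula, brute)
    print(k, formula)
```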
Hopefully you remember some basic facts about combinations and permutations. If not, here is a brief review of facts you can convince yourself of. Suppose there are 3 A’s and 2 B’s and you want to arrange them into a string like “AAABB” or “ABABA”. The number of ways to do this is 5 choose 2 (there are 5 things in total and 2 B’s), which is the same as 5 choose 3 (there are 3 A’s). To see this, pretend that each letter is actually a distinct symbol (so the 5 symbols are A1, A2, A3, B1, B2). Then there are 5! = 120 ways to arrange the distinct symbols. But there are 3! = 6 ways to rearrange A1, A2, A3 without changing where the A’s are placed, and 2! = 2 ways to rearrange the B’s. So the total number is 5!/(2! 3!) = 10.
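The same count can be checked by brute force with Python’s standard library. This is just the A/B example above, nothing more.

```python
from itertools import permutations
from math import comb, factorial

# permutations() treats the five letters as distinct (120 orderings); the set collapses identical strings.
distinct_strings = {"".join(perm) for perm in permutations("AAABB")}
print(len(distinct_strings))                          # 10
print(factorial(5) // (factorial(3) * factorial(2)))  # 120 // (6 * 2) = 10
print(comb(5, 2), comb(5, 3))                         # 10 10
```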
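Now, the trick is that binomial coefficients also make sense with a negative or non-integer number on top, using the definition (a choose k) = a(a–1)⋯(a–k+1)/k!. For example, if we expand the coefficient in our PMF, we can pull a minus sign out of each of the k factors in the numerator:

$$\binom{k+r-1}{k} = \frac{(k+r-1)(k+r-2)\cdots r}{k!} = (-1)^k \frac{(-r)(-r-1)\cdots(-r-k+1)}{k!} = (-1)^k \binom{-r}{k}$$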
Hence the name “negative binomial.”
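Here is a quick numerical check of this identity. It uses a hypothetical helper, `gen_binom`, that implements the generalized coefficient a(a–1)⋯(a–k+1)/k! for any real a.

```python
from math import comb, isclose


def gen_binom(a, k):
    """Generalized binomial coefficient a(a - 1)...(a - k + 1) / k! for real a, integer k >= 0."""
    result = 1.0
    for i in range(k):
        result *= (a - i) / (i + 1)
    return result


# C(k + r - 1, k) == (-1)^k * C(-r, k) for a grid of integer r and k.
for r in range(1, 6):
    for k in range(10):
        assert isclose(comb(k + r - 1, k), (-1) ** k * gen_binom(-r, k))
print(gen_binom(-3, 2), comb(4, 2))  # 6.0 6
```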
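The other trick to keep in mind is that we can define binomial coefficients with non-integer numbers. Using the fact that the Γ function (Gamma function) satisfies, for positive integers n,

$$\Gamma(n) = (n-1)!,$$

we can write our binomial coefficients in the form

$$\binom{k+r-1}{k} = \frac{(k+r-1)!}{k!\,(r-1)!} = \frac{\Gamma(k+r)}{\Gamma(k+1)\,\Gamma(r)}.$$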
This lets the parameter r of the negative binomial distribution take non-integer values. That is useful because when we estimate our models, we generally have no way to constrain r to be an integer, so a non-integer estimate of r won’t be a problem. (We will require r to be positive, however.) We’ll come back to how to interpret a non-integer value of r.
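With the Gamma-function form, the PMF can be evaluated for non-integer r. A common approach, sketched here with a hypothetical `nb_pmf_real`, is to work on the log scale with `math.lgamma`, which avoids overflowing factorials.

```python
from math import comb, exp, isclose, lgamma, log


def nb_pmf_real(k, r, p):
    """Negative binomial PMF allowing real r > 0, via
    C(k + r - 1, k) = Gamma(k + r) / (Gamma(k + 1) * Gamma(r)).
    Works on the log scale with lgamma so large k or r do not overflow."""
    log_coef = lgamma(k + r) - lgamma(k + 1) - lgamma(r)
    return exp(log_coef + r * log(p) + k * log(1 - p))


# Matches the factorial formula when r is a whole number...
assert isclose(nb_pmf_real(4, 3, 0.3), comb(6, 4) * 0.3**3 * 0.7**4)
# ...and still sums to 1 when it is not (the tail beyond k = 500 is negligible here).
total = sum(nb_pmf_real(k, 2.5, 0.3) for k in range(500))
print(total)
```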
Properties of the Negative Binomial Distribution
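We would like to compute the expectation and variance. As a warmup, let’s check that the negative binomial distribution is in fact a probability distribution, i.e., that its probabilities sum to 1. For convenience, let q = 1–p.

$$\begin{aligned}
\sum_{k=0}^{\infty} \binom{k+r-1}{k} p^r q^k &= p^r \sum_{k=0}^{\infty} (-1)^k \binom{-r}{k} q^k \\
&= p^r \sum_{k=0}^{\infty} \binom{-r}{k} (-q)^k \\
&= p^r (1-q)^{-r} \\
&= p^r p^{-r} = 1.
\end{aligned}$$

The first line uses the identity (k+r–1 choose k) = (–1)^k (–r choose k). The crucial point is the third line, where we used the binomial theorem (yes, it works with negative exponents, as long as |q| < 1).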
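Here is a numerical version of that argument. The sketch sums the first terms of the binomial series for (1+x)^a with a = –r and x = –q, and compares the result with (1–q)^(–r) = p^(–r). The function `binomial_series` is our own illustration. Truncating at 2,000 terms assumes q is not too close to 1.

```python
def binomial_series(a, x, terms=2000):
    """Partial sum of sum_k C(a, k) x^k, the binomial series for (1 + x)^a (converges for |x| < 1)."""
    total, term = 0.0, 1.0  # term holds C(a, k) * x^k, starting at k = 0
    for k in range(terms):
        total += term
        term *= (a - k) / (k + 1) * x  # step from C(a, k) x^k to C(a, k + 1) x^(k + 1)
    return total


r, p = 2.5, 0.3
q = 1 - p
series = binomial_series(-r, -q)  # sum_k C(-r, k) (-q)^k
print(series, (1 - q) ** (-r))    # agree up to rounding: both are p^(-r)
print(p**r * series)              # so the PMF sums to 1
```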
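Now let’s compute the expectation of the number of failures X:

$$\begin{aligned}
E[X] &= \sum_{k=0}^{\infty} k \binom{k+r-1}{k} p^r q^k \\
&= \sum_{k=1}^{\infty} k \binom{k+r-1}{k} p^r q^k \\
&= r \sum_{k=1}^{\infty} \binom{k+r-1}{k-1} p^r q^k \\
&= r p^r q \sum_{j=0}^{\infty} \binom{j+r}{j} q^j \\
&= r p^r q \,(1-q)^{-(r+1)} \\
&= r p^r q \, p^{-(r+1)} \\
&= \frac{rq}{p}.
\end{aligned}$$

To get the third line, we used the identity

$$k \binom{k+r-1}{k} = r \binom{k+r-1}{k-1},$$

which holds because both sides equal (k+r–1)!/((k–1)!(r–1)!). In the fourth line we substituted j = k–1. This gives the same kind of sum as in the normalization check, with r+1 in place of r. We then used the binomial theorem again to get the third-to-last line.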
Warning: this is the opposite of what you will find on Wikipedia as of this writing, but it matches what you will find from Wolfram (the makers of Mathematica). The difference arises because Wikipedia counts the number of successes before r failures, whereas we count failures before r successes. In general, there are several similar ways to parameterize and interpret the distribution, so make sure you have everything straight when comparing formulas from different places.
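To double-check the formula and the convention warning, this sketch sums k·P(X = k) numerically. Swapping the roles of success and failure just exchanges p and q. That is why a source that counts successes before r failures reports rp/q instead. The parameter values are arbitrary.

```python
from math import comb


def nb_pmf(k, r, p):
    """P(k failures before the r-th success)."""
    return comb(k + r - 1, k) * p**r * (1 - p) ** k


r, p = 4, 0.35
q = 1 - p
ks = range(400)  # truncation point; the remaining tail is negligible for these values

mean_numeric = sum(k * nb_pmf(k, r, p) for k in ks)
print(mean_numeric, r * q / p)  # failures before r successes: r q / p

# Counting successes before r failures instead swaps the roles of p and q.
mean_swapped = sum(k * nb_pmf(k, r, q) for k in ks)
print(mean_swapped, r * p / q)  # r p / q
```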
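Next, we can compute the variance of the number of failures X in two steps. First, we compute E[X(X–1)] by repeating the trick from above. This time we apply the identity twice (the second time with k–1 and r+1 in place of k and r) to get the third line. We again use the binomial theorem to compute the sum and obtain the third-to-last line.

$$\begin{aligned}
E[X(X-1)] &= \sum_{k=0}^{\infty} k(k-1) \binom{k+r-1}{k} p^r q^k \\
&= \sum_{k=2}^{\infty} k(k-1) \binom{k+r-1}{k} p^r q^k \\
&= r(r+1) \sum_{k=2}^{\infty} \binom{k+r-1}{k-2} p^r q^k \\
&= r(r+1) p^r q^2 \sum_{j=0}^{\infty} \binom{j+r+1}{j} q^j \\
&= r(r+1) p^r q^2 (1-q)^{-(r+2)} \\
&= r(r+1) p^r q^2 p^{-(r+2)} \\
&= \frac{r(r+1)q^2}{p^2}.
\end{aligned}$$

Now, using E[X] = rq/p, we can compute:

$$\begin{aligned}
\operatorname{Var}[X] &= E[X(X-1)] + E[X] - E[X]^2 \\
&= \frac{r(r+1)q^2}{p^2} + \frac{rq}{p} - \frac{r^2 q^2}{p^2} \\
&= \frac{rq^2}{p^2} + \frac{rq}{p} \\
&= \frac{rq(q+p)}{p^2} = \frac{rq}{p^2}.
\end{aligned}$$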
Again, this is the opposite of what is on Wikipedia.
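The same kind of numerical check works for the second factorial moment and the variance. Again, r = 4 and p = 0.35 are arbitrary, and the sum is truncated where the tail is negligible.

```python
from math import comb


def nb_pmf(k, r, p):
    """P(k failures before the r-th success)."""
    return comb(k + r - 1, k) * p**r * (1 - p) ** k


r, p = 4, 0.35
q = 1 - p
ks = range(600)  # truncated sum; the tail beyond this is negligible here

second_factorial_moment = sum(k * (k - 1) * nb_pmf(k, r, p) for k in ks)  # E[X(X - 1)]
mean = sum(k * nb_pmf(k, r, p) for k in ks)                               # E[X]
variance = second_factorial_moment + mean - mean**2

print(second_factorial_moment, r * (r + 1) * q**2 / p**2)
print(variance, r * q / p**2)
```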
Interpretation of the Negative Binomial Distribution
We have covered the “defining interpretation” of the Negative Binomial Distribution: it is the number of failures before r successes occur, with the probability of success at each step being p. But there are a few other ways to look at the distribution. They can be illuminating, and they also help interpret the case where r is not an integer.
Over-Dispersed Poisson Distribution
The Poisson distribution is a very simple model for count data. It assumes that events happen randomly at a constant rate and models the distribution of how many events occur in a given time interval. In the context of our examples, it would say that:
- Customer service representatives get complaints at a constant rate, and the variation in counts is purely random (compare the model above, where their behavior eventually changes). Again, we could let the rate differ between representatives based on exogenous covariates.
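One big problem with the Poisson distribution is that its variance is equal to its mean, which may not fit our data. Let’s parameterize our Negative Binomial distribution by its mean λ and stopping parameter r. Then we have

$$\lambda = \frac{r(1-p)}{p}, \qquad p = \frac{r}{\lambda + r}, \qquad \operatorname{Var}[X] = \frac{r(1-p)}{p^2} = \frac{\lambda}{p} = \lambda + \frac{\lambda^2}{r}.$$

Our probability mass function becomes

$$P(X = k) = \binom{k+r-1}{k} \left(\frac{r}{\lambda + r}\right)^{r} \left(\frac{\lambda}{\lambda + r}\right)^{k}.$$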
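In code, the mean parameterization just means converting (λ, r) back to p = r/(λ+r). This sketch checks numerically that the mean is λ and the variance is λ + λ²/r. It uses a hypothetical `nb_pmf_mean`, built on `lgamma` so that r need not be an integer.

```python
from math import exp, lgamma, log


def nb_pmf_mean(k, lam, r):
    """Negative binomial PMF parameterized by its mean lam and (possibly non-integer) r > 0."""
    p = r / (lam + r)    # success probability implied by the mean
    q = lam / (lam + r)  # 1 - p, computed directly to avoid cancellation when p is near 1
    log_coef = lgamma(k + r) - lgamma(k + 1) - lgamma(r)
    return exp(log_coef + r * log(p) + k * log(q))


lam, r = 3.0, 1.5
probs = [nb_pmf_mean(k, lam, r) for k in range(1000)]  # tail beyond k = 1000 is negligible
mean = sum(k * pk for k, pk in enumerate(probs))
var = sum(k * k * pk for k, pk in enumerate(probs)) - mean**2
print(mean, lam)              # the mean is lam
print(var, lam + lam**2 / r)  # variance = lam + lam^2 / r, larger than the mean
```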
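Now let’s consider what happens if we take the limit as r → ∞ holding λ fixed. (This means that the probability of success goes to 1 as well, since p = r/(λ+r).) In this limit, the binomial coefficient, which equals (–1)^k times (–r choose k), approaches (–1)^k (–r)^k/k! = r^k/k!. The λ+r in the last factor is approximately r:

$$\begin{aligned}
P(X = k) &= \binom{k+r-1}{k} \left(\frac{r}{\lambda + r}\right)^{r} \left(\frac{\lambda}{\lambda + r}\right)^{k} \\
&\approx \frac{r^k}{k!} \left(1 + \frac{\lambda}{r}\right)^{-r} \frac{\lambda^k}{r^k} \\
&\to \frac{\lambda^k e^{-\lambda}}{k!}.
\end{aligned}$$

In the last line, the r^k factors cancel, and we have used the definition of the exponential, (1 + λ/r)^r → e^λ. The result is that we recover the Poisson distribution.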
Therefore, we can interpret the Negative Binomial Distribution as a generalization of the Poisson distribution. If the data are in fact Poisson, fitting a Negative Binomial should give a large r and p close to 1. This makes sense because the variance is λ/p, so as p approaches 1 the variance approaches the mean. When p is smaller than 1, the variance is higher than that of a Poisson distribution with the same mean. So the Negative Binomial distribution generalizes the Poisson by allowing extra variance (over-dispersion).
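You can watch the limit happen numerically. This sketch uses the mean parameterization p = r/(λ+r) with λ = 4. For growing r, it records the largest pointwise gap between the Negative Binomial PMF and the Poisson PMF with the same mean. The function names are our own.

```python
from math import exp, lgamma, log


def nb_pmf_mean(k, lam, r):
    """Negative binomial PMF with mean lam, using p = r / (lam + r)."""
    log_coef = lgamma(k + r) - lgamma(k + 1) - lgamma(r)
    return exp(log_coef + r * log(r / (lam + r)) + k * log(lam / (lam + r)))


def poisson_pmf(k, lam):
    """Poisson PMF lam^k e^(-lam) / k!, computed on the log scale."""
    return exp(k * log(lam) - lam - lgamma(k + 1))


lam = 4.0
gaps = []
for r in (1, 10, 100, 1_000, 10_000):
    # Largest pointwise difference over k = 0..59 (plenty for lam = 4).
    gap = max(abs(nb_pmf_mean(k, lam, r) - poisson_pmf(k, lam)) for k in range(60))
    gaps.append(gap)
    print(r, gap)  # the gap shrinks as r grows
```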
Mixture of Poisson Distributions
The Negative Binomial Distribution also arises as a mixture of Poisson random variables. For example, suppose that our customer service representatives each receive complaints at a given rate (they never change their behavior), but that rate varies between representatives. If that rate is randomly distributed according to a Gamma distribution , we get a Negative Binomial Distribution for the ensemble.
The intuition behind this is as follows. We initially said the Negative Binomial Distribution was the count of failures before r successes in a sequence of coin flips. Instead, replace the coin flips with two independent Poisson processes. Process one, the “success” process, has rate p, and process two, the “failure” process, has rate 1–p. Each new event is then a success with probability p/(p + (1–p)) = p, just like a coin flip. So instead of counting coin flips, we can think of two independent processes generating “successes” and “failures,” and count how many failures occur before a certain number of successes.
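Now, the Gamma Distribution is the distribution of waiting times for Poisson processes. The waiting time until the r-th event of a process with rate p is Gamma distributed with shape r and rate p. Let T be the waiting time for r successes from the “success” process, so T is Gamma distributed. Conditional on T, the number of failures is Poisson distributed with mean (1–p)T. Averaging over T, the failure count is Poisson with a rate, (1–p)T, that is itself Gamma distributed. That is exactly the Gamma mixture of Poissons described above.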
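A simulation makes the mixture story concrete. The sketch draws a rate for each “representative” from a Gamma distribution, then draws a Poisson count at that rate. We chose the mixing distribution, shape r and scale (1–p)/p, by scaling the waiting time T (shape r, rate p) by 1–p. The Poisson sampler `poisson_draw` is a small implementation of Knuth’s multiplication method, used to keep the example dependency-free. It is exact but only efficient for small rates.

```python
import random
from math import exp


def poisson_draw(mean, rng):
    """Knuth's multiplication method: count uniforms until their product drops below exp(-mean).
    Simple and exact, but its cost grows with the mean, so it suits small rates only."""
    threshold = exp(-mean)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def gamma_poisson_draw(r, p, rng):
    """One representative: draw their rate from Gamma(shape r, scale (1 - p) / p), then a Poisson count."""
    rate = rng.gammavariate(r, (1 - p) / p)  # gammavariate(alpha=shape, beta=scale)
    return poisson_draw(rate, rng)


rng = random.Random(42)
r, p = 3, 0.4
draws = [gamma_poisson_draw(r, p, rng) for _ in range(100_000)]
n = len(draws)
mean = sum(draws) / n
var = sum((x - mean) ** 2 for x in draws) / n
print(mean, r * (1 - p) / p)     # negative binomial mean r q / p = 4.5
print(var, r * (1 - p) / p**2)   # negative binomial variance r q / p^2 = 11.25
print(draws.count(0) / n, p**r)  # P(X = 0) = p^r = 0.064
```

With 100,000 draws the sample statistics should sit close to the Negative Binomial values, though not exactly on them.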
Conclusion
A few final points are worth making. First of all, there is no closed-form maximum likelihood estimate for the Negative Binomial Distribution (r in particular has no formula). Instead, maximize the likelihood numerically. You can use the statsmodels package to do this in Python.
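Here is a minimal plain-Python sketch of what “numerical estimation” involves. In practice you would reach for statsmodels instead. It relies on one fact you can check by setting the derivative with respect to p to zero: for a fixed r, the likelihood is maximized at p = r/(r + x̄), where x̄ is the sample mean. That leaves a one-dimensional search over r, done here with a golden-section search on log r. The function names, search bounds, and example counts are all made up for illustration. The search assumes the data are over-dispersed (sample variance above the mean, not all zeros) and that the profile likelihood has a single peak.

```python
from math import exp, lgamma, log, sqrt


def nb_log_likelihood(data, r, p):
    """Log-likelihood of i.i.d. counts under NB(r, p), counting failures before the r-th success."""
    return sum(
        lgamma(x + r) - lgamma(r) - lgamma(x + 1) + r * log(p) + x * log(1 - p)
        for x in data
    )


def fit_negative_binomial(data, log_r_lo=-7.0, log_r_hi=9.0, tol=1e-9):
    """Return (r_hat, p_hat). For fixed r the best p is r / (r + mean), so only r
    needs a numerical search (golden section on log r, assuming a single peak)."""
    mean = sum(data) / len(data)

    def profile(log_r):
        r = exp(log_r)
        return nb_log_likelihood(data, r, r / (r + mean))

    inv_phi = (sqrt(5) - 1) / 2  # 1 / golden ratio, about 0.618
    a, b = log_r_lo, log_r_hi
    c, d = b - inv_phi * (b - a), a + inv_phi * (b - a)
    fc, fd = profile(c), profile(d)
    while b - a > tol:
        if fc > fd:  # the peak lies in [a, d]
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = profile(c)
        else:        # the peak lies in [c, b]
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = profile(d)
    r_hat = exp((a + b) / 2)
    return r_hat, r_hat / (r_hat + mean)


counts = [0, 0, 1, 1, 2, 3, 3, 5, 8, 12, 0, 2, 4, 7, 1]  # made-up, over-dispersed example data
r_hat, p_hat = fit_negative_binomial(counts)
print(r_hat, p_hat)
```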
Also, it is possible to do Negative Binomial regression, modeling the effects of covariates. We’ll save that for a future article.