Probability Must-Haves

Category: IT Technology · Published: 4 years ago

by annca at pixabay.com

Understanding Random Events

Customers and Your Application

Let's say there is some probability that a user will click on a call-to-action within your application. (I’ll call it a CTA from here on out; it covers any time you invite the reader to buy, shop, give an email, etc.) Once they have clicked the CTA, there is a further probability that they will press the submit button to send along their email information (in this case).

Now let’s assign some probabilities: a 50% likelihood of clicking the CTA and a 10% likelihood of pressing the submit button.

The question is: before either of these events takes place, what is the likelihood that BOTH will occur?

Revisiting the Classic Probability Example: Dice

You may have seen this before; if so, it serves as a quick refresher.

To start out, let’s answer the following question: what is the probability of rolling two consecutive 6s?

For your first roll, you can roll the die each of the following ways: 1–2–3–4–5–6. In total, you have six options.

Obvious? Yes, but here’s where we think a bit more critically. If you roll a 1, what are your options for the next roll? Any of the same numbers: 1–2–3–4–5–6. If you had rolled a 2, you’d have the same six options for the second roll, and so on for every possible first roll. In total, this gives us 36 (6*6) different outcomes.

So back to the question… what is the likelihood that you might roll two 6’s in a row?

Well, if we roll a die, the likelihood of it landing on 6 is 1/6. The second roll does not depend on the first, so once that has happened we have a likelihood of 1/6 again. In all of the 36 different combinations of two rolls, there is only one in which a 6 is rolled twice in a row. The math for this is simply to multiply the two probabilities together, giving a 1/36 likelihood of rolling two 6s in a row.
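One way to sanity-check the 1/36 figure is to list every pair of rolls and count. The snippet below is only a sketch; the names rolls and both_six are made up for this illustration and are not from the original post.

```r
# Enumerate all ordered pairs of two die rolls (illustrative names).
rolls <- expand.grid(first = 1:6, second = 1:6)
nrow(rolls)  # 36 equally likely outcomes

# Flag the outcomes where both rolls are a 6.
both_six <- rolls$first == 6 & rolls$second == 6
sum(both_six)   # 1 outcome
mean(both_six)  # 1/36, about 0.028
```

Because every pair is equally likely, the share of flagged rows is the probability itself.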

Back to the App Example

Let’s apply what we just learned to our original example. If there is a 1/2 chance of a user clicking the CTA and a 1/10 chance of them pressing the submit button, we can multiply the two probabilities to get a 1/20, or 5%, chance that they will do both.

Theory Aside, Let's Write Some Code

As a preface: if you haven’t seen my Medium post on statistical inference, I detail the rbinom function there: https://towardsdatascience.com/an-introduction-to-binomials-inference-56394956e1a4 . The quick version is that you use this function to simulate randomly occurring binomial events.

Let’s Require Both Using And

We take 5000 draws of each event with its corresponding probability; each draw is either 1 or 0. When we specify the & operator, we are effectively saying that both must be true. Taking the average of the draws where both are true gives us the probability that both would occur.

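CTA <- rbinom(5000, 1, .5)     # 1 = clicked the CTA, 50% likelihood
SUBMIT <- rbinom(5000, 1, .1)  # 1 = pressed submit, 10% likelihood
mean(CTA & SUBMIT)             # share of draws where both occurred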

Running this returns a value close to 0.05: both actions occurred together roughly 5% of the time, matching the 1/20 we calculated. We can repeat this process with however many steps we want.
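The same pattern extends to longer funnels. As a sketch, assume a hypothetical third step, a CONFIRM click with a 50% likelihood. That step and its probability are invented for illustration and are not part of the original example; all three events are still treated as independent draws.

```r
set.seed(42)  # arbitrary seed so the sketch is reproducible
n <- 5000

CTA     <- rbinom(n, 1, .5)  # clicked the call-to-action
SUBMIT  <- rbinom(n, 1, .1)  # pressed submit
CONFIRM <- rbinom(n, 1, .5)  # hypothetical third step, assumed 50%

# Share of draws where all three events occurred
sim_all_three <- mean(CTA & SUBMIT & CONFIRM)
sim_all_three

# Theoretical value for comparison
.5 * .1 * .5  # 0.025
```

Each extra step just adds one more & on the simulation side and one more multiplication on the math side.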

From And to Or

So let's go back to the dice example. Let's say that rather than the likelihood that both rolls would be 6s, we want to calculate the probability that at least one of the two rolls would be a 6.

Conceptual Approach

The way to think about this is to start with each roll’s individual likelihood: either one has a 1/6 chance of being a 6. Let’s take those two 1/6 likelihoods and add them together. Nearly there, but there’s one issue: the case where both rolls are a 6 is included in each 1/6, so it has been counted twice, both being the keyword there. One copy of that case will have to come out!

This is where we bring together what we have learned so far.

We will add the probabilities together, then subtract the probability that both occur, which is 1/6*1/6.

Mathematical Approach

This gives us a formula of 1/6 + 1/6 - 1/6*1/6 = 11/36, or roughly 31%.
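To double-check the arithmetic, here is a small sketch that counts the outcomes directly and also compares against the complement rule (one minus the chance of no 6 on either roll). The variable names are illustrative only.

```r
# Enumerate all 36 ordered pairs of rolls (illustrative names).
rolls <- expand.grid(first = 1:6, second = 1:6)
either_six <- rolls$first == 6 | rolls$second == 6

sum(either_six)  # 11 of the 36 outcomes contain at least one 6

p_formula    <- 1/6 + 1/6 - 1/6 * 1/6  # add both, subtract the overlap
p_complement <- 1 - (5/6)^2            # one minus "no 6 on either roll"

c(mean(either_six), p_formula, p_complement)  # all equal 11/36
```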

Programmatic Approach

Let’s replicate this like we did earlier.

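roll_1 <- rbinom(5000, 1, .17)   # 1 = rolled a 6; .17 approximates 1/6
roll_2 <- rbinom(5000, 1, .17)
round(mean(roll_1 | roll_2), 2)  # simulated rate of at least one 6
.17 + .17 - .17*.17              # same value from the formula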

Above you’ll see something very similar to what we had the first time; the only difference is the OR operator, |. Across many random draws, at least one 6 came up about 31% of the time. The formula we just derived gives the same number, confirming that the two approaches are consistent.
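If you would rather roll actual die faces than approximate 1/6 with .17, one possible variation uses sample. This is a sketch with made-up names (die_1, die_2) and an arbitrary seed, not the approach from the original post.

```r
set.seed(1)  # arbitrary seed so the sketch is reproducible
n <- 5000

# Draw n faces for each die, with replacement
die_1 <- sample(1:6, n, replace = TRUE)
die_2 <- sample(1:6, n, replace = TRUE)

# Share of draws where at least one die shows a 6
sim_either <- mean(die_1 == 6 | die_2 == 6)
sim_either

11/36  # exact value for comparison, about 0.306
```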

From Or to Conditionals

Let’s say you are going to simulate five individuals coming to your site, each either clicking on the CTA or not with a 50% likelihood of either outcome. Unlike before, each draw now covers a group of 5 users and returns how many of them clicked. We are going to simulate these groups of 5 a full 50,000 times.

CTA <- rbinom(50000, 5, .5) 
mean(CTA)

CTA now holds the number of times the CTA was clicked in each of the trials or simulations. When we take the mean of the CTAs, we see an average of about 2.5 clicks per trial, which is 50% of the 5 users in each trial.
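To see more than just the average, we can tabulate how often each click count shows up and compare it with the exact binomial probabilities from dbinom. This is a sketch; the seed and the names simulated and exact are illustrative.

```r
set.seed(7)  # arbitrary seed so the sketch is reproducible
CTA <- rbinom(50000, 5, .5)

# Simulated share of trials with 0, 1, ..., 5 clicks
simulated <- as.vector(table(factor(CTA, levels = 0:5))) / length(CTA)

# Exact binomial probabilities for the same counts
exact <- dbinom(0:5, size = 5, prob = .5)

round(data.frame(clicks = 0:5, simulated, exact), 3)
```

The two columns should agree to about two decimal places with this many trials.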

Now that we’ve come this far, let’s say you want to understand the probability that at least 2 of the 5 users will click the CTA. Then let’s do the same for at least 3 of the 5, and at least 4 of the 5.

mean(CTA >= 2) 
mean(CTA >= 3) 
mean(CTA >= 4)

Each of these conditional statements evaluates to TRUE or FALSE for every trial. The mean function treats TRUE as 1 and FALSE as 0, so the average is the share of trials in which the statement was true.
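Here is a small sketch of both ideas: how mean handles a logical vector, and the exact values the three simulated means should land near. The exact values come from pbinom; the names k and exact are illustrative.

```r
# mean() of a logical vector is the share of TRUE values
mean(c(TRUE, FALSE, TRUE, TRUE))  # 0.75

# Exact P(X >= k) for X ~ Binomial(5, 0.5), for k = 2, 3, 4.
# pbinom(q, ..., lower.tail = FALSE) returns P(X > q), so pass k - 1.
k <- 2:4
exact <- pbinom(k - 1, size = 5, prob = .5, lower.tail = FALSE)
exact  # 0.8125 0.5000 0.1875
```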

We can now leverage this idea with OR and AND as well.
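As a sketch of what that might look like, the conditions below combine comparisons on the same simulated counts. The particular questions asked here (between 2 and 4 clicks; nobody or everybody clicks) are made up for illustration.

```r
set.seed(3)  # arbitrary seed so the sketch is reproducible
CTA <- rbinom(50000, 5, .5)

# AND: between 2 and 4 clicks, inclusive
p_and <- mean(CTA >= 2 & CTA <= 4)
p_and  # exact value is 25/32, about 0.781

# OR: nobody clicks or everybody clicks
p_or <- mean(CTA == 0 | CTA == 5)
p_or   # exact value is 2/32, about 0.063
```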

Conclusion

We visited a lot of ideas in a very short time. I hope this breakdown of probability foundations was helpful! Be sure to check out my blog at datasciencelessons.com to learn more!

As always, happy data science-ing!

