内容简介:The Rectified Linear Unit, also known as the ReLU function, is known to be a better activation function than the sigmoid function and the tanh function because it performs gradient descent faster. Notice in the image to the left that when x (or z) is very
Netflix
Q: Why is Rectified Linear Unit a good activation function?
The Rectified Linear Unit, also known as the ReLU function, is known to be a better activation function than the sigmoid function and the tanh function because it performs gradient descent faster. Notice in the image to the left that when x (or z) is very large, the slope is very small, which slows gradient descent significantly. This, however, is not the case for the ReLU function.
Q: What is the use of regularization? What are the differences between L1 and L2 regularization?
Both L1 and L2 regularization are methods used to reduce the overfitting of training data. Least Squares minimizes the sum of the squared residuals, which can result in low bias but high variance.
L2 Regularization, also called ridge regression , minimizes the sum of the squared residuals plus lambda times the slope squared. This additional term is called the Ridge Regression Penalty. This increases the bias of the model, making the fit worse on the training data, but also decreases the variance.
If you take the ridge regression penalty and replace it with the absolute value of the slope, then you get Lasso regression or L1 regularization .
L2 is less robust but has a stable solution and always one solution. L1 is more robust but has an unstable solution and can possibly have multiple solutions.
Q: What is the difference between online and batch learning?
Batch learning, also known as offline learning, is when you learn over groups of patterns. This is the type of learning that most people are familiar with, where you source a dataset and build a model on the whole dataset at once.
Online learning, on the other hand, is an approach that ingests data one observation at a time. Online learning is data-efficient because the data is no longer required once it is consumed, which technically means that you don’t have to store your data.
Q: How would you handle NULLs when querying a data set? Are there any other ways?
There are a number of ways to handle null values including the following:
- You can omit rows with null values altogether
- You can replace null values with measures of central tendency (mean, median, mode) or replace it with a new category (eg. ‘None’)
- You can predict the null values based on other variables. For example, if a row has a null value for weight, but it has a value for height, you can replace the null value with the average weight for that given height.
- Lastly, you can leave the null values if you are using a machine learning model that automatically deals with null values.
Q: How do you prevent overfitting and complexity of a model?
For those who don’t know, overfitting is a modeling error when a function fits the data too closely, resulting in high levels of error when new data is introduced to the model.
There are a number of ways that you can prevent overfitting of a model:
- Cross-validation : Cross-validation is a technique used to assess how well a model performs on a new independent dataset. The simplest example of cross-validation is when you split your data into two groups: training data and testing data, where you use the training data to build the model and the testing data to test the model.
- Regularization : Overfitting occurs when models have higher degree polynomials. Thus, regularization reduces overfitting by penalizing higher degree polynomials.
- Reduce the number of features : You can also reduce overfitting by simply reducing the number of input features. You can do this by manually removing features, or you can use a technique, called Principal Component Analysis, which projects higher dimensional data (eg. 3 dimensions) to a smaller space (eg. 2 dimensions).
- Ensemble Learning Techniques : Ensemble techniques take many weak learners and converts them into a strong learner through bagging and boosting. Through bagging and boosting, these techniques tend to overfit less than their alternative counterparts.
Q: How would you design an experiment for a new feature we’re thinking about. What metrics would matter?
I would conduct an A/B test to determine if the introduction of a new feature results in a statistically significant improvement in a given metric that we care about. The metric(s) chosen depends on the goal of the feature. For example, a feature may be introduced to increase conversion rates, or web traffic, or retention rates.
First I would formulate my null hypothesis (feature X will not improve metric A) and my alternative hypothesis (feature X will improve metric A).
Next, I would create my control and test group through random sampling. Because the t-test inherently considers the sample size, I’m not going to specify a necessary sample size, although the larger the better.
Once I collect my data, depending on the characteristics of my data, I’d then conduct a t-test, Welch’s t-test, chi-squared test, or a Bayesian A/B test to determine whether the differences between my control and test group are statistically significant.
Q: A box has 12 red cards and 12 black cards. Another box has 24 red cards and 24 black cards. You want to draw two cards at random from one of the two boxes, one card at a time. Which box has a higher probability of getting cards of the same color and why?
The box with 24 red cards and 24 black cards has a higher probability of getting two cards of the same color. Let’s walk through each step.
Let’s say the first card you draw from each deck is a red Ace.
This means that in the deck with 12 reds and 12 blacks, there’s now 11 reds and 12 blacks. Therefore your odds of drawing another red are equal to 11/(11+12) or 11/23.
In the deck with 24 reds and 24 blacks, there would then be 23 reds and 24 blacks. Therefore your odds of drawing another red are equal to 23/(23+24) or 23/47.
Since 23/47 > 11/23, the second deck with more cards has a higher probability of getting the same two cards.
Q: You are at a Casino and have two dices to play with. You win $10 every time you roll a 5. If you play till you win and then stop, what is the expected payout?
- Let’s assume that it costs $5 every time you want to play.
- There are 36 possible combinations with two dice.
- Of the 36 combinations, there are 4 combinations that result in rolling a five ( see blue ). This means that there is a 4/36 or 1/9 chance of rolling a 5.
- A 1/9 chance of winning means you’ll lose eight times and win once (theoretically).
- Therefore, your expected payout is equal to $10.00 * 1 — $5.00 * 9= -$35.00.
Edit: Thank you guys for commenting and pointing out that it should be -$35!
Q: How can you tell if a given coin is biased?
This isn’t a trick question. The answer is simply to perform a hypothesis test:
- The null hypothesis is that the coin is not biased and the probability of flipping heads should equal 50% (p=0.5). The alternative hypothesis is that the coin is biased and p != 0.5.
- Flip the coin 500 times.
- Calculate Z-score (if the sample is less than 30, you would calculate the t-statistics).
- Compare against alpha (two-tailed test so 0.05/2 = 0.025).
-
If p-value > alpha, the null is not rejected and the coin is not biased.
If p-value < alpha, the null is rejected and the coin is biased.
Learn more about hypothesis testing here .
Q: Make an unfair coin fair
Since a coin flip is a binary outcome, you can make an unfair coin fair by flipping it twice. If you flip it twice, there are two outcomes that you can bet on: heads followed by tails or tails followed by heads.
P(heads) * P(tails) = P(tails) * P(heads)
This makes sense since each coin toss is an independent event. This means that if you get heads → heads or tails → tails, you would need to reflip the coin.
Q: You are about to get on a plane to London, you want to know whether you have to bring an umbrella or not. You call three of your random friends and ask each one of them if it’s raining. The probability that your friend is telling the truth is 2/3 and the probability that they are playing a prank on you by lying is 1/3. If all 3 of them tell that it is raining, then what is the probability that it is actually raining in London.
You can tell that this question is related to Bayesian theory because of the last statement which essentially follows the structure, “What is the probability A is true given B is true?” Therefore we need to know the probability of it raining in London on a given day. Let’s assume it’s 25%.
P(A) = probability of it raining = 25%
P(B) = probability of all 3 friends say that it’s raining
P(A|B) probability that it’s raining given they’re telling that it is raining
P(B|A) probability that all 3 friends say that it’s raining given it’s raining = (2/3)³ = 8/27
Step 1: Solve for P(B)
P(A|B) = P(B|A) * P(A) / P(B), can be rewritten as
P(B) = P(B|A) * P(A) + P(B|not A) * P(not A)
P(B) = (2/3)³ * 0.25 + (1/3)³ * 0.75 = 0.25*8/27 + 0.75*1/27
Step 2: Solve for P(A|B)
P(A|B) = 0.25 * (8/27) / ( 0.25*8/27 + 0.75*1/27)
P(A|B) = 8 / (8 + 3) = 8/11
Therefore, if all three friends say that it’s raining, then there’s an 8/11 chance that it’s actually raining.
Q: You are given 40 cards with four different colors- 10 Green cards, 10 Red Cards, 10 Blue cards, and 10 Yellow cards. The cards of each color are numbered from one to ten. Two cards are picked at random. Find out the probability that the cards picked are not of the same number and same color.
Since these events are not independent, we can use the rule:
P(A and B) = P(A) * P(B|A) ,which is also equal to
P(not A and not B) = P(not A) * P(not B | not A)
For example:
P(not 4 and not yellow) = P(not 4) * P(not yellow | not 4)
P(not 4 and not yellow) = (36/39) * (27/36)
P(not 4 and not yellow) = 0.692
Therefore, the probability that the cards picked are not the same number and the same color is 69.2%.
以上就是本文的全部内容,希望本文的内容对大家的学习或者工作能带来一定的帮助,也希望大家多多支持 码农网
猜你喜欢:本站部分资源来源于网络,本站转载出于传递更多信息之目的,版权归原作者或者来源机构所有,如转载稿涉及版权问题,请联系我们。
Big Java Late Objects
Horstmann, Cay S. / 2012-2 / 896.00元
The introductory programming course is difficult. Many students fail to succeed or have trouble in the course because they don't understand the material and do not practice programming sufficiently. ......一起来看看 《Big Java Late Objects》 这本书的介绍吧!
HTML 压缩/解压工具
在线压缩/解压 HTML 代码
正则表达式在线测试
正则表达式在线测试