What the f?

Category: IT Technology · Published: 4 years ago

A good hard look at the ‘f’ word of Machine Learning and why it can’t be ignored!

I know you are eager to find out what this ‘f’ word actually is. Stay with me, we will get to it very soon. One thing I can tell you right away is that regardless of your familiarity with Machine Learning, understanding this ‘f’ word will help you understand what most of Machine Learning is all about.

Before that, let’s indulge in a little bit of role play. You are a Data Scientist, and your startup has tasked you with working alongside a marketing colleague to improve sales of the company’s product. You have to advise the “marketing guy” on how to adjust the advertising budget across three media outlets: TV, Radio and Newspaper.

You take a look at the past data (Fig. 1), and it is clear even to the naked eye that the amount of money spent on advertising in each media outlet (TV, Radio and Newspaper) has an impact on the product’s sales.

Fig. 1: Sales (in thousands of units) vs budget for each ad medium (in thousands of dollars)

As a Data Scientist, you would like to understand and explain how these 3 operate together to influence sales. In other words, we would like to model the sales as a function of TV, Radio and Newspaper budget. Yes, that’s our elusive ‘f’ word — function.

$$\text{Sales} = f(\text{TV}, \text{Radio}, \text{Newspaper})$$

The ‘mathy’ way of saying “sales as a function of TV, Radio and Newspaper budgets”.

What does this ‘f’ mean?

Simply put, you can think of f as something that takes an input X and produces an output Y. A good analogy is a washing machine: you put dirty clothes (X) into the washing machine (f), and it gives you back washed clothes (Y).

In the context of product sales and advertising budgets, the function f takes the TV, Radio and Newspaper budgets, represented by X1, X2 and X3 respectively, as input and returns sales Y as output. (We combine X1, X2 and X3 into a single vector X.)

$$Y = f(X), \quad X = (X_1, X_2, X_3)$$

Spoiler Alert! Much of Machine Learning is actually just coming up with a good f that can take some input data and return a reliable output.

Why do we want this f?

There are 3 main reasons why we want to find a good f:

  • With a good f we can input budgets for all 3 media and predict how much the sales would be.
  • We can understand which predictors (TV, Radio and Newspaper budgets) are important in influencing Y. We might find out that spending money on Newspapers is actually a waste because Newspaper ads do not boost sales by much.
  • We might be able to understand how each predictor influences Y. For example, we might find that investing in TV ads is 5x more effective than investing in Newspaper ads.

Enough teasing... how do I find this f?

Before we can answer that question, we need to ask ourselves the following question:

Is there some perfect f out there in the wide, gorgeous Universe?

Well, maybe not a “perfect” f, but there is an ideal (optimal) f. If we look at Fig. 2, we notice something curious: in some cases, a single point on the X-axis (Newspaper budget) has multiple corresponding Y (sales) values. For example, in the data plotted in Fig. 2, for x = 6.4 there are two corresponding values on the Y-axis: y = 11.9 and y = 17.3.

Fig. 2: Sales vs Newspaper Budget

So an ideal function can simply be the average of all y values corresponding to a particular x. In other words, for the figure above:

$$f(6.4) = \frac{11.9 + 17.3}{2} = 14.6$$

In more ‘mathy’ terms, this average of all Y values at a given X = x is called the conditional expected value, E(Y | X = x). Thus, taking the average of all Y values at each x can serve as our ‘ideal’ function, which can be expressed as follows:

$$f(x) = E(Y \mid X = x)$$
(Don’t worry about the Y | X = x; it’s just a ‘mathy’ way of saying “Y, given that X is equal to some specific value x”.)
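To make the averaging idea concrete, here is a minimal Python sketch that estimates E(Y | X = x) by grouping observations by their exact x value. Only the two points at x = 6.4 come from Fig. 2; the other data points and the helper names `conditional_means` and `f_exact` are hypothetical. Note that the lookup comes back empty for any x that has no observation.

```python
from collections import defaultdict


def conditional_means(xs, ys):
    """Average all y values observed at each exact x (an empirical E(Y | X = x))."""
    groups = defaultdict(list)
    for x, y in zip(xs, ys):
        groups[x].append(y)
    return {x: sum(vals) / len(vals) for x, vals in groups.items()}


def f_exact(x, means):
    # Returns None when no observation sits exactly at x
    return means.get(x)


# Hypothetical toy data: only the two points at x = 6.4 are taken from Fig. 2
newspaper = [6.4, 6.4, 20.0, 45.1]
sales = [11.9, 17.3, 10.5, 14.2]

means = conditional_means(newspaper, sales)
print(round(f_exact(6.4, means), 2))  # 14.6
print(f_exact(77.5, means))           # None
```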

Okay….but then why do we need Machine Learning?

Sadly, because we live in the “real world”.

In the “real world”, we do not have all the data we need to reliably estimate Y using the averaging idea discussed above. Even in the sales-advertising data of Fig. 2, there is no Y value at x = 77.5, x = 95, x = 110, etc.

One neat solution for this problem of missing data is to use the idea of a neighbourhood.

This means that instead of averaging the Y values strictly at x = 77.5, we average all the Y values that occur at points neighbouring x = 77.5. A possible neighbourhood, for example, runs from x = 75 to x = 80 (the blue vertical lines in Fig. 3).

Fig. 3: For f(77.5) we take the average of all Y values with 75 ≤ x ≤ 80

Our definition and notation change a little bit to reflect the idea that we are no longer restricted to values of Y occurring exactly at a given point X=x, but instead are now looking at Y values occurring in the neighbourhood of X=x.

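$$f(x) = \operatorname{Ave}\left(Y \mid X \in \mathcal{N}(x)\right)$$

where $\mathcal{N}(x)$ denotes a neighbourhood of x.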
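The neighbourhood version changes only the selection step: instead of matching x exactly, we keep every observation within a chosen half-width of x. In this sketch the data, the function name `f_neighbourhood` and the `half_width` parameter are all hypothetical; a half-width of 2.5 around 77.5 reproduces the 75 to 80 window of Fig. 3, and a window with no data at all returns None.

```python
def f_neighbourhood(x, xs, ys, half_width):
    """Average the y values whose x lies in [x - half_width, x + half_width].

    Returns None when the neighbourhood contains no data.
    """
    nearby = [y for xi, y in zip(xs, ys) if abs(xi - x) <= half_width]
    if not nearby:
        return None
    return sum(nearby) / len(nearby)


# Hypothetical toy data (not the real values behind Fig. 3)
newspaper = [74.0, 76.5, 79.2, 81.0, 100.0]
sales = [12.0, 14.0, 16.0, 20.0, 9.0]

# Window 75..80 keeps only the points at 76.5 and 79.2
print(f_neighbourhood(77.5, newspaper, sales, half_width=2.5))    # 15.0
# Window 115..145 contains no data
print(f_neighbourhood(130.0, newspaper, sales, half_width=15.0))  # None
```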

This works fine until we run into two major problems:

  • What happens when there are multiple predictors apart from Newspaper budget (e.g., TV, Radio, Facebook ads, Google ads...)? The problem then expands into multiple dimensions (beyond just the x and y axes), and it gets increasingly difficult to define our precious little ‘neighbourhood’. (This problem has a badass name: the Curse of Dimensionality.)
  • What happens when there is no data in the neighbouring area? For example in Fig. 3 there is no data from x=115 to x=145 and beyond.
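A standard back-of-the-envelope calculation shows why neighbourhoods stop being local as dimensions are added. Assuming, as an idealisation, that the predictors are spread uniformly over a unit hypercube, a sub-cube that captures a fraction r of the data in p dimensions needs side length r^(1/p) along every axis. The helper name `edge_length` is ours, for illustration only.

```python
def edge_length(fraction, dims):
    """Side length of a sub-cube capturing `fraction` of points spread
    uniformly over the unit hypercube in `dims` dimensions."""
    return fraction ** (1.0 / dims)


# How wide must a "neighbourhood" be to hold 10% of the data?
for p in (1, 2, 3, 10):
    print(p, round(edge_length(0.10, p), 3))
# 1 0.1
# 2 0.316
# 3 0.464
# 10 0.794
```

With 10 predictors, a neighbourhood holding just 10% of the data must span about 79% of the range of each predictor, which is hardly a neighbourhood anymore.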

Machine Learning to the rescue!

To avoid being constrained by these two problems, we turn to Machine Learning to estimate f instead. While there is a wide assortment of Machine Learning models to choose from, let’s consider a simple but effective one: a linear regression model. In a linear regression model, the inputs X1 (TV budget), X2 (Radio budget) and X3 (Newspaper budget) are multiplied by weights w1, w2 and w3 respectively, and the products are added to a constant intercept w0 to obtain Y.

$$Y = w_0 + w_1 X_1 + w_2 X_2 + w_3 X_3$$

In the equation above, w0, w1, w2 and w3 are parameters whose values are learnt by training, that is, by fitting the model to the data. In other words, the parameter values are adjusted by ‘looking’ at the data and repeatedly making guesses that improve over time, until we obtain a good enough f.
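The article does not pin down a specific learning procedure; one common way to make the “repeatedly making guesses” idea concrete is batch gradient descent on the mean squared error, sketched below in plain Python. The budgets, the “true” weights used to generate the sales, the learning rate and the step count are all hypothetical choices for illustration. Because the toy sales are noise-free, training should recover the generating weights.

```python
from itertools import product


def predict(w, x):
    # Linear model: w0 + w1*x1 + w2*x2 + w3*x3
    return w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))


def fit_gradient_descent(X, y, lr=0.1, steps=5000):
    """Fit w0..w3 by minimising mean squared error with batch gradient descent."""
    n = len(X)
    w = [0.0] * (len(X[0]) + 1)  # start from an all-zero guess
    for _ in range(steps):
        grad = [0.0] * len(w)
        for x, target in zip(X, y):
            err = predict(w, x) - target
            grad[0] += 2 * err / n  # intercept term (input fixed at 1)
            for j, xj in enumerate(x, start=1):
                grad[j] += 2 * err * xj / n
        # Improve the guess: step each weight against its gradient
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w


# Hypothetical (TV, Radio, Newspaper) budgets and noise-free sales generated
# from made-up "true" weights, so we can check that training recovers them.
true_w = [2.0, 1.5, 0.8, 0.1]
X = [list(x) for x in product([0.5, 1.5], repeat=3)]
y = [predict(true_w, x) for x in X]

w = fit_gradient_descent(X, y)
print([round(wj, 3) for wj in w])             # [2.0, 1.5, 0.8, 0.1]
print(round(predict(w, [1.0, 1.0, 1.0]), 3))  # 4.4
```

On real data the fit is never this clean, and ordinary least squares also has a closed-form solution (for example via `numpy.linalg.lstsq`), so iterative training is one option rather than a requirement. Comparing fitted weights, such as w1 against w3, only says something about relative effectiveness when the budgets are measured on comparable scales.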

Conclusion

Which model(s) to choose for estimating f, how to carry out the learning procedure and what a “good enough” f means are non-trivial questions that Machine Learning practitioners investigate iteratively when working on a particular problem. Machine Learning practitioners often rely on experience, domain knowledge and empirical evidence to try to answer these questions. Nonetheless, regardless of the context and nature of a problem, finding a good f is what underlies much of prediction, inference and problem solving using Machine Learning.
