What the f?



A good hard look at the ‘f’ word of Machine Learning and why it can’t be ignored!

I know you are eager to find out what this ‘f’ word actually is. Stay with me, we will get to it very soon. One thing I can tell you right away is that regardless of your familiarity with Machine Learning, understanding this ‘f’ word will help you understand what most of Machine Learning is all about.

Before that, let's indulge in a little bit of role play. You are a Data Scientist, and your startup has tasked you with working alongside a marketing colleague to improve the sales of your company's product. You have to advise the "marketing guy" on how to adjust the advertisement budget across three different media outlets — TV, Radio and Newspaper.


You take a look at the past data (Fig. 1) and you can tell with the naked eye that how much money you put into advertising on each media outlet — TV, Radio and Newspaper — clearly has an impact on the product's sales.

Fig. 1: Sales (in thousands of units) vs budget for each ad medium (in thousands of dollars)

As a Data Scientist, you would like to understand and explain how these 3 operate together to influence sales. In other words, we would like to model the sales as a function of TV, Radio and Newspaper budget. Yes, that’s our elusive ‘f’ word — function.

Sales = f(TV, Radio, Newspaper)

The 'mathy' way of saying "sales as a function of TV, Radio and Newspaper budgets".

What does this ‘f’ mean?

Simply put, you can think of f as something that takes an input X and produces an output Y. A good analogy is a washing machine: you put dirty clothes (X) into the washing machine (f) and it gives you back washed clothes (Y).


In the context of product sales and advertisement media budget, the function f will take TV, Radio and Newspaper budgets, represented by X1, X2, X3 respectively, as input and return sales Y as output. (We represent X1, X2 and X3 in a combined form — as a vector X)

Y = f(X), where X = (X1, X2, X3)

Spoiler Alert! Much of Machine Learning is actually just coming up with a good f that can take some input data and return a reliable output.

Why do we want this f?


There are 3 main reasons why we want to find a good f:

  • With a good f, we can input the budgets for all 3 media and predict how much the sales would be.
  • We can understand which predictors, i.e. the TV, Radio and Newspaper budgets, are important in influencing Y. We might find out that spending money on Newspapers is actually a waste because Newspaper ads do not boost sales by much.
  • We might be able to understand how each predictor influences Y. For example, we might find that investing in TV ads is 5x more effective than investing in Newspaper ads.

Enough teasing… how do I find this f?

Before we can answer that question, we need to ask ourselves the following question:

Is there some perfect f out there in the wide, gorgeous Universe?


Well, maybe not a "perfect" f, but there is an ideal/optimal f. If we take a look at Fig. 2, we notice something curious: for a single point on the X-axis (Newspaper budget), there can be multiple corresponding Y (sales) values. For example, in the data plotted in Fig. 2, x = 6.4 has two corresponding values on the Y-axis: y = 11.9 and y = 17.3.

Fig. 2: Sales vs Newspaper Budget

So an ideal function can simply be the average of all y values corresponding to a particular x. In other words, for the figure above:

f(6.4) = (11.9 + 17.3) / 2 = 14.6

In more 'mathy' terms, this average of all Y values at a given X is called the expected value of Y. Thus, this procedure of taking the average of all Y values at any X can be our 'ideal' function. Our ideal f can be expressed in the following way:

f(x) = E(Y | X = x)
(Don't worry about the Y|X notation… it's just a 'mathy' way of saying "Y given that X is equal to some specific value x".)
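This averaging idea can be sketched in a few lines of Python. The data here is made up for illustration (only the x = 6.4 pair mirrors Fig. 2), and `f_ideal` is just a hypothetical name for the procedure:

```python
# Ideal f: average all observed Y values at a given X.
# Toy data; the (6.4, 11.9) and (6.4, 17.3) pairs mirror the Fig. 2 example.
data = [(6.4, 11.9), (6.4, 17.3), (10.0, 9.5), (25.6, 14.2)]

def f_ideal(x, data):
    """E(Y | X = x): mean of all y values observed exactly at x."""
    ys = [y for (xi, y) in data if xi == x]
    if not ys:
        raise ValueError(f"no observations at x={x}")
    return sum(ys) / len(ys)

print(f_ideal(6.4, data))  # average of 11.9 and 17.3, i.e. 14.6
```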

Okay….but then why do we need Machine Learning?

Sadly, because we live in the “real world”.


In the “real world”, we do not have all the data that we need to reliably estimate Y using the averaging idea we discussed above. Even for the sales-advertisement data, you can see that in Fig. 2, there is no corresponding Y value for x=77.5, x=95, x=110 etc.

One neat solution for this problem of missing data is to use the idea of a neighbourhood.


What it means is that instead of taking the average of Y values strictly at x=77.5, we can take the average of all values of Y that occur at points neighbouring x=77.5. So a possible neighbourhood could be something like the interval from x=75 to x=80 (refer to the blue vertical lines in Fig. 3).

Fig. 3: For f(77.5) we take the average of all Y values from 75 ≤ x ≤ 80

Our definition and notation change a little bit to reflect the idea that we are no longer restricted to values of Y occurring exactly at a given point X=x, but instead are now looking at Y values occurring in the neighbourhood of X=x.

f(x) = Ave(Y | X ∈ N(x)), where N(x) denotes a neighbourhood of x
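The neighbourhood version only changes the condition used to select Y values. Again, the data below is made up and `f_neighbourhood` (with its `width` parameter) is a hypothetical sketch, not a standard API:

```python
# Neighbourhood average: estimate f(x) from Y values whose X lies near x.
# Toy data for illustration; 'width' controls the size of the neighbourhood.
data = [(75.2, 10.1), (77.0, 12.4), (79.8, 11.3), (95.0, 15.0)]

def f_neighbourhood(x, data, width=2.5):
    """Ave(Y | X in [x - width, x + width])."""
    ys = [y for (xi, y) in data if abs(xi - x) <= width]
    if not ys:
        # This is exactly the second problem discussed below:
        # an empty neighbourhood leaves f undefined at x.
        raise ValueError(f"no observations near x={x}")
    return sum(ys) / len(ys)

print(f_neighbourhood(77.5, data))  # averages the three points between 75 and 80
```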

This works fine until we run into two major problems:

  • What happens when there are multiple predictors apart from just the Newspaper budget (e.g. TV, Radio, Facebook ads, Google ads…)? In that case the problem expands into multiple dimensions (beyond just the x and y axes) and it gets increasingly difficult to define our precious little 'neighbourhood'. (This problem has a badass name: The Curse of Dimensionality.)
  • What happens when there is no data in the neighbouring area? For example, in Fig. 3 there is no data from x=115 to x=145 and beyond.

Machine Learning to the rescue!

To not constrain our f by the two problems mentioned above, we turn to Machine Learning to estimate this f instead. While there is a wide assortment of Machine Learning models to choose from, let's consider a simple but effective one — a linear regression model. In a linear regression model, the inputs X1 (TV budget), X2 (Radio budget) and X3 (Newspaper budget) are multiplied by weights w1, w2 and w3 respectively and added together, along with an intercept w0, to obtain Y.

Y = w0 + w1·X1 + w2·X2 + w3·X3

In the equation above, w0, w1, w2 and w3 are parameters whose values are learnt by training, i.e. fitting, the model on the data. In other words, the values of these parameters change by 'looking' at the data and repeatedly making guesses that get better over time, until we obtain a good enough f.
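As a rough sketch of what "repeatedly making guesses that get better" can mean, here is a minimal gradient-descent fit of the linear model in pure Python. The data is synthetic, generated from known weights chosen for this example, so we can check that the learnt parameters recover them; real training data would of course be noisy:

```python
# Fit Y = w0 + w1*X1 + w2*X2 + w3*X3 by gradient descent: start from a
# guess for the weights and repeatedly nudge them to reduce squared error.
def predict(w, x):
    return w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))

def fit(data, steps=20000, lr=0.05):
    w = [0.0, 0.0, 0.0, 0.0]           # initial guess for w0..w3
    n = len(data)
    for _ in range(steps):
        grad = [0.0] * 4
        for x, y in data:
            err = predict(w, x) - y    # prediction error on this example
            grad[0] += err
            for j in range(3):
                grad[j + 1] += err * x[j]
        w = [wi - lr * g / n for wi, g in zip(w, grad)]
    return w

# Synthetic data generated from Y = 2 + 0.5*X1 + 1.0*X2 - 0.2*X3 (no noise).
true_w = [2.0, 0.5, 1.0, -0.2]
xs = [(1.0, 2.0, 0.5), (2.0, 1.0, 1.5), (0.5, 3.0, 2.0),
      (3.0, 0.5, 1.0), (1.5, 1.5, 0.0), (2.5, 2.5, 2.5)]
data = [(x, predict(true_w, x)) for x in xs]

w = fit(data)
print([round(wi, 2) for wi in w])      # should land near [2.0, 0.5, 1.0, -0.2]
```

The same model can be fitted in one line with an off-the-shelf library; the point of spelling it out is to show that "learning" here is just iterative guess-and-correct on the parameters.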

Conclusion

Which model(s) to choose for estimating f, how to carry out the learning procedure and what a “good enough” f means are non-trivial questions that Machine Learning practitioners investigate iteratively when working on a particular problem. Machine Learning practitioners often rely on experience, domain knowledge and empirical evidence to try to answer these questions. Nonetheless, regardless of the context and nature of a problem, finding a good f is what underlies much of prediction, inference and problem solving using Machine Learning.

References/Inspiration

  • James, G., Witten, D., Hastie, T., Tibshirani, R. An Introduction to Statistical Learning: with Applications in R. New York: Springer, 2013.
  • Hastie, T., Tibshirani, R., Friedman, J. H. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. 2nd ed. New York: Springer, 2009.
