7 ways to catch a Data Scientist’s lies and deception

Category: IT Technology · Published: 4 years ago

7 simple principles to make sure you’re not being taken advantage of by someone selling you “AI” and “Machine Learning”

Whether you are a business leader, entrepreneur, angel investor, middle manager, hackathon judge or otherwise involved in ‘tech’, at some point you are likely to find someone trying to ‘sell’ you their “AI product”, “Machine Learning software” or some other fancy fusion of buzzwords. In such a situation, it is natural to feel that you lack the knowledge and expertise to make a sound decision. Stand your ground and don’t be overwhelmed! The following 7 common-sense principles will help you separate the signal from the noise, cut through the BS and understand the core value proposition of the Machine Learning solution you are being sold.

1. “We used A.I. to…”

Be very careful when someone says “AI”. While it is probably fanciful marketing, it could also be a sincere effort at trying to abstract away painfully complicated details so as to not bother you. Give them the benefit of the doubt BUT delve into the details. Find out more about which specific Machine Learning model they used. Here are a few other critical questions to ask:

  1. Which other methods (models/algorithms/techniques) did you try and how were the results compared to the chosen solution? (ask for graphical evidence if possible)
  2. Why did you choose this method over the others?
  3. Why do you think this method outperforms the others on this data?
  4. Has someone else solved a similar problem? If yes, which method did they use?

At first, you may not necessarily understand all the details of the answers to these questions but you should ask, clarify and understand as much as you can.

If you can’t explain it simply, you don’t understand it well enough. — Albert Einstein

In my experience, I have not come across a single Machine Learning concept that cannot be explained by analogy. So, if communicating the technical details is a challenge, ask for a high-level explanation. Such scrutiny will not only expand your understanding, it will also indicate how well the solution has been thought through. (It will also establish that your meeting room is a no-BS zone 😎)

2. Survival of the Adaptable

In the 1990s and early 2000s, a spam filter in your email inbox would look for spelling errors and other simple indicators to automatically put the spam emails into the spam folder. Now, spammers have become smarter and spam emails have become increasingly difficult to detect. The Machine Learning models used by modern email providers have had to adapt and become more sophisticated in identifying spam emails correctly.

“All failure is failure to adapt, all success is successful adaptation” — Max McKeown

One thing you must clarify is how readily, as time passes and the input data evolves, the Machine Learning model can be re-trained on new data or replaced with a better-performing model. This is essential: you deserve to know whether there is an ‘expiry date’ on the solution you are being sold.
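One way a vendor might make that ‘expiry date’ visible is to track accuracy on recent, labelled predictions and flag when it falls too far below the level measured at launch. The sketch below is a minimal illustration, not a description of any particular product: the function names, the window size, the baseline of 0.95 and the tolerated drop of 0.05 are all assumptions chosen for the example, and the outcome data is made up.

```python
def windowed_accuracy(outcomes, window):
    """Accuracy over consecutive, non-overlapping windows.

    `outcomes` is a time-ordered list of 1 (prediction was right)
    and 0 (prediction was wrong). A trailing partial window is ignored.
    """
    return [
        sum(outcomes[start:start + window]) / window
        for start in range(0, len(outcomes) - window + 1, window)
    ]


def first_degraded_window(accuracies, baseline, max_drop):
    """Index of the first window whose accuracy fell more than
    `max_drop` below `baseline`, or None if none did."""
    for index, accuracy in enumerate(accuracies):
        if baseline - accuracy > max_drop:
            return index
    return None


# Made-up outcomes for three periods of 100 predictions each.
outcomes = (
    [1] * 95 + [0] * 5      # period 0: 95 of 100 correct
    + [1] * 93 + [0] * 7    # period 1: 93 of 100 correct
    + [1] * 85 + [0] * 15   # period 2: 85 of 100 correct
)

accuracies = windowed_accuracy(outcomes, window=100)
retrain_at = first_degraded_window(accuracies, baseline=0.95, max_drop=0.05)
print(accuracies)   # [0.95, 0.93, 0.85]
print(retrain_at)   # 2
```

A useful follow-up question for the seller is what happens once such a flag is raised: who re-trains the model, on what data, and how long it takes.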

3. Garbage In Garbage Out

A Machine Learning model is only as good as its data. Therefore, you should ascertain the quality of the data used to train a Machine Learning model. While “quality” is difficult to define and might differ depending on the context, one simple way to gauge the quality of the training data is to ask how closely it resembles, and how well it represents, the ‘real world’ data that the model will face.

“In God we trust, all others bring (good quality) data.” — W Edwards Deming

No matter how fancy or cutting-edge a Machine Learning model might be, if the data on which it is trained is of poor quality, the results are bound to be lousy.
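The question of how closely training data resembles real-world data can be given a rough number. One commonly used measure is the Population Stability Index (PSI), which compares how often each category appears in two datasets: 0 means identical shares, and larger values mean a bigger shift. The sketch below is an illustration only. The payment-channel feature and its values are made up, the `floor` parameter is a convenience assumption to avoid dividing by zero, and the 0.25 cut-off mentioned in the comment is a common rule of thumb rather than a standard.

```python
import math
from collections import Counter


def population_stability_index(train_values, live_values, floor=1e-6):
    """PSI between the category shares of two samples.

    Shares are clipped at `floor` so that a category missing from
    one sample does not cause a division by zero or log(0).
    """
    train_counts = Counter(train_values)
    live_counts = Counter(live_values)
    psi = 0.0
    for category in set(train_counts) | set(live_counts):
        train_share = max(train_counts[category] / len(train_values), floor)
        live_share = max(live_counts[category] / len(live_values), floor)
        psi += (live_share - train_share) * math.log(live_share / train_share)
    return psi


# Made-up example: payment channel of each transaction.
train = ["card"] * 50 + ["transfer"] * 30 + ["cash"] * 20
live_similar = ["card"] * 50 + ["transfer"] * 30 + ["cash"] * 20
live_shifted = ["card"] * 20 + ["transfer"] * 30 + ["cash"] * 50

psi_similar = population_stability_index(train, live_similar)
psi_shifted = population_stability_index(train, live_shifted)
print(psi_similar)            # 0.0, the shares are identical
print(round(psi_shifted, 2))  # 0.55, above the 0.25 rule of thumb
```

A high value does not prove that the model will fail, but it is a good prompt to ask why the training data looks so different from what the model sees in production.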

4. More, more, more!

In general, the more data a model has been trained on, the better it performs (ceteris paribus). This is especially true for Deep Learning models. You can think of a Machine Learning model as a high school student practicing questions for SATs. Practicing a larger number and variety of questions will increase the likelihood of the student performing better on the SATs.

“It is a capital mistake to theorize before one has (ample) data.” — Sherlock Holmes

It is essential to ensure that ample data has been used in training any Machine Learning model. How much data is enough? It is difficult to give a number, but as long as the quality holds up (see point 3), more is generally better. Ideally, the data should come from reliable sources and these sources should be used exhaustively.
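A practical way to answer “how much data is enough?” is to ask for a learning curve: the same model trained on progressively larger samples and scored on one fixed test set. The sketch below shows the idea under heavy simplifying assumptions: the data is synthetic (two overlapping bell curves), the model is a deliberately tiny nearest-centroid classifier, and all names and sizes are chosen for illustration. Scores usually rise and then flatten as the training set grows, but any single run is noisy, so no particular numbers are claimed here.

```python
import random


def make_samples(n_per_class, rng):
    """Synthetic 1-D data: class 0 centred at 0.0, class 1 at 1.5."""
    zeros = [(rng.gauss(0.0, 1.0), 0) for _ in range(n_per_class)]
    ones = [(rng.gauss(1.5, 1.0), 1) for _ in range(n_per_class)]
    return zeros + ones


def fit_centroids(samples):
    """'Train' by computing the mean value of each class."""
    means = {}
    for label in (0, 1):
        values = [x for x, y in samples if y == label]
        means[label] = sum(values) / len(values)
    return means


def accuracy(means, samples):
    """Share of samples whose nearest class mean is their true class."""
    correct = 0
    for x, y in samples:
        predicted = min(means, key=lambda label: abs(x - means[label]))
        correct += predicted == y
    return correct / len(samples)


rng = random.Random(0)              # fixed seed so runs are repeatable
test_set = make_samples(1000, rng)  # one held-out set for every model

learning_curve = []
for n_per_class in (2, 8, 32, 128, 512):
    train_set = make_samples(n_per_class, rng)
    score = accuracy(fit_centroids(train_set), test_set)
    learning_curve.append((2 * n_per_class, score))

for size, score in learning_curve:
    print(f"{size:5d} training examples -> accuracy {score:.3f}")
```

If the curve is still climbing at the largest size, more data is likely to help; if it has flattened, the bottleneck is probably the model or the quality of the data rather than the quantity.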

5. Interpretability

In Machine Learning, there is often a trade-off between how well a model performs and how easily its performance, especially poor performance, can be explained. Generally, for complex data, more sophisticated models tend to do better. However, because these models are more complicated, it becomes difficult to explain the effect of the input data on the output.

For example, imagine that you are using a very complex Machine Learning model to predict the sales of a product. The inputs to this model are the amounts of money spent on advertising on TV, on radio and in newspapers. The complicated model may give you very accurate sales predictions but may not be able to tell you which of the 3 advertising outlets has the greatest impact on sales and is most worth the money. A simpler model, on the other hand, might give a less accurate result but would be able to explain which outlet is most worth the money.

You need to be aware of this trade-off between model performance and interpretability. Where the balance should lie depends on the objective, and hence the decision should be yours to make.
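To make the “simpler model” side of the trade-off concrete, the sketch below fits a plain linear model to the advertising example and reads the answer straight off its coefficients. Everything in it is an assumption for illustration: the spend figures are invented, the sales figures are generated from a known formula so that the result can be checked, and the helper names are hypothetical. The coefficients are only directly comparable here because all three inputs are in the same unit (money spent).

```python
def solve_linear_system(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    solution = [0.0] * n
    for row in range(n - 1, -1, -1):
        known = sum(aug[row][k] * solution[k] for k in range(row + 1, n))
        solution[row] = (aug[row][n] - known) / aug[row][row]
    return solution


def fit_linear_model(rows, targets):
    """Ordinary least squares via the normal equations.

    Returns [intercept, coefficient_1, coefficient_2, ...].
    """
    design = [[1.0] + list(row) for row in rows]
    size = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(size)]
           for i in range(size)]
    xty = [sum(r[i] * t for r, t in zip(design, targets)) for i in range(size)]
    return solve_linear_system(xtx, xty)


# Invented spend per campaign: (TV, radio, newspaper).
spend = [
    (100, 20, 30), (200, 10, 50), (150, 40, 10),
    (50, 30, 60), (300, 25, 20), (120, 5, 45),
]
# Sales generated from a known formula, so the fit can be verified.
sales = [2.0 + 0.05 * tv + 0.2 * radio + 0.0 * paper
         for tv, radio, paper in spend]

coefficients = fit_linear_model(spend, sales)
effects = dict(zip(["tv", "radio", "newspaper"], coefficients[1:]))
most_effective = max(effects, key=effects.get)

for outlet, effect in effects.items():
    print(f"{outlet}: {effect:.3f} extra sales per unit spent")
print("Most effective outlet:", most_effective)
```

A complex model might predict sales more accurately on real data, but it would not hand you a table like this one without additional explanation tooling.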

6. Measuring the Right Thing in the Right Way

Accuracy is a very common metric for measuring the performance of a classification Machine Learning model. For example, a model for classifying pictures of cats and dogs with an accuracy of 96% could be considered very good: out of 100 pictures of cats and dogs, it labels 96 correctly. Now imagine a bank applies the same metric to classifying fraudulent transactions. Because fraudulent transactions are very rare, the fraud classifier might easily reach an accuracy of 96%; if only 4% of transactions are fraudulent, even a model that labels every transaction as legitimate scores 96% while catching no fraud at all. Catching fraud is not really about being right 96% of the time. It is about catching as many of the fraudulent transactions as possible, because wrongly classifying 4% of the transactions as not fraudulent could do a whole lot of damage.

Measurement is fabulous. Unless you’re busy measuring what’s easy to measure as opposed to what’s important. — Seth Godin

For the bank-fraud example, the number of false negatives (fraudulent transactions that the model lets through) is more indicative of the model’s performance than accuracy. Depending on the problem, other metrics such as precision, recall, specificity and F1 score should be used instead of accuracy. An article by Mohammed Sunasra discusses when each of these should be used. Thus, it is critical to use the right metric and, if possible, a variety of metrics.
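The bank-fraud argument can be checked with a few lines of arithmetic. The sketch below uses made-up numbers (1,000 transactions, 40 of them fraudulent) and two hypothetical classifiers: one that labels everything as legitimate, and one that raises some false alarms but catches most of the fraud. The function names are illustrative, and returning 0.0 when a ratio is undefined is a convention assumed for this example.

```python
def confusion_counts(actual, predicted):
    """Count true/false positives and negatives (True means fraud)."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a and p)
    fp = sum(1 for a, p in pairs if not a and p)
    fn = sum(1 for a, p in pairs if a and not p)
    tn = sum(1 for a, p in pairs if not a and not p)
    return tp, fp, fn, tn


def summarize(actual, predicted):
    """Accuracy, precision, recall and F1 for one set of predictions."""
    tp, fp, fn, tn = confusion_counts(actual, predicted)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "false_negatives": fn,
    }


# Made-up data: 40 fraudulent transactions followed by 960 legitimate ones.
actual = [True] * 40 + [False] * 960

# Classifier A: never flags anything.
always_legitimate = [False] * 1000
# Classifier B: catches 30 of the 40 frauds and raises 50 false alarms.
flagging_model = [True] * 30 + [False] * 10 + [True] * 50 + [False] * 910

report_a = summarize(actual, always_legitimate)
report_b = summarize(actual, flagging_model)
print(report_a)  # accuracy 0.96, recall 0.0, 40 false negatives
print(report_b)  # accuracy 0.94, recall 0.75, 10 false negatives
```

Classifier A wins on accuracy yet misses every fraudulent transaction, while classifier B scores lower on accuracy and catches three quarters of them. Which trade-off between false alarms and missed fraud is acceptable is a business decision, which is why the choice of metric deserves a question of its own.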

7. So…what are your strengths and weaknesses?

A cliché in the world of corporate interviewing, the strengths-and-weaknesses question can come in very handy when evaluating a Machine Learning solution. When someone proposes a Machine Learning solution, you should definitely ask them about its limitations. Knowing the limitations is essential to answering two key questions:

  1. Do the strengths outweigh the limitations enough to implement the solution?
  2. Could the limitations hamper performance in the future?

“The key to success is understanding one’s weaknesses and successfully compensating for them. People who lack that ability fail chronically.” — Ray Dalio

From the standpoint of implementing an effective and sustainable Machine Learning solution, knowing its limitations is critical to its success. Moreover, asking the proponents to come clean on the limitations of their solution will give you an idea of the level of transparency that they have. It will indicate how well the solution has been thought through and how trustworthy the people proposing the solution are.

Conclusion

Regardless of how lacking in knowledge and overwhelmed you might feel, you have one secret weapon, a flashlight to guide you through the fog: your ability to ask questions. Ask questions! Question, clarify and scrutinise everything you are not sure about. The 7 ideas above give you a holistic strategy and 7 critical dimensions along which to ask questions. You can count on them to enhance your understanding and help you soundly evaluate a Machine Learning solution.

