How a simple textual explanation can add value to your data science results

Category: IT Technology · Published: 4 years ago


Enhance the power of your data exploration using textual explanations

The popular saying “A picture is worth a thousand words” may be wrong when it comes to data science. Take the example of Uber's Expected Time of Arrival (ETA) algorithm, which tells the user when the ride is expected to arrive.

Behind the ETA there are complex predictive algorithms and cutting-edge visualisation, with the map updated in real time. But all of this is of no use without the single line of text that says “The closest driver is approximately 1 min away”.

Uber Expected Time of Arrival (ETA) algorithm in action

A data scientist or data analyst produces a lot of data visualisations during the data exploration phase. These visualisations look great, but you can enhance their value with short textual explanations. In many cases, visualisations alone are not sufficient.

Visualisations without explanations are a source of misinterpretation

Take the simple example of a histogram. Shown below is a histogram of a stock's closing price.

Just by looking at this visualisation, one can make many interpretations, such as:

Interpretation 1: The maximum occurring value is between 13 and (something…).

Interpretation 2: The lowest value seems to be between 5 something and 10 something.

Stock trading is an area where values have to be very precise. If the interpretation is not precise, the visualisation alone does not help.
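One way to remove that guesswork is to generate the precise statement directly from the data behind the histogram. The following is a minimal sketch, not the article's own code: the function name `describe_histogram`, the bin width, and the closing prices are all made up for illustration.

```python
from collections import Counter


def describe_histogram(values, bin_width=1.0):
    """Return a short sentence naming the most frequent bin and the value range."""
    if not values:
        raise ValueError("values must not be empty")
    # Map each value to the left edge of its bin, e.g. 13.4 -> 13.0 for width 1.
    bins = Counter((v // bin_width) * bin_width for v in values)
    left_edge, count = max(bins.items(), key=lambda item: item[1])
    share = 100 * count / len(values)
    return (
        f"The most frequent close price lies between {left_edge:.2f} and "
        f"{left_edge + bin_width:.2f} ({share:.0f}% of observations). "
        f"The lowest close is {min(values):.2f} and the highest is {max(values):.2f}."
    )


# Hypothetical closing prices, for illustration only.
closes = [11.2, 13.1, 13.4, 13.8, 14.6, 12.1, 13.9, 11.7, 14.2, 13.3]
print(describe_histogram(closes))
```

Printed next to the chart, a sentence like this replaces "between 13 and something" with exact bin edges and exact minimum and maximum values.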

Data storytelling is a necessity because visualisations alone do not do the job

For many years, data storytelling has been a must-have skill for data scientists. In truth, it is a necessity, because visualisations alone cannot convey the story.

A very simple visualisation can have a great story behind it, but unless it is told, it never surfaces. Take the histogram shown above.

The real story behind the histogram is that the stock price swings between 11 and 15 and stays at 12 for only a very short time, so the buying opportunity at 12 is very short. This kind of story is impossible to capture in a visualisation and needs to be told explicitly. Even if advanced visualisations such as animations are used, someone still has to tell the story.

This is where the power of an explanation comes into play. Adding a short textual explanation enhances the value of the visualisation. You go from showing a visualisation to conveying something meaningful.

Let us now look at some examples where explanations enhance the interpretation of a visualisation.

Explaining a correlation matrix to avoid the stress of a “color-maze”

A correlation matrix looks visually stunning. However, because of the many different shades of color, one has to look hard to interpret it. Adding just a few lines of textual explanation makes the correlation matrix far easier to interpret. The text can explain which variables are the most correlated, as well as what the different shades of color mean.

Shown below is a correlation matrix based on car data. Adding a small explanation clearly enhances the value of the nice-looking correlation matrix. It saves your users from “eye-balling” the matrix to see which variables are the most correlated.

Example of text explanation of correlation matrix
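A text explanation of this kind can be produced by ranking column pairs by the absolute value of their correlation. The sketch below uses only the Python standard library and a hand-written Pearson coefficient. The function names `pearson` and `explain_correlations` and the car figures are assumptions made for illustration, not the data or code behind the figure above.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences.

    Assumes neither sequence is constant (otherwise the denominator is zero).
    """
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def explain_correlations(columns, top_n=2):
    """Describe the top_n column pairs with the strongest correlation."""
    pairs = [
        (a, b, pearson(columns[a], columns[b]))
        for a, b in combinations(columns, 2)
    ]
    # Strongest relationships first, regardless of sign.
    pairs.sort(key=lambda pair: abs(pair[2]), reverse=True)
    lines = []
    for a, b, r in pairs[:top_n]:
        direction = "positively" if r > 0 else "negatively"
        lines.append(f"{a} and {b} are {direction} correlated (r = {r:+.2f}).")
    return lines


# Hypothetical car data, for illustration only.
cars = {
    "horsepower": [90, 110, 130, 150, 200],
    "weight": [2000, 2300, 2600, 3000, 3600],
    "mpg": [35, 31, 27, 23, 16],
}
for line in explain_correlations(cars, top_n=3):
    print(line)
```

With a real dataset, the same ranking could be computed from a library-generated correlation matrix; the point is only that the sentences are generated by code and shown alongside the heat map.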

Explaining a cumulative distribution to avoid “eye-balling” the x and y axes

Cumulative distributions are very important for showing how a numeric value is distributed. They are also a creative way of focusing on important thresholds of a numeric column.

However, showing a cumulative distribution without any explanation turns it into a painful eye-balling exercise. A short explanation of the different threshold levels immediately takes the cumulative distribution to the next level, and it starts making sense.

Shown below is the cumulative distribution of a stock price. Text explanations of the thresholds (for example, 80% of close prices are less than 79.31) clearly enhance the value of a cumulative distribution visualisation.

Example of text explanation of cumulative distribution
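Threshold sentences like these can be read off the sorted data. The sketch below uses the nearest-rank percentile definition, which is one of several common conventions. The function name `explain_cdf`, the chosen percentages, and the prices are assumptions for illustration.

```python
def explain_cdf(values, percents=(50, 80, 95)):
    """Describe empirical thresholds using the nearest-rank percentile method."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    n = len(ordered)
    lines = []
    for p in percents:
        # Integer ceiling of p * n / 100, which avoids floating-point rounding.
        rank = max(1, (p * n + 99) // 100)
        threshold = ordered[rank - 1]
        lines.append(f"{p}% of close prices are at or below {threshold:.2f}.")
    return lines


# Hypothetical closing prices, for illustration only.
closes = [70.1, 72.5, 74.0, 75.2, 76.8, 77.3, 78.0, 79.3, 81.6, 84.9]
for line in explain_cdf(closes):
    print(line)
```

Note the wording "at or below": with the nearest-rank method the threshold is itself an observed value, so it is included in the stated share.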

Explaining the result of clustering to avoid any guesswork

Clustering is a very powerful tool for any data exploration activity. However, it can be one of the most misinterpreted if not clearly explained. The result of clustering is generally a scatter plot with the clusters shown in different colors. The catch is that a 2D scatter plot shows only two columns of your data, whereas the clustering itself may result from many more columns.

So, to explain the clustering results correctly, you need a textual explanation that contains the feature importance of the clustering results.

Example of text explanation of clustering
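The article does not say how the feature importance is computed, so the sketch below swaps in a simple proxy: for each feature, the share of its variance that lies between clusters (between-cluster variance divided by total variance, sometimes called eta squared). The cluster labels are assumed to come from a clustering step that has already run. The function name `explain_clusters` and the data are hypothetical.

```python
from statistics import mean, pvariance


def explain_clusters(rows, labels, feature_names):
    """Rank features by the share of their variance explained by the clusters."""
    clusters = sorted(set(labels))
    scores = {}
    for j, name in enumerate(feature_names):
        column = [row[j] for row in rows]
        total_var = pvariance(column)
        overall_mean = mean(column)
        # Size-weighted squared distance of each cluster mean from the overall mean.
        between = 0.0
        for c in clusters:
            members = [row[j] for row, lab in zip(rows, labels) if lab == c]
            between += len(members) * (mean(members) - overall_mean) ** 2
        between /= len(column)
        scores[name] = between / total_var if total_var > 0 else 0.0
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    lines = [f"The data splits into {len(clusters)} clusters."]
    for name, score in ranked:
        lines.append(f"Cluster membership explains {score:.0%} of the variance in {name}.")
    return lines


# Hypothetical car rows (horsepower, weight, doors) with labels from an
# earlier clustering step, for illustration only.
names = ["horsepower", "weight", "doors"]
rows = [
    (90.0, 2000.0, 4.0), (95.0, 2100.0, 2.0), (100.0, 2050.0, 4.0),
    (200.0, 3500.0, 2.0), (210.0, 3600.0, 4.0), (190.0, 3400.0, 2.0),
]
labels = [0, 0, 0, 1, 1, 1]
for line in explain_clusters(rows, labels, names):
    print(line)
```

A feature with a score near 100% separates the clusters almost completely, while a score near 0% means the feature hardly differs between them, even if it happens to be one of the two axes in the scatter plot.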

Including text generation functions in your development

As data scientists, we focus on coding for all activities: data preparation, feature engineering, hyperparameter tuning, modeling, and visualisation. But most of us do not focus on automatically generating textual explanations of results. It is a good idea to make a habit of including functions that generate textual explanations in your code.

As more and more algorithms are packaged into products meant for end users, the need for textual explanations of results is becoming very evident. Such explanations will make your data science work more appealing to a wider audience.
