10 Concepts Every Data Scientist Should Know

Category: IT Technology · Published: 4 years ago


The concepts you are likely to encounter in an interview.


Photo by Tyler Casey on Unsplash

Data science is such a broad field. If it were a recipe, the main ingredients would be linear algebra, statistics, software, analytical skills and storytelling, all seasoned with some domain knowledge. The proportions of the ingredients change depending on the tasks you are working on.

Whatever you do as a data scientist, there are some terms and concepts you should definitely be familiar with. In this post, I will cover 10 of these concepts. Please note that this post is by no means intended to be a comprehensive list of the topics you need to know. However, knowing the following concepts will add value to your skill set and help you in your journey to learn more.

Let’s start.

1. Central Limit Theorem

For the central limit theorem to make sense, we first need to introduce the normal (Gaussian) distribution. The normal distribution is a probability distribution whose curve is shaped like a bell:

[Figure: bell-shaped curve of the normal distribution]

The x-axis represents the values, and the y-axis represents the probability density, which indicates how likely values near each point are. The normal distribution is often used to represent random variables whose distributions are unknown, which is why it is widely used in many fields, including the natural and social sciences. The justification for using it this way is the central limit theorem (CLT).
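For reference, the height of the bell curve at a value x is given by the normal probability density function. Here μ denotes the mean and σ the standard deviation; these symbols are introduced for illustration and do not appear in the original text:

```latex
f(x \mid \mu, \sigma) = \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)
```

The curve peaks at x = μ, and σ controls how wide it is. Probabilities correspond to areas under this curve, not to its heights.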

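According to the CLT, suppose we repeatedly draw samples of size n from a population with finite variance and compute each sample's average. As n grows, the distribution of those averages approaches a normal distribution, regardless of the shape of the population distribution.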
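To see the CLT at work, here is a minimal simulation sketch using only Python's standard library. It assumes an exponential population with rate 1, chosen only because it is strongly skewed and clearly not normal. The helper names `sample_means` and `skewness`, the sample sizes and the seed are illustrative choices, not part of the original post. Skewness measures asymmetry and is 0 for a normal distribution, so it should shrink as the sample size n grows.

```python
import random
import statistics


def sample_means(num_samples: int, sample_size: int, seed: int = 0) -> list[float]:
    """Draw `num_samples` samples of size `sample_size` from an exponential
    population (rate 1, strongly right-skewed) and return each sample's average."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.expovariate(1.0) for _ in range(sample_size))
        for _ in range(num_samples)
    ]


def skewness(values: list[float]) -> float:
    """Population skewness of `values`: about 0 for a symmetric (e.g. normal) shape."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return statistics.fmean(((v - mean) / sd) ** 3 for v in values)


# Larger samples -> averages whose distribution is closer to normal (skewness -> 0).
for n in (1, 5, 30, 200):
    means = sample_means(num_samples=5_000, sample_size=n)
    print(f"n={n:>3}  mean of averages={statistics.fmean(means):.3f}  "
          f"skewness={skewness(means):.3f}")
```

For an exponential population, the skewness of the sample mean is 2/√n. The printed skewness should therefore fall from roughly 2 toward roughly 0.14 as n goes from 1 to 200, up to random noise. The mean of the averages stays near the population mean of 1.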

Consider a case where we need to learn the distribution of heights of all 20-year-olds in a country. Collecting this data for everyone is almost impossible and certainly not practical. So, we take samples of 20-year-olds across the country and calculate the average height of the people in each sample. According to the CLT, as the size of each sample grows, the distribution of these sample averages (the sampling distribution) gets closer to a normal distribution centered on the population's mean height.

Why is it so important to have a normal distribution? A normal distribution is fully described by its mean and standard deviation, which are easy to calculate. If we know the mean and standard deviation of a normal distribution, we can compute pretty much everything about it, such as the probability of observing a value within any given range.
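The practical payoff of knowing only the mean and standard deviation can be sketched with Python's standard-library `statistics.NormalDist` (Python 3.8+). The parameters (mean 175, standard deviation 7) and the five sample values below are hypothetical numbers chosen only for illustration. `prob_between` is a helper name introduced here. Once the two parameters are known, a range probability is just a difference of two values of the cumulative distribution function (CDF).

```python
from statistics import NormalDist


def prob_between(mu: float, sigma: float, low: float, high: float) -> float:
    """P(low <= X <= high) for X ~ Normal(mu, sigma): the area under the
    bell curve between low and high, computed from the CDF."""
    dist = NormalDist(mu=mu, sigma=sigma)
    return dist.cdf(high) - dist.cdf(low)


# Hypothetical parameters (e.g. heights in cm); these numbers are made up.
mu, sigma = 175.0, 7.0
dist = NormalDist(mu=mu, sigma=sigma)

# Share of values within one and two standard deviations of the mean.
print(f"within 1 sd: {prob_between(mu, sigma, mu - sigma, mu + sigma):.4f}")
print(f"within 2 sd: {prob_between(mu, sigma, mu - 2 * sigma, mu + 2 * sigma):.4f}")

# Tail probability and a percentile, again from the two parameters alone.
print(f"P(X > 190) = {1 - dist.cdf(190):.4f}")
print(f"90th percentile = {dist.inv_cdf(0.90):.2f}")

# In practice mu and sigma are estimated from data; from_samples uses the
# sample mean and sample standard deviation.
sample = [170.0, 180.0, 175.0, 168.0, 182.0]  # hypothetical measurements
fitted = NormalDist.from_samples(sample)
print(f"fitted mean = {fitted.mean:.1f}, fitted sd = {fitted.stdev:.2f}")
```

The first two lines print about 0.6827 and 0.9545, whatever mean and standard deviation you plug in. This is the familiar 68-95 rule, and it illustrates why the two parameters are enough to describe the distribution.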
