Exploring and analysing the Iris Dataset
May 1 · 8 min read
Classification is an important area of machine learning: it assigns items to categories based on their characteristics. It is used in many fields today, such as marketing, where visitors to a sales site can be classified according to their appetite to buy. We discussed this in a previous article.
In this one we will take a set of flowers and their characteristics. We will analyse these characteristics and check whether they are correlated. This will help us build a classification model that, given the characteristics of a flower, tells us which species it is.
For this analysis we will use a dataset from Kaggle, a well-known dataset repository. It lists 150 flowers, each with a species and a few characteristics.
Features Presentation
Let’s see what we can find in this dataset just by using the summary function.
We can see here that we have three different species: Setosa, Versicolor and Virginica, with 50 records each. We can also see that the petal length feature has a very wide range compared to the others. Going further, the large gap between the first quartile and the median could be caused by a species with very small petals. Enough speculation! Let’s dive into this dataset together.
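The article's own code and output are not shown, so here is a minimal Python sketch of the kind of per-column summary discussed above. The `ROWS` sample, the `FEATURES` names and the `summarise` helper are assumptions made for illustration: the sample is a handful of hand-typed rows in the shape of the Iris data, not the 150-row Kaggle file.

```python
from collections import Counter
from statistics import mean, quantiles

# Hypothetical column names, in the order the values appear in each row.
FEATURES = ("sepal_length", "sepal_width", "petal_length", "petal_width")

# Illustrative rows only (measurements in cm, then species).
# This is NOT the full dataset; load the real file for an actual analysis.
ROWS = [
    (5.1, 3.5, 1.4, 0.2, "setosa"),
    (4.9, 3.0, 1.4, 0.2, "setosa"),
    (4.7, 3.2, 1.3, 0.2, "setosa"),
    (7.0, 3.2, 4.7, 1.4, "versicolor"),
    (6.4, 3.2, 4.5, 1.5, "versicolor"),
    (6.9, 3.1, 4.9, 1.5, "versicolor"),
    (6.3, 3.3, 6.0, 2.5, "virginica"),
    (5.8, 2.7, 5.1, 1.9, "virginica"),
    (7.1, 3.0, 5.9, 2.1, "virginica"),
]


def summarise(rows):
    """Return (per-feature statistics, records per species)."""
    stats = {}
    for i, name in enumerate(FEATURES):
        values = [row[i] for row in rows]
        # "inclusive" treats the sample min and max as the 0th and 100th percentiles.
        q1, q2, q3 = quantiles(values, n=4, method="inclusive")
        stats[name] = {
            "min": min(values),
            "q1": q1,
            "median": q2,
            "mean": mean(values),
            "q3": q3,
            "max": max(values),
        }
    species_counts = Counter(row[4] for row in rows)
    return stats, species_counts


stats, species_counts = summarise(ROWS)
print(dict(species_counts))
for name, s in stats.items():
    print(name, {key: round(value, 2) for key, value in s.items()})
```

On the full dataset, a wide gap between `q1` and `median` for petal length is the signal the paragraph above speculates about.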
Finding a discriminatory feature
It will be interesting to study which characteristics discriminate each species, and to what extent.
To do this we will start by analysing each species in relation to the others, to determine whether one of its characteristics departs from the average. If so, we will analyse that characteristic more closely.
Setosa
We’ll begin by analysing whether one of the features of the Setosa is far from the mean.
We begin by saving the mean of each feature for the Setosa species in one variable, and the mean for all flowers that are not Setosa in another. Then we can plot them.
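The step just described can be sketched in Python as follows. This is an illustration under assumptions, not the article's own code: `ROWS` is a small hand-typed sample in the shape of the Iris data, and `feature_means` and `compare_species` are hypothetical helper names. Plotting is left out; the two sets of means are simply printed side by side.

```python
from statistics import mean

# Hypothetical column names, in the order the values appear in each row.
FEATURES = ("sepal_length", "sepal_width", "petal_length", "petal_width")

# Illustrative rows only (measurements in cm, then species), not the full dataset.
ROWS = [
    (5.1, 3.5, 1.4, 0.2, "setosa"),
    (4.9, 3.0, 1.4, 0.2, "setosa"),
    (4.7, 3.2, 1.3, 0.2, "setosa"),
    (7.0, 3.2, 4.7, 1.4, "versicolor"),
    (6.4, 3.2, 4.5, 1.5, "versicolor"),
    (6.9, 3.1, 4.9, 1.5, "versicolor"),
    (6.3, 3.3, 6.0, 2.5, "virginica"),
    (5.8, 2.7, 5.1, 1.9, "virginica"),
    (7.1, 3.0, 5.9, 2.1, "virginica"),
]


def feature_means(rows):
    """Mean of every feature over the given rows."""
    return {name: mean([row[i] for row in rows]) for i, name in enumerate(FEATURES)}


def compare_species(rows, species):
    """Map each feature to (mean for `species`, mean for all other flowers)."""
    target_means = feature_means([r for r in rows if r[4] == species])
    other_means = feature_means([r for r in rows if r[4] != species])
    return {name: (target_means[name], other_means[name]) for name in FEATURES}


comparison = compare_species(ROWS, "setosa")
for name, (setosa_mean, other_mean) in comparison.items():
    gap = other_mean - setosa_mean
    print(f"{name:13s} setosa={setosa_mean:.2f} others={other_mean:.2f} gap={gap:+.2f}")

# The feature with the largest absolute gap is the best candidate discriminator.
most_distinct = max(comparison, key=lambda n: abs(comparison[n][1] - comparison[n][0]))
print("largest gap:", most_distinct)
```

Raw gaps are not scaled by each feature's spread, so on real data it may be worth dividing each gap by the feature's standard deviation before comparing them.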
We can immediately see that the petal length feature seems to set the Setosa apart. From here on we’ll analyse only this feature.
Let’s compare the petal length distribution of Setosa against that of all flowers.
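One simple way to make this comparison, sketched here in Python with a text histogram instead of a plot. The `PETAL_LENGTHS` sample, the 1 cm bin width and the `histogram` and `print_histogram` helpers are all assumptions for illustration; the sample is a few hand-typed values, not the full dataset.

```python
from collections import Counter

# Illustrative (petal length in cm, species) pairs only, not the full dataset.
PETAL_LENGTHS = [
    (1.4, "setosa"), (1.4, "setosa"), (1.3, "setosa"),
    (4.7, "versicolor"), (4.5, "versicolor"), (4.9, "versicolor"),
    (6.0, "virginica"), (5.1, "virginica"), (5.9, "virginica"),
]


def histogram(values, bin_width=1.0):
    """Count values per bin; each bin is keyed by its lower edge."""
    counts = Counter(int(v // bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))


def print_histogram(title, values, bin_width=1.0):
    """Print one row of '#' marks per non-empty bin."""
    print(title)
    for lower, count in histogram(values, bin_width).items():
        print(f"  {lower:4.1f}-{lower + bin_width:4.1f} cm | {'#' * count}")


setosa = [length for length, species in PETAL_LENGTHS if species == "setosa"]
everything = [length for length, _ in PETAL_LENGTHS]

print_histogram("Setosa", setosa)
print_histogram("All flowers", everything)
```

If the Setosa bins sit well apart from the bulk of the overall distribution, petal length alone goes a long way towards identifying that species.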