Exploring and Analysing the Iris Dataset
May 1 · 8 min read
Classification is an important area of machine learning: it assigns items to categories based on their characteristics. It is used in many fields today, such as marketing, where visitors to a sales site can be classified according to their appetite to buy. We discussed this in a previous article.
In this article we take a set of flowers and their characteristics, analyze these characteristics, and check whether they are correlated. This will help us build a classification model that, given the characteristics of a flower, tells us which species it belongs to.
For this analysis we will use a dataset from Kaggle, a well-known dataset repository. It lists 150 flowers, each with a species and a few characteristics.
Features Presentation
Let’s see what we can find in this dataset just by using the summary function.
We can see that there are three species, Setosa, Versicolor and Virginica, with 50 records each. We can also see that the petal length feature has a much wider range than the others. Going further, the large gap between its first quartile and its median could be caused by one species having very small petals. Enough speculation! Let’s dive into this dataset together.
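The article does not show the code behind its summary. As a rough sketch of the same idea, the snippet below computes a min / quartiles / mean / max summary for one numeric feature with Python's standard library. The function name `summarize` and the toy values are assumptions made for illustration; they are not the real Iris measurements, and the quartile method chosen here may differ slightly from the one the original summary function uses.

```python
from statistics import mean, quantiles


def summarize(values):
    """Return min, quartiles, mean and max for a list of numbers."""
    # "inclusive" treats the sample min and max as the 0th and 100th
    # percentiles and interpolates linearly in between.
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    return {
        "min": min(values),
        "q1": q1,
        "median": median,
        "mean": mean(values),
        "q3": q3,
        "max": max(values),
    }


# Toy values for illustration only, NOT real Iris measurements.
toy_petal_lengths = [1.0, 2.0, 3.0, 4.0, 5.0]
toy_summary = summarize(toy_petal_lengths)

# The gap discussed in the text: distance between first quartile and median.
q1_to_median_gap = toy_summary["median"] - toy_summary["q1"]
```

Running the same function on each feature column of the real dataset would give one summary per feature, which makes it easy to spot a feature whose range or quartile gap stands out.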
Finding a discriminatory feature
It will be interesting to study which characteristics discriminate each species, and to what extent.
To do this, we will analyze each species in relation to the others, to determine whether one of its characteristics deviates from the average. If so, we will analyse that characteristic more closely.
Setosa
We’ll begin by checking whether one of the Setosa features is far from the mean.
We first store the feature means of the Setosa species in one variable, and the feature means of all flowers that are not Setosa in another. Then we can plot them.
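The two sets of means described here can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the column names (`sepal_length`, `petal_length`, and so on), the helper names, and the toy rows are hypothetical and do not come from the original dataset or the author's code. The plotting step is left out.

```python
from statistics import mean

# Hypothetical column names; the real dataset may label them differently.
FEATURES = ["sepal_length", "sepal_width", "petal_length", "petal_width"]


def feature_means(rows, features=FEATURES):
    """Mean of each feature over a list of row dictionaries."""
    return {f: mean(r[f] for r in rows) for f in features}


def split_means(rows, species):
    """Feature means for one species, and for every other flower."""
    target = [r for r in rows if r["species"] == species]
    others = [r for r in rows if r["species"] != species]
    return feature_means(target), feature_means(others)


# Toy rows for illustration only, NOT real Iris measurements.
toy_rows = [
    {"species": "setosa", "sepal_length": 5.0, "sepal_width": 3.0,
     "petal_length": 1.0, "petal_width": 0.5},
    {"species": "setosa", "sepal_length": 5.0, "sepal_width": 4.0,
     "petal_length": 2.0, "petal_width": 0.5},
    {"species": "versicolor", "sepal_length": 6.0, "sepal_width": 3.0,
     "petal_length": 4.0, "petal_width": 1.0},
    {"species": "virginica", "sepal_length": 7.0, "sepal_width": 3.0,
     "petal_length": 6.0, "petal_width": 2.0},
]

setosa_means, other_means = split_means(toy_rows, "setosa")

# Absolute gap between the two means, per feature.
gaps = {f: abs(setosa_means[f] - other_means[f]) for f in FEATURES}
most_different = max(gaps, key=gaps.get)
```

Comparing raw gaps like this favours features measured on a larger scale, so on real data it may be worth dividing each gap by the feature's standard deviation before ranking them.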
We can immediately see that the petal length feature seems to set the Setosa apart. From now on we’ll analyse only this feature.
Let’s compare the petal length distribution of Setosa against that of all flowers.
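One simple way to compare two distributions without a plotting library is to count values per bin, which is what a histogram does under the hood. The sketch below assumes a bin width of 1.0 and uses toy values; the function name `bin_counts` is hypothetical, and the numbers are not real Iris measurements.

```python
from collections import Counter


def bin_counts(values, width=1.0):
    """Count values per bin [k*width, (k+1)*width), keyed by lower edge."""
    return Counter((v // width) * width for v in values)


# Toy petal lengths for illustration only, NOT real Iris measurements.
toy_setosa = [1.0, 1.5, 1.5, 2.0]
toy_all = toy_setosa + [4.0, 4.5, 6.0]

setosa_bins = bin_counts(toy_setosa)
all_bins = bin_counts(toy_all)

# Print the two distributions side by side, one line per bin.
for edge in sorted(all_bins):
    print(f"{edge:>4.1f}  setosa={setosa_bins.get(edge, 0)}  all={all_bins[edge]}")
```

If one species occupies bins that the others never reach, as in this toy example, the feature is a good candidate for separating that species from the rest.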