How I build a classification model with R

Category: IT Technology · Published: 5 years ago

Exploring and analysing the Iris dataset

Classification is an important area of machine learning: it assigns observations to categories based on their characteristics. It is used in many fields today, such as marketing, where visitors to a sales site can be classified according to their propensity to buy. We discussed this in a previous article.

In this article we take a set of flowers and their characteristics. We will analyse these characteristics and check whether they are correlated. This will help us build a classification model that, given the characteristics of a flower, tells us which species it belongs to.

For this analysis we will use a dataset from Kaggle, a well-known repository of datasets. It lists 150 flowers, each with a species and a few measured characteristics.

Features Presentation

Let’s see what we can find in this dataset just by using the summary function.
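The original screenshot is not available, so here is a minimal sketch of this step. It assumes that R's built-in `iris` data frame stands in for the Kaggle file; the column names in the Kaggle CSV may differ.

```r
# Assumption: the built-in `iris` data frame is used in place of the Kaggle CSV.
data(iris)

# Dimensions: one row per flower, four measurements plus the species
dim(iris)

# Quartiles and mean for each numeric column, counts for the Species factor
summary(iris)
```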

We can see that there are three species, Setosa, Versicolor and Virginica, with 50 records each. We can also see that petal length has a much wider range than the other features. Going further, the large gap between its first quartile and its median could be caused by one species having very small petals. Enough speculation! Let’s dive into this dataset together.
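One way to check this speculation is to compute the range of each feature and then split petal length by species. This is only a sketch on the built-in `iris` data; the names `feature_range` and `petal_by_species` are illustrative and do not come from the original article.

```r
# Assumption: the built-in `iris` data frame is used in place of the Kaggle CSV.
data(iris)
features <- iris[, 1:4]

# Range (max minus min) of each measurement
feature_range <- sapply(features, function(x) diff(range(x)))
feature_range

# Petal length summary, computed separately for each species
petal_by_species <- tapply(iris$Petal.Length, iris$Species, summary)
petal_by_species
```

If one species sits well below the others in the second output, that would explain the gap between the first quartile and the median.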

Finding a discriminatory feature

It is interesting to study which characteristics discriminate each species, and to what extent.

To do this, we analyse each species in relation to the others to determine whether one of its characteristics deviates from the average. If so, we analyse that characteristic more closely.

Setosa

We’ll begin by checking whether any of Setosa’s features is far from the mean.

We begin by saving the feature means for the Setosa species in one variable, and the feature means for all flowers that are not Setosa in another. Then we can plot them.
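The author's code was published as an image and did not survive, so the following is a rough reconstruction of the idea rather than the original code. It assumes the built-in `iris` data, and the variable names (`setosa_means`, `other_means`) are hypothetical.

```r
# Assumption: the built-in `iris` data frame is used in place of the Kaggle CSV.
data(iris)
feature_cols <- c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")

# Logical index: TRUE for Setosa rows, FALSE for every other flower
is_setosa <- iris$Species == "setosa"

# Mean of each feature for Setosa and for the remaining flowers
setosa_means <- colMeans(iris[is_setosa, feature_cols])
other_means  <- colMeans(iris[!is_setosa, feature_cols])

# Stack both vectors into a matrix so they can be compared side by side
means <- rbind(Setosa = setosa_means, Others = other_means)
means

# Grouped bar chart: one pair of bars per feature
barplot(means, beside = TRUE, legend.text = TRUE,
        ylab = "Mean (cm)", main = "Setosa vs. other species")
```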

We can immediately see that petal length seems to set Setosa apart from the other flowers. From now on we will analyse only this feature.

Let’s compare the petal length distribution of Setosa against that of all flowers.
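The original plot is missing, and the article does not say which kind of plot was used. As one possible sketch, the two distributions can be overlaid as kernel density estimates, again assuming the built-in `iris` data and illustrative variable names.

```r
# Assumption: the built-in `iris` data frame is used in place of the Kaggle CSV.
data(iris)
setosa_petal <- iris$Petal.Length[iris$Species == "setosa"]
all_petal    <- iris$Petal.Length

# Kernel density estimates for both groups
d_all    <- density(all_petal)
d_setosa <- density(setosa_petal)

# Draw the overall distribution first, then overlay Setosa as a dashed line
plot(d_all,
     xlim = range(all_petal),
     ylim = c(0, max(d_all$y, d_setosa$y)),
     main = "Petal length: Setosa vs. all flowers",
     xlab = "Petal length (cm)")
lines(d_setosa, lty = 2)
legend("topright", legend = c("All flowers", "Setosa"), lty = c(1, 2))
```

A density overlay is only one choice here; side-by-side boxplots or histograms would show the same comparison.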

