How I build a classification model with R


Exploring and analysing the Iris dataset

Martin Decombarieu

May 1 · 8 min read


Classification is an important area of machine learning: it assigns items to categories based on their characteristics. It is used in many fields today, such as marketing, where visitors to a sales site can be classified according to their likelihood of buying. We discussed this in a previous article.

In this article we take a set of flowers and their characteristics. We will analyse these characteristics and check whether they are correlated. This will help us build a classification model that, given the characteristics of a flower, tells us which species it is.

For this analysis we use a dataset from Kaggle, a well-known dataset repository. It lists 150 flowers, each with a species and several characteristics.

Features Presentation

Let’s see what we can find in this dataset just by using the summary function.
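The original output is not reproduced here, so the following is only a minimal sketch of that step. It assumes R's built-in `iris` data frame as a stand-in for the Kaggle file; the column names and the lowercase species labels come from the built-in data and may differ from the CSV. The `feature_ranges` helper is an illustrative addition.

```r
# Assumption: the built-in `iris` data frame replaces the Kaggle CSV.
data(iris)

# Quartiles, median and mean for each numeric column; counts for Species.
iris_summary <- summary(iris)
print(iris_summary)

# Spread (max - min) of each numeric feature, to compare ranges directly.
feature_ranges <- sapply(iris[, 1:4], function(x) diff(range(x)))
print(feature_ranges)
```

Printing the ranges next to the summary makes it easier to compare the spread of the four measurements at a glance.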

We can see that there are three species (Setosa, Versicolor and Virginica) with 50 records each. We can also see that petal length has a much wider range than the other features. Going further, the large gap between its first quartile and its median could be caused by one species having very small petals. Enough speculation! Let’s dive into this dataset together.

Finding a discriminating feature

It will be interesting to study which characteristics discriminate each species, and to what extent.

To do this, we start by comparing each species with the others, to determine whether any of its characteristics deviates from the average. If so, we will analyse that characteristic more closely.

Setosa

We’ll begin by checking whether any feature of the Setosa species is far from the mean.

We store the feature means of the Setosa flowers in one variable and the feature means of all non-Setosa flowers in another. Then we can plot them side by side.
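A minimal sketch of this comparison, again assuming the built-in `iris` data frame (where the species label is the lowercase `"setosa"`). The variable names `setosa_means`, `other_means` and `means` are illustrative, not taken from the original article.

```r
data(iris)

# Logical index of the Setosa rows.
is_setosa <- iris$Species == "setosa"

# Mean of each of the four numeric features for Setosa and for the rest.
setosa_means <- colMeans(iris[is_setosa, 1:4])
other_means  <- colMeans(iris[!is_setosa, 1:4])

# One row per group, one column per feature.
means <- rbind(Setosa = setosa_means, Others = other_means)
print(round(means, 2))

# Grouped bar chart: two bars (Setosa, Others) for each feature.
barplot(means, beside = TRUE, legend.text = TRUE,
        ylab = "Mean (cm)", main = "Feature means: Setosa vs. other species")
```

A grouped bar chart is only one possible way to draw this; any plot that puts the two groups next to each other for every feature would serve the same purpose.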

We can immediately see that petal length seems to set the Setosa species apart. From now on we will analyse only this feature.

Let’s compare the petal length distribution of Setosa with that of all flowers.
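One way to make this comparison is to overlay kernel density estimates, sketched below. This assumes the built-in `iris` data frame, and the choice of density curves (rather than, say, histograms or box plots) is an assumption, since the original figure is not available.

```r
data(iris)

# Petal lengths for Setosa only, and for the whole dataset.
setosa_petal <- iris$Petal.Length[iris$Species == "setosa"]
all_petal    <- iris$Petal.Length

# Numeric comparison of the two distributions.
print(summary(setosa_petal))
print(summary(all_petal))

# Kernel density estimates for both groups.
d_all <- density(all_petal)
d_set <- density(setosa_petal)

# Overlay the curves; ylim covers the taller of the two.
plot(d_all, main = "Petal length: Setosa vs. all flowers",
     xlab = "Petal length (cm)", ylim = range(0, d_all$y, d_set$y))
lines(d_set, lty = 2)
legend("topright", legend = c("All flowers", "Setosa"), lty = c(1, 2))
```

The dashed curve shows where the Setosa petal lengths sit relative to the distribution of the full dataset.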
