Implement the Decision Tree Regression algorithm and plot the results.
Jul 14 · 5 min read
Previously, I had explained the various Regression models such as Linear, Polynomial and Support Vector Regression. In this article, I will walk you through the Algorithm and Implementation of Decision Tree Regression with a real-world example.
Overview of Decision Tree Algorithm
Decision Tree is one of the most commonly used, practical approaches for supervised learning. It can be used to solve both Regression and Classification tasks, with the latter being the more common application in practice.
It is a tree-structured model with three types of nodes. The Root Node is the initial node; it represents the entire sample and may be split into further nodes. The Interior Nodes represent the features of a data set, and the branches represent the decision rules. Finally, the Leaf Nodes represent the outcome. This algorithm is very useful for solving decision-related problems.
A particular data point is run through the entire tree by answering True/False questions until it reaches a leaf node. The final prediction is the average of the values of the dependent variable for the training points in that leaf node. By splitting the data repeatedly during training, the tree is able to predict a suitable value for the data point.
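To make the "average in the leaf" idea concrete, here is a minimal sketch of a tree with a single True/False question, written in plain Python. The temperatures and revenues are made-up toy numbers, and the helper names (leaf_mean, sse, best_split, predict) are hypothetical. The split is chosen by minimising the squared error, which is one common criterion and is assumed here for illustration.

```python
def leaf_mean(values):
    """Prediction of a leaf: the average of the target values it holds."""
    return sum(values) / len(values)


def sse(values):
    """Sum of squared errors of the values around their leaf mean."""
    mean = leaf_mean(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(xs, ys):
    """Try a threshold between each pair of neighbouring x values and
    keep the one that gives the smallest total squared error."""
    pairs = sorted(zip(xs, ys))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold fits between identical x values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for x, y in pairs if x <= threshold]
        right = [y for x, y in pairs if x > threshold]
        cost = sse(left) + sse(right)
        if best is None or cost < best[0]:
            best = (cost, threshold, leaf_mean(left), leaf_mean(right))
    return best[1:]


# Toy data (made up for illustration): temperature and revenue
temps = [10, 12, 15, 25, 28, 30]
revenue = [300, 320, 350, 600, 640, 680]

threshold, left_value, right_value = best_split(temps, revenue)


def predict(temp):
    """Answer the single True/False question and return the leaf average."""
    return left_value if temp <= threshold else right_value


print(threshold)    # 20.0
print(predict(27))  # 640.0, the mean of 600, 640 and 680
```

A full decision tree repeats the same search inside each resulting group until a stopping rule is met, so every leaf ends up holding the average of a small group of similar training points.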
The above diagram is a representation of a Decision Tree algorithm. Decision trees have the advantages that they are easy to understand, require less data cleaning, are not hurt by non-linearity in the data, and need very few hyper-parameters to be tuned. However, a decision tree may over-fit the training data, which can be mitigated using the Random Forest algorithm, which will be explained in the next article.
In this example, we will go through the implementation of Decision Tree Regression, in which we will predict the revenue of an ice cream shop based on the temperature in an area over 500 days.
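The over-fitting problem can be seen in a small experiment. The sketch below is an illustration on synthetic data, not part of the original walkthrough. It compares a fully grown tree with one whose depth is capped through the max_depth parameter of DecisionTreeRegressor. This is a simpler remedy than Random Forest and is swapped in here only to show the effect. The data, the noise level and the depth of 3 are all assumptions.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in data (assumption): a noisy linear trend
rng = np.random.default_rng(0)
X = rng.uniform(0, 45, size=(200, 1))
y = 20 * X[:, 0] + 50 + rng.normal(0, 25, size=200)

# Simple hold-out split: first 150 rows for training, the rest for testing
X_train, y_train = X[:150], y[:150]
X_test, y_test = X[150:], y[150:]

# A fully grown tree keeps splitting until it reproduces the training data
deep = DecisionTreeRegressor(random_state=0).fit(X_train, y_train)
# A depth-limited tree is forced to average over larger groups of points
shallow = DecisionTreeRegressor(max_depth=3, random_state=0).fit(X_train, y_train)

deep_train_r2 = deep.score(X_train, y_train)
shallow_train_r2 = shallow.score(X_train, y_train)
deep_test_r2 = deep.score(X_test, y_test)
shallow_test_r2 = shallow.score(X_test, y_test)

print("depth:", deep.get_depth(), "vs", shallow.get_depth())
print("train R^2:", deep_train_r2, "vs", shallow_train_r2)
print("test R^2:", deep_test_r2, "vs", shallow_test_r2)
```

The fully grown tree scores perfectly on the training rows because every leaf ends up holding a single training point. Comparing the two test scores shows how much of that fit carries over to unseen data.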
Problem Analysis
In this data, we have one independent variable, Temperature, and one dependent variable, Revenue, which we have to predict. In this problem, we have to build a Decision Tree Regression Model which will study the relationship between the Temperature and the Revenue of the ice cream shop and predict the revenue for the shop based on the temperature on a particular day.
Step 1: Importing the libraries
The first step will always consist of importing the libraries that are needed to develop the ML model. The NumPy, matplotlib and Pandas libraries are imported.
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
Step 2: Importing the dataset
In this step, we shall use pandas to load the data from my GitHub repository and store it as a Pandas DataFrame using the function ‘pd.read_csv’. We then assign the ‘Temperature’ column to the independent variable (X) and the ‘Revenue’ column to the dependent variable (y).
dataset = pd.read_csv('https://raw.githubusercontent.com/mk-gurucharan/Regression/master/IceCreamData.csv')
X = dataset['Temperature'].values
y = dataset['Revenue'].values
dataset.head(5)
>>
Temperature    Revenue
24.566884      534.799028
26.005191      625.190122
27.790554      660.632289
20.595335      487.706960
11.503498      316.240194
Step 3: Splitting the dataset into the Training set and Test set
In the next step, we split the dataset as usual into the training set and the test set. For this we use test_size=0.05, which means that 5% of the 500 data rows (25 rows) will be used as the test set and the remaining 475 rows will be used as the training set for building the model.
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.05)
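As a quick check of the arithmetic, the sketch below splits 500 synthetic rows, used as a stand-in for the real CSV, and prints the resulting sizes. The random_state argument is not used in the article. It is added here as an assumption so that the same rows land in the test set on every run, which makes results such as the comparison table reproducible.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# 500 synthetic rows standing in for the ice cream data (assumption)
X_demo = np.linspace(0, 45, 500)
y_demo = 20 * X_demo + 50

# random_state fixes the shuffle; the value 42 is arbitrary
Xd_train, Xd_test, yd_train, yd_test = train_test_split(
    X_demo, y_demo, test_size=0.05, random_state=42
)

print(len(Xd_train), len(Xd_test))  # 475 25
```

Without random_state, each run draws a different 25 rows, so the numbers in the later steps will differ from run to run.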
Step 4: Training the Decision Tree Regression model on the training set
We import the DecisionTreeRegressor class from sklearn.tree and assign an instance of it to the variable ‘regressor’. Then we fit the model to X_train and y_train by using the regressor.fit function. We use reshape(-1,1) to reshape our variables to a single column vector.
# Fitting Decision Tree Regression to the dataset
from sklearn.tree import DecisionTreeRegressor
regressor = DecisionTreeRegressor()
regressor.fit(X_train.reshape(-1,1), y_train.reshape(-1,1))
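The effect of reshape(-1,1) is easiest to see from the array shapes. The sketch below uses the first three rows shown in the dataset preview. The variable names (temps, revenue, column, model, preds) are chosen for illustration only. It also shows that the target can be passed as a plain one-dimensional array, which is an alternative to reshaping y.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# First three rows from the dataset preview
temps = np.array([24.566884, 26.005191, 27.790554])
revenue = np.array([534.799028, 625.190122, 660.632289])

# -1 lets NumPy infer the number of rows; 1 asks for a single column
column = temps.reshape(-1, 1)
print(temps.shape, column.shape)  # (3,) (3, 1)

# scikit-learn expects features as a 2-D array of shape (n_samples, n_features);
# the target may stay one-dimensional
model = DecisionTreeRegressor().fit(column, revenue)
preds = model.predict(column)
print(preds.shape)  # (3,)
```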
Step 5: Predicting the Results
In this step, we predict the results of the test set with the model trained on the training set values using the regressor.predict function and assign the result to ‘y_pred’.
y_pred = regressor.predict(X_test.reshape(-1,1))
Step 6: Comparing the Real Values with Predicted Values
In this step, we shall compare and display the values of y_test as ‘Real Values’ and y_pred as ‘Predicted Values’ in a Pandas dataframe.
df = pd.DataFrame({'Real Values':y_test.reshape(-1), 'Predicted Values':y_pred.reshape(-1)})
df
>>
Real Values    Predicted Values
448.325981     425.265596
535.866729     500.065779
264.123914     237.763911
691.855484     698.971806
587.221246     571.434257
653.986736     633.504009
538.179684     530.748225
643.944327     660.632289
771.789537     797.566536
644.488633     654.197406
192.341996     223.435016
491.430500     477.295054
781.983795     807.541287
432.819795     420.966453
623.598861     612.803770
599.364914     534.799028
856.303304     850.246982
583.084449     596.236690
521.775445     503.084268
228.901030     258.286810
453.785607     473.568112
406.516091     450.473207
562.792463     634.121978
642.349814     621.189730
737.800824     733.215828
From the above values, we infer that the model predicts the values of y_test reasonably well: most predictions are within about 35 of the real value, although a few miss by roughly 45 to 70.
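To put a number on the comparison, the sketch below computes the mean absolute error and the largest single error directly from the 25 pairs in the table above. It is written in plain Python so it runs without the dataset. The names pairs, errors, mae and worst are chosen for illustration.

```python
# (real value, predicted value) pairs copied from the comparison table
pairs = [
    (448.325981, 425.265596), (535.866729, 500.065779),
    (264.123914, 237.763911), (691.855484, 698.971806),
    (587.221246, 571.434257), (653.986736, 633.504009),
    (538.179684, 530.748225), (643.944327, 660.632289),
    (771.789537, 797.566536), (644.488633, 654.197406),
    (192.341996, 223.435016), (491.430500, 477.295054),
    (781.983795, 807.541287), (432.819795, 420.966453),
    (623.598861, 612.803770), (599.364914, 534.799028),
    (856.303304, 850.246982), (583.084449, 596.236690),
    (521.775445, 503.084268), (228.901030, 258.286810),
    (453.785607, 473.568112), (406.516091, 450.473207),
    (562.792463, 634.121978), (642.349814, 621.189730),
    (737.800824, 733.215828),
]

# Absolute error for each test row
errors = [abs(real - predicted) for real, predicted in pairs]

mae = sum(errors) / len(errors)  # mean absolute error
worst = max(errors)              # largest single miss

print(f"MAE: {mae:.2f}")
print(f"Largest error: {worst:.2f}")
```

With the full arrays at hand, the same figure could be obtained from sklearn.metrics.mean_absolute_error(y_test, y_pred).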
Step 7: Visualising the Decision Tree Regression Results
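One common way to visualise a one-feature regression tree is to scatter the data points and draw the model's prediction over a fine grid of temperatures, which shows the step-like shape of the fitted function. The sketch below is an assumption about how such a plot could be produced, not the article's own plotting code. The helper plot_tree_fit is hypothetical, and synthetic data stands in for the ice cream dataset so that the block runs on its own.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.tree import DecisionTreeRegressor


def plot_tree_fit(regressor, X, y):
    """Scatter the data and overlay the tree's prediction on a dense grid."""
    # A dense grid makes the piecewise-constant steps of the tree visible
    grid = np.linspace(X.min(), X.max(), 1000).reshape(-1, 1)
    fig, ax = plt.subplots()
    ax.scatter(X, y, s=10, color="red", label="Data")
    ax.plot(grid[:, 0], regressor.predict(grid), color="green",
            label="Decision tree prediction")
    ax.set_title("Decision Tree Regression")
    ax.set_xlabel("Temperature")
    ax.set_ylabel("Revenue")
    ax.legend()
    return ax


# Synthetic stand-in data (assumption): a noisy linear trend
rng = np.random.default_rng(0)
X_demo = rng.uniform(0, 45, size=100)
y_demo = 20 * X_demo + 50 + rng.normal(0, 25, size=100)

demo_regressor = DecisionTreeRegressor().fit(X_demo.reshape(-1, 1), y_demo)
ax = plot_tree_fit(demo_regressor, X_demo, y_demo)
# Call plt.show() here when running as a script; a notebook displays the figure itself
```

With the variables from the earlier steps, the call would be plot_tree_fit(regressor, X_test, y_test).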