House Prices Prediction Using Deep Learning

Category: IT Technology · Published: 4 years ago

Keras-Regression vs Multiple Linear Regression

Jul 22 · 6 min read

Photo by @ Kusseyl on Instagram —KW, Florida

In this tutorial, we're going to create a model to predict house prices based on various factors across different markets.

Problem Statement

The goal of this statistical analysis is to help us understand how house features relate to the sale price and how these variables can be used to predict it.

Objective

  • Predict the house price
  • Compare two different models in terms of minimizing the difference between the predicted and actual price

Data used: Kaggle-kc_house Dataset

GitHub: you can find my source code here

Step 1: Exploratory Data Analysis (EDA)

First, let's import the data and take a look at what kind of data we are dealing with:

#import required libraries
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
#import Data
Data = pd.read_csv('kc_house_data.csv')
Data.head(5).T
#get some information about our Data-Set
Data.info()
Data.describe().transpose()

5 records of our dataset

Information about the dataset: the data type of each variable

Statistical summary of the dataset

Our features are:

  • Date: date the house was sold
  • Price: the prediction target
  • Bedrooms: number of bedrooms per house
  • Bathrooms: number of bathrooms per house
  • Sqft_Living: square footage of the home
  • Sqft_Lot: square footage of the lot
  • Floors: total floors (levels) in the house
  • Waterfront: whether the house has a view of a waterfront
  • View: whether the house has been viewed
  • Condition: how good the overall condition is
  • Grade: grade given to the housing unit, based on the King County grading system
  • Sqft_Above: square footage of the house apart from the basement
  • Sqft_Basement: square footage of the basement
  • Yr_Built: year the house was built
  • Yr_Renovated: year the house was renovated
  • Zipcode: zip code
  • Lat: latitude coordinate
  • Long: longitude coordinate
  • Sqft_Living15: living room area in 2015 (implies some renovations)
  • Sqft_Lot15: lot size area in 2015 (implies some renovations)

Let's plot a couple of features to get a better feel for the data:

#visualizing house prices
fig = plt.figure(figsize=(10,7))
fig.add_subplot(2,1,1)
sns.histplot(Data['price'], kde=True)  # distplot is deprecated in recent seaborn
fig.add_subplot(2,1,2)
sns.boxplot(x=Data['price'])
plt.tight_layout()
#visualizing square footage of (home,lot,above and basement)
# recent seaborn versions expect x= and y= as keyword arguments
fig = plt.figure(figsize=(16,5))
fig.add_subplot(2,2,1)
sns.scatterplot(x=Data['sqft_above'], y=Data['price'])
fig.add_subplot(2,2,2)
sns.scatterplot(x=Data['sqft_lot'], y=Data['price'])
fig.add_subplot(2,2,3)
sns.scatterplot(x=Data['sqft_living'], y=Data['price'])
fig.add_subplot(2,2,4)
sns.scatterplot(x=Data['sqft_basement'], y=Data['price'])
#visualizing bedrooms,bathrooms,floors,grade
fig = plt.figure(figsize=(15,7))
fig.add_subplot(2,2,1)
sns.countplot(x=Data['bedrooms'])
fig.add_subplot(2,2,2)
sns.countplot(x=Data['floors'])
fig.add_subplot(2,2,3)
sns.countplot(x=Data['bathrooms'])
fig.add_subplot(2,2,4)
sns.countplot(x=Data['grade'])
plt.tight_layout()

From the distribution plot of price, we can see that most prices fall between 0 and around 1M, with a few outliers close to 8 million (fancy houses). It would make sense to drop those outliers in our analysis.
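
The article does not show how the outliers would be dropped, so the following is only a minimal sketch of one common approach: keeping rows at or below a chosen price percentile. The toy prices and the 90th-percentile cutoff are assumptions for illustration; on the real `Data` frame a stricter cutoff such as the 99th percentile might be more appropriate.

```python
import pandas as pd

# Hypothetical toy prices; in the tutorial these would come from Data['price'].
toy = pd.DataFrame({"price": [250_000, 320_000, 410_000, 450_000, 530_000,
                              610_000, 700_000, 880_000, 950_000, 7_700_000]})

# Keep rows at or below the 90th percentile of price (the cutoff is an assumption).
cutoff = toy["price"].quantile(0.90)
trimmed = toy[toy["price"] <= cutoff]

print(len(toy), len(trimmed))   # number of rows before and after trimming
print(trimmed["price"].max())   # largest remaining price
```

A percentile cutoff adapts to the data, whereas a fixed dollar threshold would need to be revisited for every market.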

House price Prediction

It is quite useful to have a quick overview of how different features are distributed against house price.

Scatterplot — square footage of (home,lot,above and basement)

Countplot — bedrooms,bathrooms,floors,grade

Here, I'm breaking the date column down into years and months to see how the house price changes over time.

#let's break date to years, months
Data['date'] = pd.to_datetime(Data['date'])
Data['month'] = Data['date'].dt.month
Data['year'] = Data['date'].dt.year
#data visualization house price vs months and years
fig = plt.figure(figsize=(16,5))
fig.add_subplot(1,2,1)
Data.groupby('month')['price'].mean().plot()
fig.add_subplot(1,2,2)
Data.groupby('year')['price'].mean().plot()

House price vs months and years

Let's check whether we have any null values and drop some columns that we do not need.

# check if there are any Null values
Data.isnull().sum()
# drop some unnecessary columns
Data = Data.drop('date',axis=1)
Data = Data.drop('id',axis=1)
Data = Data.drop('zipcode',axis=1)

Step 2: Train Test Split and Scaling

Data is divided into the Train set and Test set. We use the Train set to make the algorithm learn the data’s behavior and then check the accuracy of our model on the Test set.

# features (X) and target (y)
X = Data.drop('price',axis =1).values
y = Data['price'].values
#splitting Train and Test
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=101)

Feature scaling puts all the variables on the same scale so that no feature dominates simply because of its units. It also helps our models learn faster.

#standardization scaler - fit&transform on train, transform only on test
from sklearn.preprocessing import StandardScaler
s_scaler = StandardScaler()
X_train = s_scaler.fit_transform(X_train.astype(np.float64))  # np.float was removed in NumPy 1.24
X_test = s_scaler.transform(X_test.astype(np.float64))
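
To make the "fit on train, transform on test" idea concrete, here is a small sketch of what standardization does under the hood, written with plain NumPy. The toy matrices are made-up values, not rows from the dataset.

```python
import numpy as np

# Toy feature matrices (hypothetical values): 4 training rows, 2 test rows.
X_train_toy = np.array([[1000.0, 2.0], [1500.0, 3.0], [2000.0, 3.0], [2500.0, 4.0]])
X_test_toy = np.array([[1750.0, 3.0], [3000.0, 5.0]])

# The statistics come from the training rows only.
mean = X_train_toy.mean(axis=0)
std = X_train_toy.std(axis=0)  # population std (ddof=0), the convention StandardScaler uses

X_train_scaled = (X_train_toy - mean) / std
X_test_scaled = (X_test_toy - mean) / std  # reuse the training statistics on the test rows

print(X_train_scaled.mean(axis=0))  # close to 0 for every column
print(X_train_scaled.std(axis=0))   # close to 1 for every column
print(X_test_scaled)
```

Reusing the training mean and standard deviation keeps information about the test set from leaking into preprocessing.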

Step 3: Model Selection and Evaluation

Model 1: Multiple Linear Regression

Multiple Linear Regression is an extension of Simple Linear Regression (read more here) and assumes that there is a linear relationship between a dependent variable Y and the independent variables X.
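
As a rough illustration of that linear assumption, the sketch below builds synthetic data from a known formula, Y = 50 + 3·X1 − 2·X2, and recovers the intercept and slopes with ordinary least squares. The data and coefficient values are invented for the example and are not related to the house dataset.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (hypothetical): y = 50 + 3*x1 - 2*x2, with no noise.
X_toy = rng.normal(size=(50, 2))
y_toy = 50 + 3 * X_toy[:, 0] - 2 * X_toy[:, 1]

# Prepend a column of ones so the intercept is estimated along with the slopes.
X_design = np.column_stack([np.ones(len(X_toy)), X_toy])
coef, *_ = np.linalg.lstsq(X_design, y_toy, rcond=None)

intercept, slopes = coef[0], coef[1:]
print(intercept, slopes)
```

Because the toy data is exactly linear, the fitted values match the formula; on real data the fit is only as good as the linear assumption.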

Let’s wrap the training process in our Regression model:

# Multiple Linear Regression
from sklearn.linear_model import LinearRegression
regressor = LinearRegression()
regressor.fit(X_train, y_train)
#evaluate the model (intercept and slope)
print(regressor.intercept_)
print(regressor.coef_)
#predicting the test set result
y_pred = regressor.predict(X_test)
#put results as a DataFrame
coeff_df = pd.DataFrame(regressor.coef_, Data.drop('price',axis =1).columns, columns=['Coefficient'])
coeff_df

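By visualizing the residuals, we can see that they are approximately normally distributed, which is consistent with the assumptions of a linear regression model (although it does not by itself prove a linear relationship with the dependent variable).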

# visualizing residuals
fig = plt.figure(figsize=(10,5))
residuals = (y_test- y_pred)
sns.histplot(residuals, kde=True)  # distplot is deprecated in recent seaborn

Residual visualization

Let’s compare actual output and predicted value to measure how far our predictions are from the real house prices.

#compare actual output values with predicted values
y_pred = regressor.predict(X_test)
df = pd.DataFrame({'Actual': y_test, 'Predicted': y_pred})
df1 = df.head(10)
df1
# evaluate the performance of the algorithm (MAE - MSE - RMSE)
from sklearn import metrics
print('MAE:', metrics.mean_absolute_error(y_test, y_pred))
print('MSE:', metrics.mean_squared_error(y_test, y_pred))
print('RMSE:', np.sqrt(metrics.mean_squared_error(y_test, y_pred)))
print('VarScore:',metrics.explained_variance_score(y_test,y_pred))

Multiple Linear Regression Results
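
For readers who want to see what these four metrics measure, here is a small sketch that computes them by hand with NumPy. The actual and predicted values are hypothetical numbers chosen to keep the arithmetic easy to follow.

```python
import numpy as np

# Hypothetical actual and predicted prices, in thousands of dollars.
y_true = np.array([300.0, 450.0, 500.0, 700.0])
y_hat = np.array([320.0, 430.0, 540.0, 650.0])

errors = y_true - y_hat
mae = np.mean(np.abs(errors))   # average absolute error
mse = np.mean(errors ** 2)      # average squared error
rmse = np.sqrt(mse)             # same units as the price
# Share of the target's variance that the predictions account for.
explained_variance = 1 - np.var(errors) / np.var(y_true)

print(mae, mse, rmse, explained_variance)
```

RMSE penalizes large misses more heavily than MAE, which is why the two can rank models differently when a few houses are badly mispriced.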

Model 2: Keras Regression

Let’s create a baseline neural network model for the regression problem. Starting with all of the needed functions and objects.

# Creating a Neural Network Model
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Activation
from tensorflow.keras.optimizers import Adam

Since we have 19 features, let's start with 19 neurons per layer, using 4 hidden layers and 1 output layer that predicts the house price.

The Adam optimization algorithm is used to minimize the loss function (mean squared error).

# 19 neurons per hidden layer, based on the number of available features
model = Sequential()
model.add(Dense(19,activation='relu'))
model.add(Dense(19,activation='relu'))
model.add(Dense(19,activation='relu'))
model.add(Dense(19,activation='relu'))
model.add(Dense(1))
model.compile(optimizer='Adam',loss='mse')  # 'mse' = mean squared error
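
To show what this architecture computes, the sketch below reproduces the same layer shapes in plain NumPy: 19 inputs, four ReLU hidden layers of 19 units, and a single linear output. The random weights are placeholders rather than trained values, so only the shapes and the parameter count are meaningful.

```python
import numpy as np

rng = np.random.default_rng(0)
layer_sizes = [19, 19, 19, 19, 19, 1]  # input, four hidden layers, output

# Random weights stand in for trained ones; this only illustrates the shapes.
weights = [rng.normal(size=(n_in, n_out))
           for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]
biases = [np.zeros(n_out) for n_out in layer_sizes[1:]]

def forward(x):
    """ReLU on the hidden layers, linear output for regression."""
    for i, (w, b) in enumerate(zip(weights, biases)):
        x = x @ w + b
        if i < len(weights) - 1:   # no activation on the output layer
            x = np.maximum(x, 0.0)
    return x

batch = rng.normal(size=(5, 19))   # five hypothetical scaled houses
out = forward(batch)
n_params = sum(w.size + b.size for w, b in zip(weights, biases))
print(out.shape, n_params)
```

Each hidden layer has 19 × 19 weights plus 19 biases, and the output layer has 19 weights plus 1 bias, which is the kind of count `model.summary()` reports once the model has been built.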

Then, we train the model for 400 epochs, recording the training and validation loss for each epoch in the history object. To keep track of how well the model is performing, the loss is calculated on both the train and test data at every epoch.

model.fit(x=X_train,y=y_train,
          validation_data=(X_test,y_test),
          batch_size=128,epochs=400)
model.summary()

loss_df = pd.DataFrame(model.history.history)
loss_df.plot(figsize=(12,8))
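
Besides plotting, the same history dictionary can be inspected directly. The sketch below uses a hand-written dictionary with the `loss` and `val_loss` keys that Keras records when validation data is supplied; the numbers are invented for illustration and are not results from this model.

```python
# Hypothetical loss values, shaped like the dictionary in model.history.history.
history = {
    "loss":     [9.0, 5.0, 3.0, 2.5, 2.2, 2.0],
    "val_loss": [9.5, 5.5, 3.4, 3.1, 3.3, 3.6],
}

# Index of the epoch with the lowest validation loss.
val = history["val_loss"]
best_epoch = min(range(len(val)), key=val.__getitem__)

# Gap between validation and training loss at the final epoch.
gap = history["val_loss"][-1] - history["loss"][-1]

print(best_epoch + 1, val[best_epoch], gap)
```

A validation loss that starts rising while the training loss keeps falling, as in these toy numbers, is the usual sign that the model has begun to overfit.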

Evaluation on Test Data

y_pred = model.predict(X_test).ravel()  # flatten (n, 1) predictions to match y_test
from sklearn import metrics
print('MAE:', metrics.mean_absolute_error(y_test, y_pred))
print('MSE:', metrics.mean_squared_error(y_test, y_pred))
print('RMSE:', np.sqrt(metrics.mean_squared_error(y_test, y_pred)))
print('VarScore:',metrics.explained_variance_score(y_test,y_pred))
# Visualizing Our predictions
fig = plt.figure(figsize=(10,5))
plt.scatter(y_test,y_pred)
# Perfect predictions
plt.plot(y_test,y_test,'r')

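# visualizing residuals
fig = plt.figure(figsize=(10,5))
residuals = (y_test - y_pred.ravel())  # ravel keeps (n, 1) predictions from broadcasting to (n, n)
sns.histplot(residuals, kde=True)  # distplot is deprecated in recent seaborn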

Keras Regression vs Multiple Linear Regression!

We made it!

We have predicted the house price using two different ML models.

The explained variance score (VarScore) of our Multiple Linear Regression is around 69%, so this model had room for improvement. The Keras Regression model reached a score of about 81%.

Also, notice that the RMSE is lower for the Keras Regression model, which shows that its predictions are closer to the actual prices.

Results: Keras Reg. vs Multiple Linear Reg.

Unsurprisingly, this score can be improved through feature selection or by using other regression models.
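
The article does not carry out feature selection, so the following is only a sketch of one simple starting point: ranking features by the absolute value of their correlation with the target. The synthetic frame and the `noise` column are hypothetical; with the real data the same `corr()` call could be applied to `Data`.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 200

# Synthetic data (hypothetical): price depends on sqft_living, not on the noise column.
toy = pd.DataFrame({
    "sqft_living": rng.normal(2000, 500, n),
    "noise": rng.normal(0, 1, n),
})
toy["price"] = 200 * toy["sqft_living"] + rng.normal(0, 20_000, n)

# Rank features by the strength of their linear relationship with price.
ranking = toy.corr()["price"].drop("price").abs().sort_values(ascending=False)
print(ranking)
```

Correlation only captures linear, one-feature-at-a-time relationships, so it is a screening step rather than a full selection method.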

Thank you for reading. Again, feedback is always welcome!
