Visualizing Multiple Regression in 3D

How Can You Make More Sense of High Dimensional Datasets?

Jul 27 · 6 min read

Image by Mediamodifier from Pixabay

Introduction

No matter your exposure to data science and the world of statistics, you’ve very likely heard of regression. In this post we’ll be talking about multiple regression, so you’ll definitely want some familiarity with simple linear regression first. If you aren’t familiar with it, my post on simple linear regression is a good place to start. Otherwise, let’s dive in with multiple linear regression. I recently wrote about visualizing multiple linear regression with heatmaps. If you’ve already read that post, feel free to jump down to the modeling section of this post, where we’ll build our new model and introduce the plotly package and three-dimensional visualization. If you haven’t read it, it covers another helpful way to visualize multiple regression.

Multiple Linear Regression

The distinction we draw between simple linear regression and multiple linear regression is simply the number of explanatory variables that help us understand our dependent variable: one in the simple case, two or more in the multiple case.

Multiple linear regression is an incredibly popular statistical technique among data scientists, and it is foundational to many of the more complex methods they use.

In my post on simple linear regression, I gave the example of predicting home prices using a single numeric variable: square footage.

This post is part of a series in which we explore different implementations of linear regression. In the post on the parallel slopes model, we create a model that predicts price using square footage and whether or not the home is a waterfront property. Here we’ll do something similar, but we’ll create our model using multiple numeric inputs.

Let’s Get Modeling

Similar to what we’ve built in the aforementioned posts, we’ll create a linear regression model where we add a new numeric variable.

The dataset we’re working with is a Seattle home prices dataset. Each record is a single home, with its price, square footage, number of bedrooms, number of bathrooms, and so forth.

Through the course of this post, we’ll be trying to explain price as a function of other numeric variables in the dataset.

With that said, let’s dive in. Similar to what we’ve built previously, we’re using sqft_living to predict price, only here we’ll add another variable: sqft_basement.

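# Fit price as a linear function of two numeric predictors.
# Assumes `housing` is the Seattle home prices data frame, already loaded.
fit <- lm(price ~ sqft_living + sqft_basement,
          data = housing)
summary(fit)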

Adding more numeric explanatory variables to a regression model is simple, both syntactically and mathematically.
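To make that concrete, here is a minimal sketch on simulated data. The toy_housing data frame and every number in it are made up for illustration and are not the Seattle dataset. Syntactically, the second predictor is just one more term after a + in the formula. Mathematically, the model is still a weighted sum: price = b0 + b1 * sqft_living + b2 * sqft_basement.

```r
set.seed(42)
n <- 200

# Simulated stand-in for the housing data (hypothetical values)
toy_housing <- data.frame(
  sqft_living   = runif(n, 500, 4000),
  sqft_basement = runif(n, 0, 1500)
)
toy_housing$price <- 50000 +
  250 * toy_housing$sqft_living +
  40 * toy_housing$sqft_basement +
  rnorm(n, sd = 20000)

# One predictor versus two: the only syntactic change is "+ sqft_basement"
simple_fit <- lm(price ~ sqft_living, data = toy_housing)
multi_fit  <- lm(price ~ sqft_living + sqft_basement, data = toy_housing)

# Rebuild the fitted values by hand from the three coefficients
b <- coef(multi_fit)
manual_fitted <- b[["(Intercept)"]] +
  b[["sqft_living"]] * toy_housing$sqft_living +
  b[["sqft_basement"]] * toy_housing$sqft_basement

# Compare the hand-built values with what lm() stored
all.equal(manual_fitted, unname(fitted(multi_fit)))
```

The simple model has two coefficients and the multiple model has three, which is all the extra math there is to it.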

Visualization Limitations

While you can technically layer numeric variables one after another into the same model, it can quickly become difficult to visualize and understand.

In the case of our model, we have three separate dimensions we’ll need to be able to assess.

As I mentioned previously, here we will be using plotly’s 3D plotting tools to build a deeper understanding.

Let’s play around with plot_ly!

Let’s first visualize sqft_living and price to familiarize ourselves with the syntax.

library(plotly)  # provides plot_ly(), add_markers(), and the %>% pipe

plot_ly(data = housing, x = ~sqft_living, y = ~price, opacity = 0.5) %>%
  add_markers()

As you can see, the syntax isn’t too different from ggplot. First specify the data, then jump into the aesthetics without having to explicitly declare that they’re aesthetics: each one is written as a one-sided formula such as ~sqft_living. The result is a simple two-dimensional scatter plot.

Let’s visualize in 3 dimensions!

plot_ly(data = housing, z = ~price, x = ~sqft_living, y = ~sqft_basement, opacity = 0.5) %>%
  add_markers()

Similar to what we did before, we’ve just moved price to the z-axis and now included sqft_basement. What’s fun about this plotting tool is that it isn’t static: you can click and drag to rotate the angle from which you’re viewing the plot. A screenshot can’t show that, so get this running on your own machine to experience the full flexibility of plotly. When you run this command in RStudio, your Viewer window will populate with a draggable visual that lends itself well to interpreting a dataset of greater dimensions.

Adding a Plane

When moving from two dimensions to three dimensions, things change. If you have a background in linear algebra, this may resonate. To put it simply, a linear fit in a single dimension is a point, in two dimensions it is a line, and in three dimensions it is a plane.

With this in mind, let’s visualize our multiple linear regression model with a plane.

First things first: we need to create a matrix that holds the model’s prediction for every combination of input values on a grid.

Below I create a vector for our x and our y. We then pass them to the outer function, which applies the regression equation from our fitted model to every pair of x and y values.

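x <- seq(370, 15000, by = 10)  # sqft_living values
y <- seq(0, 15000, by = 10)    # sqft_basement values
# outer() returns a length(x) by length(y) matrix; add_surface() expects rows
# of z to follow y and columns to follow x, hence the transpose.
plane <- t(outer(x, y, function(a, b) {
  coef(fit)[1] + coef(fit)[2] * a + coef(fit)[3] * b}))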
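If outer is new to you, a tiny example may help. This is only a sketch: the vectors and the coefficients in toy_coef are made up, and they stand in for the much larger grid and the fitted model. It shows that outer returns one row per x value and one column per y value, which matters because, to my understanding, plotly’s surface trace reads the rows of z as y and the columns as x.

```r
x_small <- c(1000, 2000, 3000)  # hypothetical sqft_living values
y_small <- c(0, 500)            # hypothetical sqft_basement values
toy_coef <- c(50000, 250, 40)   # made-up intercept and two slopes

# Apply the regression equation to every (x, y) pair
toy_plane <- outer(x_small, y_small, function(a, b) {
  toy_coef[1] + toy_coef[2] * a + toy_coef[3] * b
})

dim(toy_plane)     # 3 2: one row per x value, one column per y value
toy_plane[2, 1]    # 550000 = 50000 + 250 * 2000 + 40 * 0
dim(t(toy_plane))  # 2 3: after transposing, rows follow y
```

Checking the dimensions like this is a cheap way to catch a swapped axis before the surface is drawn.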

Now that we have our plane, let’s add it to our visual.

plot_ly(data = housing, z = ~price, x = ~sqft_living, y = ~sqft_basement, opacity = 0.5) %>%
  add_markers() %>%
  add_surface(x = ~x, y = ~y, z = ~plane, showscale = FALSE)
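If you don’t have the housing data handy, the same idea can be tried end to end on simulated data. Treat this as a sketch under stated assumptions: toy_housing and its numbers are invented, the plotly package must be installed, and the surface is built with predict() on an expand.grid() grid, which is a swapped-in alternative to the outer() call. The grid is also much coarser, which keeps the plot light.

```r
library(plotly)

set.seed(7)
n <- 300

# Simulated stand-in for the housing data (hypothetical values)
toy_housing <- data.frame(
  sqft_living   = runif(n, 500, 4000),
  sqft_basement = runif(n, 0, 1500)
)
toy_housing$price <- 50000 + 250 * toy_housing$sqft_living +
  40 * toy_housing$sqft_basement + rnorm(n, sd = 40000)
toy_fit <- lm(price ~ sqft_living + sqft_basement, data = toy_housing)

# Coarse grid of inputs; expand.grid() varies sqft_living fastest
grid_x <- seq(500, 4000, length.out = 30)
grid_y <- seq(0, 1500, length.out = 20)
grid <- expand.grid(sqft_living = grid_x, sqft_basement = grid_y)

# Rows of grid_z follow grid_x, so transpose to put grid_y on the rows
grid_z <- matrix(predict(toy_fit, newdata = grid), nrow = length(grid_x))
surface_z <- t(grid_z)

toy_plot <- plot_ly(data = toy_housing, x = ~sqft_living, y = ~sqft_basement,
                    z = ~price, opacity = 0.5) %>%
  add_markers() %>%
  add_surface(x = grid_x, y = grid_y, z = surface_z, showscale = FALSE) %>%
  layout(scene = list(
    xaxis = list(title = "sqft_living"),
    yaxis = list(title = "sqft_basement"),
    zaxis = list(title = "price")
  ))
```

Print toy_plot in RStudio to open it in the Viewer. The layout call only sets the axis titles, so the plane still reads clearly once the surface trace is added.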

Again, you’ve got to jump in and play with plotly yourself.

You’ve done it! You’ve added a plane to your 3D scatter plot that shows the price our regression formula predicts for different inputs of sqft_living and sqft_basement, but we still have a question: how does this help us?

Have you ever added a regression line to your 2D scatter plot? If so, what was the intention?

You would add a line to your plot to give an indication of what the “best fit” looks like, but it’s also useful to be able to say that for a given value of x, we would predict y. The plane gives us exactly that: for given values of x and y, what’s z?
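Reading z off the plane is the same thing predict() does for you. Here is a small sketch, again on simulated data: toy_housing, new_home and all the numbers are hypothetical. It asks the model for the price of one home and checks the answer against the height of the plane at that point.

```r
set.seed(1)
n <- 200

# Simulated stand-in for the housing data (hypothetical values)
toy_housing <- data.frame(
  sqft_living   = runif(n, 500, 4000),
  sqft_basement = runif(n, 0, 1500)
)
toy_housing$price <- 50000 + 250 * toy_housing$sqft_living +
  40 * toy_housing$sqft_basement + rnorm(n, sd = 20000)
toy_fit <- lm(price ~ sqft_living + sqft_basement, data = toy_housing)

# For given values of x and y, what's z?
new_home <- data.frame(sqft_living = 2000, sqft_basement = 500)
prediction <- predict(toy_fit, newdata = new_home, interval = "prediction")

# Height of the plane at the same point, computed by hand
b <- coef(toy_fit)
plane_height <- b[["(Intercept)"]] +
  b[["sqft_living"]] * 2000 +
  b[["sqft_basement"]] * 500

prediction                                     # columns: fit, lwr, upr
all.equal(unname(prediction[1, "fit"]), plane_height)
```

The lwr and upr columns are a reminder that the plane is only the central estimate, and that individual homes scatter above and below it.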

Conclusion

We have done a lot in a short amount of time. Multiple linear regression models can become complex very quickly. My hope is that by adding this functionality to your toolset, you’ll be able to maintain a better understanding of the data and models you’re working with. It’s not incredibly difficult to load a model with every variable we have access to, but it does raise the question of whether doing so serves our objective. Does it lend the type of understanding that we set out to obtain when we engaged in the modeling process?

In a few short minutes, we’ve covered:

  • Multiple linear regression definition
  • Building an MLR model
  • Visualization/interpretation limitations
  • Using 3D plots and planes to interpret our data and models

If this was helpful, feel free to check out my other posts at datasciencelessons.com. Happy Data Science-ing!

