Easily visualize Scikit-learn models’ decision boundaries

Category: IT Technology · Published: 5 years ago

A simple utility function to visualize the decision boundaries of Scikit-learn machine learning models/estimators.

Image source: Pixabay (Free license)

Introduction

Scikit-learn is an amazing Python library for working and experimenting with a plethora of supervised and unsupervised machine learning (ML) algorithms and associated tools.

It is built with robustness and speed in mind, using NumPy and SciPy methods as much as possible along with memory-optimization techniques. Most importantly, the library offers a simple and intuitive API across the board for all kinds of ML estimators: fitting the data, predicting, and examining the model parameters.
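
To make the uniform API concrete, here is a minimal sketch that runs the same three steps (fit, predict, inspect parameters) on two different estimators. The synthetic data from make_classification and the choice of estimators are assumptions made for illustration only; they are not taken from the article.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier

# Synthetic two-feature data (a stand-in, not the article's dataset)
X, y = make_classification(n_samples=200, n_features=2, n_informative=2,
                           n_redundant=0, random_state=0)

# The same three calls work unchanged for very different estimators
for estimator in (LogisticRegression(), KNeighborsClassifier(n_neighbors=5)):
    estimator.fit(X, y)                  # fitting the data
    labels = estimator.predict(X[:5])    # predicting
    params = estimator.get_params()      # examining the model parameters
    print(type(estimator).__name__, labels, sorted(params)[:3])
```

Because every estimator exposes the same methods, a plotting helper can accept any estimator class without special-casing it.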

Image: Scikit-learn estimator illustration

For many classification problems in the domain of supervised ML, we may want to go beyond the numerical prediction (of the class or of the probability) and visualize the actual decision boundary between the classes. This is, of course, particularly suitable for binary classification problems and for a pair of features — the visualization is displayed on a 2-dimensional (2D) plane.

For example, here is a visualization of the decision boundary for a Support Vector Machine (SVM) tutorial from the official Scikit-learn documentation.

Image source: Scikit-learn SVM

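At the time of writing, Scikit-learn did not offer a ready-made, accessible method for that kind of visualization (releases from 1.1 onward include the DecisionBoundaryDisplay class in sklearn.inspection). In this article, we examine a simple piece of Python code that achieves it.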

A simple Python function

The full code is given here in my GitHub repo on Python machine learning. You are certainly welcome to explore the whole repository for other useful ML tutorials as well.

Here is the docstring, which illustrates how the function can be used:

The docstring for the utility function

You can pass the model class and the model parameters (specific and unique to each model class) to the function, along with the feature and label data (as NumPy arrays).

Here the model class denotes the exact Scikit-learn estimator class that you call to instantiate your ML estimator object. Note that you don't have to pass an instantiated estimator object; the class itself suffices. The function internally fits the data and predicts to create the appropriate decision boundary (taking into account the model parameters that you also pass).

At present, the function uses just the first two columns of the data for fitting the model, because we need a predicted value for every point of a 2D mesh grid, over which the data is then drawn as a scatter plot.
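
The author's actual implementation lives in the linked repository and is not reproduced here. The following is only a minimal sketch of the approach described above, under stated assumptions: the function name plot_decision_boundaries, the grid_step argument, the one-unit plot margin, and the random demo data are all hypothetical, and the class labels are assumed to be numeric so that they can be drawn as filled contours.

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn.neighbors import KNeighborsClassifier


def plot_decision_boundaries(X, y, model_class, grid_step=0.05, **model_params):
    """Fit model_class(**model_params) on the first two columns of X and
    draw its predicted class regions on a mesh grid (illustrative sketch)."""
    X2 = np.asarray(X)[:, :2]            # only the first two features are used
    y = np.asarray(y)                    # assumed numeric class labels

    model = model_class(**model_params)  # instantiate from the class itself
    model.fit(X2, y)

    # Build a mesh grid that covers the data with a small margin
    x_min, x_max = X2[:, 0].min() - 1, X2[:, 0].max() + 1
    y_min, y_max = X2[:, 1].min() - 1, X2[:, 1].max() + 1
    xx, yy = np.meshgrid(np.arange(x_min, x_max, grid_step),
                         np.arange(y_min, y_max, grid_step))

    # Predict a class for every grid point, then reshape back to the grid
    Z = model.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)

    fig, ax = plt.subplots()
    ax.contourf(xx, yy, Z, alpha=0.4)                          # predicted regions
    ax.scatter(X2[:, 0], X2[:, 1], c=y, edgecolor="k", s=20)   # training points
    ax.set_xlabel("Feature 1")
    ax.set_ylabel("Feature 2")
    return model, ax


# Usage on random stand-in data (not the article's dataset)
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(100, 4))
y_demo = (X_demo[:, 0] + X_demo[:, 1] > 0).astype(int)
model, ax = plot_decision_boundaries(X_demo, y_demo,
                                     KNeighborsClassifier, n_neighbors=5)
```

Call plt.show() afterwards to display the figure. Collecting the estimator settings through **model_params is what lets one helper serve any estimator class, since each class defines its own keyword arguments.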

Main code section

Some illustrative results

Code is boring, while results (and plots) are exciting, aren’t they?

For the demonstration, we used a divorce classification dataset. This dataset is about participants who completed the personal information form and a divorce predictors scale. The data is a modified version of the publicly available data at the UCI portal (after injecting some noise). There are 170 participants and 54 attributes (or predictor variables) that are all real-valued.

UCI divorce predictor dataset

We compared the performance of multiple ML estimators on the same dataset:

  • Naive Bayes
  • Logistic regression
  • K-nearest neighbor (KNN)

Because the binary classes of this particular dataset are fairly easily separable, all the ML algorithms perform almost equally well. However, their decision boundaries look different from one another, and that is what we are interested in visualizing through this utility function.
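
A comparison of this kind can be sketched as follows. The modified divorce dataset is not reproduced here, so the sketch uses a synthetic, well-separated binary dataset with the same number of rows as a stand-in. Using GaussianNB for Naive Bayes and 5-fold cross-validation for scoring are assumptions, not details stated in the article.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier

# Synthetic, well-separated binary data with 170 rows (a stand-in only)
X, y = make_classification(n_samples=170, n_features=2, n_informative=2,
                           n_redundant=0, n_clusters_per_class=1,
                           class_sep=2.0, random_state=42)

models = {
    "Naive Bayes": GaussianNB(),                   # assumed Gaussian variant
    "Logistic regression": LogisticRegression(),
    "KNN": KNeighborsClassifier(n_neighbors=5),
}

# Mean 5-fold cross-validated accuracy for each estimator
scores = {}
for name, model in models.items():
    scores[name] = cross_val_score(model, X, y, cv=5).mean()
    print(f"{name}: mean CV accuracy = {scores[name]:.3f}")
```

Similar accuracy numbers say nothing about how each model partitions the feature plane, which is why the boundary plots are the interesting part.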

Image: Class separability of the divorce prediction dataset

Naive Bayes decision boundary

The decision boundary from the Naive Bayes algorithm was smooth and slightly nonlinear. And with only four lines of code!

Logistic regression decision boundary

As expected, the decision boundary from the logistic regression estimator was visualized as a linear separator.
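
Why the separator is linear can be checked numerically. For two features, a fitted logistic regression predicts class 1 when w1*x1 + w2*x2 + b > 0, so the boundary is the straight line where that expression equals zero. The sketch below, on synthetic stand-in data, builds points on that line from coef_ and intercept_ and confirms that the model is undecided there. It assumes w2 is nonzero, which holds whenever the second feature carries some signal.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic two-feature data (a stand-in, not the article's dataset)
X, y = make_classification(n_samples=170, n_features=2, n_informative=2,
                           n_redundant=0, n_clusters_per_class=1,
                           random_state=0)

clf = LogisticRegression().fit(X, y)
w1, w2 = clf.coef_[0]
b = clf.intercept_[0]

# Points on the boundary satisfy w1*x1 + w2*x2 + b = 0 (assumes w2 != 0)
x1 = np.linspace(X[:, 0].min(), X[:, 0].max(), 50)
x2 = -(w1 * x1 + b) / w2
boundary = np.c_[x1, x2]

# On the boundary the decision function is zero and P(class 1) is 0.5
on_line = np.allclose(clf.decision_function(boundary), 0.0)
undecided = np.allclose(clf.predict_proba(boundary)[:, 1], 0.5)
print(on_line, undecided)
```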

K-nearest neighbor (KNN) decision boundary

K-nearest neighbor is an algorithm based on the local geometry of the data distribution in the feature space (and the relative distances between points). The decision boundary therefore comes out nonlinear and non-smooth.

You can pass even a neural network classifier

The function works with any Scikit-learn classifier, even a neural network. Here is the decision boundary with the MLPClassifier estimator of Scikit-learn, which models a densely connected neural network (with user-configurable parameters). Note that we pass the hidden layer settings, the learning rate, and the optimizer (Stochastic Gradient Descent, or SGD) as model parameters.
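
As a rough sketch of how such settings travel as keyword arguments, the snippet below instantiates MLPClassifier from its class plus a parameter dictionary, in the same way a plotting helper would. The specific values (two hidden layers of 20 units, a learning rate of 0.01, 2000 iterations) and the make_moons data are illustrative assumptions; the article does not state the values it used.

```python
from sklearn.datasets import make_moons
from sklearn.neural_network import MLPClassifier

# Synthetic two-feature data (a stand-in, not the article's dataset)
X, y = make_moons(n_samples=200, noise=0.2, random_state=0)

# Illustrative settings only; the article's actual values are not given
mlp_params = {
    "hidden_layer_sizes": (20, 20),   # two hidden layers of 20 units each
    "learning_rate_init": 0.01,       # initial learning rate
    "solver": "sgd",                  # stochastic gradient descent optimizer
    "max_iter": 2000,
    "random_state": 0,
}

model_class = MLPClassifier                     # the class, not an instance
model = model_class(**mlp_params).fit(X, y)     # what a helper would do internally
print(f"Training accuracy: {model.score(X, y):.3f}")
```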

Examining the impact of model parameters

As mentioned before, we can pass any model parameters that we want to the utility function. In the case of the KNN classifier, as we increase the number of neighboring data points, the decision boundary becomes smoother. This can be readily visualized with our utility function by calling it inside a loop and passing the loop variable k to the n_neighbors model parameter.
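
A loop of this kind can be sketched as below. To stay self-contained, the sketch does the grid prediction inline rather than calling the author's utility function, and it uses synthetic make_moons data. The particular k values and the grid step of 0.05 are assumptions.

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import make_moons
from sklearn.neighbors import KNeighborsClassifier

# Noisy synthetic data so that the effect of k is visible (a stand-in only)
X, y = make_moons(n_samples=200, noise=0.3, random_state=0)

# One shared mesh grid covering the data with a margin
xx, yy = np.meshgrid(np.arange(X[:, 0].min() - 1, X[:, 0].max() + 1, 0.05),
                     np.arange(X[:, 1].min() - 1, X[:, 1].max() + 1, 0.05))
grid = np.c_[xx.ravel(), yy.ravel()]

k_values = [1, 5, 15, 45]
fig, axes = plt.subplots(1, len(k_values), figsize=(4 * len(k_values), 4))
for ax, k in zip(axes, k_values):
    # The loop variable k is passed to the n_neighbors model parameter
    model = KNeighborsClassifier(n_neighbors=k).fit(X, y)
    Z = model.predict(grid).reshape(xx.shape)
    ax.contourf(xx, yy, Z, alpha=0.4)
    ax.scatter(X[:, 0], X[:, 1], c=y, edgecolor="k", s=15)
    ax.set_title(f"n_neighbors = {k}")
```

Call plt.show() to display the panels side by side; the jagged islands seen at small k should merge into broader regions as k grows.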

