Hierarchical Clustering: Agglomerative and Divisive — Explained



An overview of agglomerative and divisive clustering algorithms and their implementation

Aug 2 · 5 min read

Photo by Lukas Blazek on Unsplash

Hierarchical clustering is a method of cluster analysis that groups similar data points together. It follows either a top-down (divisive) or a bottom-up (agglomerative) approach.

What is Clustering?

Clustering is an unsupervised machine learning technique that divides a dataset into several clusters such that data points in the same cluster are similar to each other and data points in different clusters are dissimilar.

  • Points in the same cluster are closer to each other.
  • Points in different clusters are far apart.

(Image by Author), Sample 2-dimensional dataset

In the above sample 2-dimensional dataset, the points form 3 clusters that are far apart, and points in the same cluster are close to each other.

Besides hierarchical clustering, there are several other types of clustering algorithms, such as k-means clustering and DBSCAN.

This article explains hierarchical clustering and its types.

There are two types of hierarchical clustering methods:

  1. Divisive Clustering
  2. Agglomerative Clustering

Divisive Clustering:

The divisive clustering algorithm is a top-down clustering approach. Initially, all the points in the dataset belong to one cluster, and splits are performed recursively as one moves down the hierarchy.

Steps of Divisive Clustering:

  1. Initially, all points in the dataset belong to one single cluster.
  2. Partition the cluster into the two least similar clusters.
  3. Proceed recursively to form new clusters until the desired number of clusters is obtained.
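
The three steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the article's implementation: the function names (`two_means_split`, `divisive_clustering`) are hypothetical, the cluster to split is the one with the largest SSE, and each split is done with a 2-means (Lloyd's algorithm) pass started from the first point and the point farthest from it.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coord) / n for coord in zip(*points))


def sse(points):
    """Sum of squared distances from each point to the cluster centroid."""
    c = centroid(points)
    return sum(sq_dist(p, c) for p in points)


def two_means_split(points, max_iter=100):
    """Split one cluster in two with Lloyd's algorithm (k = 2).

    Assumption: deterministic start from the first point and the point
    farthest from it. Returns (left, right); right is empty when all
    points coincide and no split is possible.
    """
    c0 = points[0]
    c1 = max(points, key=lambda p: sq_dist(p, c0))
    left, right = list(points), []
    for _ in range(max_iter):
        left = [p for p in points if sq_dist(p, c0) <= sq_dist(p, c1)]
        right = [p for p in points if sq_dist(p, c0) > sq_dist(p, c1)]
        if not right:
            break
        new_c0, new_c1 = centroid(left), centroid(right)
        if (new_c0, new_c1) == (c0, c1):  # assignments are stable
            break
        c0, c1 = new_c0, new_c1
    return left, right


def divisive_clustering(points, n_clusters):
    """Top-down clustering: start with one cluster, split until n_clusters."""
    clusters = [list(points)]  # step 1: every point in a single cluster
    while len(clusters) < n_clusters:  # step 3: repeat until enough clusters
        # Assumption: split the cluster with the largest SSE.
        idx = max(range(len(clusters)), key=lambda i: sse(clusters[i]))
        left, right = two_means_split(clusters[idx])  # step 2
        if not right:
            break  # nothing left that can be split
        clusters[idx:idx + 1] = [left, right]
    return clusters


# Hypothetical toy data with three well-separated groups.
points = [
    (0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
    (10.0, 10.0), (10.0, 11.0), (11.0, 10.0),
    (20.0, 0.0), (20.0, 1.0), (21.0, 0.0),
]
clusters = divisive_clustering(points, n_clusters=3)
for c in clusters:
    print(c)
```

On this toy data the first split separates the rightmost group from the rest, and the second split separates the two remaining groups.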

(Image by Author), 1st image: all the data points belong to one cluster; 2nd image: one cluster is separated from the original single cluster; 3rd image: one more cluster is separated from the previous set of clusters.

In the above sample dataset, there are 3 clusters that are well separated from each other, so we stop after obtaining 3 clusters.

If we continue and split off one more cluster, we obtain the result below.

(Image by Author), Sample dataset separated into 4 clusters

How to choose which cluster to split?

Compute the sum of squared errors (SSE) of each cluster, that is, the sum of squared distances from its points to the cluster centroid, and choose the cluster with the largest value. In the 2-dimensional dataset below, the data points are currently separated into 2 clusters. To split further and form a 3rd cluster, compute the SSE of the points in the red cluster and of the points in the blue cluster.

(Image by Author), Sample dataset separated into 2 clusters

The cluster with the largest SSE is split into 2 clusters, which adds one new cluster. In the above image, the red cluster has the larger SSE, so it is split into 2 clusters, giving 3 clusters in total.
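
As a small illustration of this selection rule, the sketch below computes the SSE of two clusters and picks the one with the larger value. The coordinates are hypothetical toy values, not the data shown in the image, and the labels "red" and "blue" are reused only to mirror the figure.

```python
def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coord) / n for coord in zip(*points))


def sse(points):
    """Sum of squared Euclidean distances from each point to the centroid."""
    c = centroid(points)
    return sum(sum((x - m) ** 2 for x, m in zip(p, c)) for p in points)


# Hypothetical toy clusters: "red" is spread out, "blue" is compact.
clusters = {
    "red": [(0.0, 0.0), (0.0, 4.0), (4.0, 0.0), (4.0, 4.0)],
    "blue": [(10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0)],
}

sse_by_cluster = {name: sse(pts) for name, pts in clusters.items()}
to_split = max(sse_by_cluster, key=sse_by_cluster.get)

print(sse_by_cluster)  # red: 32.0, blue: 2.0
print(to_split)        # red
```

Because SSE grows with both the spread and the number of points, this rule tends to favor splitting large or loosely packed clusters first.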

How to split the above-chosen cluster?

Once we have decided which cluster to split, the question is how to split it into 2 clusters. One way is to use Ward’s criterion: among the candidate splits, choose the one that gives the largest reduction in SSE.
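
One way to make this concrete: when a cluster is split into parts A and B, the SSE drops by |A||B| / (|A| + |B|) times the squared distance between the two centroids. The sketch below scores candidate splits with that quantity. As a simplifying assumption that the article does not make, it only considers axis-aligned cuts between consecutive sorted coordinate values; the function names (`ward_gain`, `best_axis_split`) are hypothetical.

```python
def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coord) / n for coord in zip(*points))


def ward_gain(a, b):
    """SSE reduction obtained by splitting the union of a and b into a and b."""
    ca, cb = centroid(a), centroid(b)
    d2 = sum((x - y) ** 2 for x, y in zip(ca, cb))
    return len(a) * len(b) / (len(a) + len(b)) * d2


def best_axis_split(points):
    """Return (gain, part_a, part_b) for the best axis-aligned cut.

    Assumption: candidate splits are restricted to cuts along one
    coordinate axis. Returns None if no cut separates the points.
    """
    best = None
    for axis in range(len(points[0])):
        ordered = sorted(points, key=lambda p: p[axis])
        for i in range(1, len(ordered)):
            if ordered[i - 1][axis] == ordered[i][axis]:
                continue  # equal values cannot be separated by a threshold
            a, b = ordered[:i], ordered[i:]
            gain = ward_gain(a, b)
            if best is None or gain > best[0]:
                best = (gain, a, b)
    return best


# Hypothetical cluster made of two visibly distinct groups.
cluster = [
    (0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
    (8.0, 8.0), (8.0, 9.0), (9.0, 8.0),
]
gain, part_a, part_b = best_axis_split(cluster)
print(round(gain, 6))
print(part_a)
print(part_b)
```

Restricting the search to axis-aligned cuts keeps the sketch short; other splitting routines, such as a 2-means pass, can be scored with the same `ward_gain` function.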

How to handle the noise or outlier?

An outlier or noise point can end up forming a cluster of its own. To handle noise in the dataset, use a threshold as a termination criterion: do not generate clusters that are too small.
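
A minimal sketch of such a threshold check follows. The function name `is_valid_split` and the value of `MIN_CLUSTER_SIZE` are assumptions made for illustration; a suitable threshold depends on the dataset.

```python
def is_valid_split(left, right, min_size):
    """Accept a split only if both resulting clusters have at least min_size points."""
    return min(len(left), len(right)) >= min_size


MIN_CLUSTER_SIZE = 3  # assumed threshold, tune per dataset

# Hypothetical example: a dense group plus a single far-away outlier.
dense = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
outlier = [(50.0, 50.0)]

# A split that isolates the outlier is rejected.
rejected = is_valid_split(dense, outlier, MIN_CLUSTER_SIZE)

# A split into two groups of three points each is accepted.
group_a = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0)]
group_b = [(8.0, 8.0), (8.0, 9.0), (9.0, 8.0)]
accepted = is_valid_split(group_a, group_b, MIN_CLUSTER_SIZE)

print(rejected)  # False
print(accepted)  # True
```

In a divisive loop, a rejected split would mean the cluster is left as it is and is no longer considered for splitting.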

