A complete guide to K-means clustering algorithm

April 05, 2021

What is K-means Clustering?

K-Means is a simple, unsupervised machine learning algorithm. Basically, K-Means clustering segregates the data points of a dataset into K groups based on similar characteristics, which helps identify the underlying patterns. Here, a cluster refers to a collection of data points aggregated together because of certain similarities.

K-Means clustering comes under unsupervised machine learning because we don't have target values here. Instead, we group the data points that are similar in nature.

Where is the K-means Clustering algorithm used?

There are multiple real-world scenarios where K-Means clustering is used. Some of them are movie recommendations and identifying the types of customers in an organization.

Without any further delay, let's dive into understanding how K-Means clustering works.

How does the K-means clustering algorithm work?

Step-1

K-Means clustering places a few centroids (K of them) at random positions. To picture it with two centroids, join them with a line segment and draw the line that cuts this segment in half at a right angle (the perpendicular bisector).

Step-2

All the data points that lie on one side of the line fall under the first centroid, and the other data points fall under the second centroid. This is the same as assigning each data point to its nearest centroid.
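Sorting points by which side of the line they fall on amounts to giving each point to its nearest centroid. Here is a minimal sketch of that assignment with NumPy. The points and centroid positions are made-up toy values, not data from this post.

```python
import numpy as np

# Hypothetical toy data: four 2-D points and two centroids (assumed values).
points = np.array([[1.0, 1.0], [1.5, 2.0], [8.0, 8.0], [9.0, 9.5]])
centroids = np.array([[0.0, 0.0], [10.0, 10.0]])

# Euclidean distance from every point to every centroid,
# giving an array of shape (n_points, n_centroids).
distances = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)

# Each point joins the cluster of its nearest centroid.
labels = distances.argmin(axis=1)
print(labels)  # [0 0 1 1]
```

The first two points land with the first centroid and the last two with the second, which matches what the dividing line would give.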

Step-3

Now we take the data points assigned to a centroid and find their center position. This is the mean of those points, which is also the position with the least total squared distance to them. We then shift the centroid there. We do this because a centroid is the imaginary or real location representing the center of the cluster.

The same is done for the centroid on the other side of the line.
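The centroid shift in this step can be sketched as taking the mean of the points in each cluster. The points and labels below are assumed toy values for illustration only.

```python
import numpy as np

# Hypothetical toy data: four 2-D points and their current cluster labels.
points = np.array([[1.0, 1.0], [1.5, 2.0], [8.0, 8.0], [9.0, 9.5]])
labels = np.array([0, 0, 1, 1])

# Move each centroid to the mean of the points assigned to it.
new_centroids = np.array([points[labels == k].mean(axis=0) for k in range(2)])
print(new_centroids)
# [[1.25 1.5 ]
#  [8.5  8.75]]
```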

Step-4

Now draw the perpendicular bisector between the updated centroids again, and see if any data points got shifted from one cluster to the other.

Step-5

Steps 2, 3, and 4 are repeated iteratively until the clusters stabilize and no data points move to another cluster even after the line is redrawn.

This is how K-Means clustering segregates the data points into different clusters.
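Putting the steps together, here is a rough from-scratch sketch of the loop in NumPy. It makes a few assumptions: the function name `kmeans` and its parameters are my own, the starting centroids are picked from the data points themselves rather than from arbitrary positions, and there is a `max_iter` cap in case the clusters take long to settle.

```python
import numpy as np

def kmeans(points, k, max_iter=100, seed=0):
    """Illustrative K-Means loop (a sketch, not a tuned implementation)."""
    points = np.asarray(points, dtype=float)
    rng = np.random.default_rng(seed)

    # Step 1: start from k distinct data points chosen at random (an assumption;
    # other initialisation schemes exist).
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    labels = None

    for _ in range(max_iter):
        # Step 2: assign every point to its nearest centroid.
        distances = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        new_labels = distances.argmin(axis=1)

        # Step 5: stop once no point changes cluster.
        if labels is not None and np.array_equal(new_labels, labels):
            break
        labels = new_labels

        # Step 3: move each centroid to the mean of its points
        # (an empty cluster keeps its old centroid).
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:
                centroids[j] = members.mean(axis=0)

    return labels, centroids

# Usage on two well-separated toy blobs (made-up data).
pts = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
                [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]])
labels, centroids = kmeans(pts, k=2)
print(labels)
print(centroids)
```

Which blob gets label 0 and which gets label 1 depends on the random start, so only the grouping itself is meaningful.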

There is a very popular and powerful library called sklearn (scikit-learn) that implements many machine learning algorithms, including K-Means, which we can use on the fly by simply importing them.

I have a step-by-step approach to using K-means clustering from the sklearn library on the iris dataset. You can have a look at the Jupyter notebook at my GitHub link:

Implementation of K-Means Clustering on Iris Dataset
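For a quick feel of what this looks like, here is a minimal sketch using sklearn's `KMeans` on the iris dataset. It is not the notebook's code. Choosing 3 clusters (iris has three species), `n_init=10`, and `random_state=42` are my assumptions for illustration.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris

# Load the iris measurements: 150 flowers, 4 features each.
X = load_iris().data

# n_clusters=3 is an assumption; random_state only makes the run repeatable.
model = KMeans(n_clusters=3, n_init=10, random_state=42)
labels = model.fit_predict(X)

print(model.cluster_centers_.shape)  # (3, 4): one centroid per cluster
print(labels[:10])                   # cluster index of the first ten flowers
```

The cluster numbers are arbitrary labels, so they will not necessarily line up with the species codes in the dataset.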
