A complete guide to the K-means clustering algorithm
What is K-means Clustering?
K-Means is a simple unsupervised machine learning algorithm. It segregates the data points of a dataset into multiple groups based on similar characteristics, which helps identify the underlying patterns. Here, a cluster refers to a collection of data points aggregated together because of certain similarities.
K-Means clustering comes under unsupervised machine learning because we don't have target values here; instead, we group the data points that are similar in nature.
Where is the K-means Clustering algorithm used?
K-Means clustering is used in many real-world scenarios, such as movie recommendations and identifying the types of customers an organization has.
Without any further delay, let's dive into understanding how K-Means clustering works.
How does the K-means clustering algorithm work?
Step-1
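K-Means clustering places a few centroids (K of them, one per cluster) at random positions. To keep things simple, take two centroids: the algorithm draws the line that perpendicularly bisects the segment joining them, so every point on that line is equally distant from both centroids.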
Step-2
All the data points in the dataset that lie on one side of the line fall under the first centroid, and the remaining data points fall under the second centroid. In other words, each data point is assigned to its nearest centroid.
Step-3
Now we take the data points assigned to a centroid and find their center position, that is, their mean, which is the position with the least total squared distance to those points, and then shift the centroid there. We do this because a centroid is the imaginary or real location representing the center of the cluster.
The same approach is followed for the points on the other side of the line.
Step-4
Now, with the centroids in their new positions, draw the perpendicular bisector line between them again and see whether any data points have shifted from one cluster to the other.
Step-5
The algorithm repeats steps 2, 3, and 4 iteratively until the clusters have stabilized and no data points move to another cluster even after the line is redrawn.
This is how K-Means clustering segregates the data points into different clusters.
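The steps above can be sketched in a few lines of NumPy. This is only a minimal illustration, not a production implementation: the names kmeans, max_iters, and seed are made up for this example, and the starting centroids are picked from the data points themselves (an assumption, chosen so they always start inside the data) rather than from arbitrary random positions. Instead of literally drawing a line, it assigns each point to its nearest centroid, which gives the same split for two centroids and also works for more than two.

```python
import numpy as np

def kmeans(points, k, max_iters=100, seed=0):
    """Illustrative K-Means loop: assign points, move centroids, repeat."""
    points = np.asarray(points, dtype=float)
    rng = np.random.default_rng(seed)
    # Step 1 (assumption): use k distinct data points as the starting centroids.
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    labels = None
    for _ in range(max_iters):
        # Step 2: assign every point to its nearest centroid.
        distances = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        new_labels = distances.argmin(axis=1)
        # Steps 4-5: stop once no point has moved to another cluster.
        if labels is not None and np.array_equal(new_labels, labels):
            break
        labels = new_labels
        # Step 3: shift each centroid to the mean of the points assigned to it.
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:  # keep the old position if a cluster is empty
                centroids[j] = members.mean(axis=0)
    return centroids, labels

# Two small, well-separated groups of 2-D points (made-up data).
points = np.array([[1.0, 1.0], [1.5, 2.0], [1.0, 0.5],
                   [8.0, 8.0], [9.0, 9.5], [8.5, 9.0]])
centroids, labels = kmeans(points, k=2)
print(labels)     # the first three points share one label, the last three the other
print(centroids)  # one centroid near each group
```

Which group ends up labelled 0 and which 1 depends on the random starting centroids, so only the grouping itself is meaningful.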
There is a very popular and powerful library called sklearn (scikit-learn), which implements most of the common machine learning algorithms; we can use them on the fly by simply importing them.
I have a step-by-step approach to using K-Means clustering from the sklearn library on the iris dataset. You can have a look at the Jupyter notebook in my GitHub link.
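For a quick feel of what this looks like, here is a minimal sketch of running sklearn's KMeans on the iris dataset. It is not the notebook itself, just an illustration; choosing 3 clusters (iris has three species), n_init=10, and random_state=42 are assumptions made for this example.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris

# Load the iris measurements: 150 flowers, 4 features each.
iris = load_iris()
X = iris.data

# Assumption: 3 clusters, 10 random restarts, fixed seed for repeatable runs.
model = KMeans(n_clusters=3, n_init=10, random_state=42)
labels = model.fit_predict(X)  # cluster index (0, 1, or 2) for every flower

print(model.cluster_centers_.shape)  # (3, 4): one centroid per cluster
print(labels[:10])                   # cluster assignments of the first 10 flowers
```

The species labels in iris.target are never passed to the model, which is what makes this unsupervised; the cluster numbers are arbitrary and need not match the species numbering.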