KMeans
K-Means is a popular unsupervised machine learning algorithm used for clustering data points into groups. It minimizes the variance within clusters and works by iteratively refining cluster assignments.
How K-Means Works
- Choose K (number of clusters)
- The user specifies the number of clusters (K).
- Initialize Centroids
- Randomly select K data points as the initial centroids.
- Assign Data Points to Clusters
- Each point is assigned to the closest centroid based on Euclidean distance.
- Update Centroids
- Compute the new centroid of each cluster by averaging the points in that cluster.
- Repeat Until Convergence
- The process repeats until centroids no longer change significantly or a stopping criterion is met.
Mathematical Formula (Objective Function)
The goal is to minimize the within-cluster sum of squares (WCSS):

Choosing K: The Elbow Method
To determine the optimal K, use the Elbow Method:
- Plot WCSS for different K values
- Look for an "elbow" where WCSS stops decreasing significantly
Advantages
✔ Simple and fast ✔ Works well with large datasets ✔ Scales linearly with number of samples
Disadvantages
❌ Needs pre-defined K ❌ Sensitive to outliers ❌ May converge to local optima
Would you like a Python example to implement K-Means using sklearn
? 🚀
K-Means Clustering in Machine Learning
K-means is an iterative algorithm that splits a dataset into non-overlapping subgroups that are called clusters.
...
https://serokell.io/blog/k-means-clustering-in-machine-learning
K means Clustering – Introduction
K-means clustering is a technique used to organize data into groups based on their similarity. For example online store uses K-Means to group customers based on purchase frequency and spending creating segments like Budget Shoppers, Frequent Buyers and Big Spenders for personalised marketing.
...
https://www.geeksforgeeks.org/k-means-clustering-introduction/