Clustering

Clustering is an unsupervised learning technique that groups data points so that points in the same cluster are more similar to each other than to points in other clusters. It does not use predefined labels, and it helps reveal patterns and structure in data.

Types of Clustering Algorithms

  1. Partition-Based Clustering

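    • K-Means: Divides data into K clusters by minimizing the within-cluster sum of squared distances between each point and its cluster mean (centroid).
    • K-Medoids: Similar to K-Means, but each cluster center (medoid) must be an actual data point. This makes it less sensitive to outliers and usable with any distance measure.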
  2. Hierarchical Clustering

    • Agglomerative: Starts with each point as its own cluster and repeatedly merges the two closest clusters.
    • Divisive: Starts with all points in one cluster and recursively splits clusters into smaller ones.
  3. Density-Based Clustering

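    • DBSCAN (Density-Based Spatial Clustering of Applications with Noise): Forms clusters from dense regions of data and labels points in sparse regions as noise. The number of clusters does not need to be specified in advance.
    • OPTICS (Ordering Points To Identify the Clustering Structure): An extension of DBSCAN that orders points by reachability distance, which lets it find clusters of varying density.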
  4. Model-Based Clustering

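    • Gaussian Mixture Models (GMM): Assumes the data are generated from a mixture of several Gaussian distributions and gives each point a probability of belonging to each cluster. GMMs are typically fitted with the Expectation-Maximization (EM) algorithm.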
  5. Graph-Based Clustering

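    • Spectral Clustering: Builds a similarity graph over the data points, embeds the points using eigenvectors of the graph Laplacian, and clusters them in that embedding (commonly with K-Means).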
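As a rough illustration of the K-Means entry in the list above, here is a minimal sketch of Lloyd's algorithm using only the Python standard library. It alternates between assigning each point to its nearest centroid and moving each centroid to the mean of its assigned points. The function names `kmeans` and `inertia`, the random initialization from the data points, and the `iters`/`seed` parameters are illustrative assumptions, not a reference implementation.

```python
import math
import random


def kmeans(points, k, iters=100, seed=0):
    """Lloyd's algorithm for K-Means on a list of coordinate tuples.

    Returns (labels, centroids), where labels[i] is the cluster index of points[i].
    """
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins the cluster of its nearest centroid.
        labels = [min(range(k), key=lambda j: math.dist(p, centroids[j]))
                  for p in points]
        # Update step: move each centroid to the mean of its assigned points.
        new_centroids = []
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centroids.append(centroids[j])  # leave an empty cluster's centroid in place
        if new_centroids == centroids:
            break  # centroids stopped moving, so assignments are stable
        centroids = new_centroids
    return labels, centroids


def inertia(points, labels, centroids):
    """Within-cluster sum of squared distances: the quantity K-Means tries to minimize."""
    return sum(math.dist(p, centroids[l]) ** 2 for p, l in zip(points, labels))
```

Because the result depends on the initial centroids, practical implementations usually run several restarts or use a smarter seeding scheme such as k-means++.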
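The K-Medoids entry can be sketched in the same style. This is the simple alternating ("Voronoi iteration") variant rather than the classic PAM swap search. It initializes with the first `k` points for simplicity, and the function name, parameters, and pluggable `dist` argument are illustrative assumptions.

```python
import math


def kmedoids(points, k, dist=math.dist, iters=100):
    """Alternating K-Medoids: like K-Means, but every cluster center is a data point.

    Returns (labels, medoids), where the medoids are points taken from the input.
    """
    medoids = list(range(k))  # simplistic initialization: the first k points
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign every point to its nearest medoid.
        labels = [min(range(k), key=lambda j: dist(p, points[medoids[j]]))
                  for p in points]
        new_medoids = []
        for j in range(k):
            members = [i for i, label in enumerate(labels) if label == j]
            if not members:  # can only happen with duplicate points
                new_medoids.append(medoids[j])
                continue
            # The new medoid is the member with the smallest total distance to its cluster.
            new_medoids.append(min(
                members, key=lambda c: sum(dist(points[c], points[m]) for m in members)))
        if new_medoids == medoids:
            break  # no medoid changed, so the clustering is stable
        medoids = new_medoids
    return labels, [points[m] for m in medoids]
```

Since only distances between data points are needed, `dist` can be any dissimilarity function, which is one practical advantage of medoids over means.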
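For the agglomerative entry, a naive bottom-up sketch follows. It supports single linkage (distance between the closest members of two clusters) and complete linkage (distance between the farthest members). The function name and parameters are assumptions, and the repeated all-pairs search makes it suitable only for small inputs.

```python
import math


def agglomerative(points, n_clusters, linkage="single"):
    """Naive bottom-up hierarchical clustering.

    Starts with one cluster per point and repeatedly merges the closest pair of
    clusters until n_clusters remain. Returns a cluster label for each point.
    """
    clusters = [[i] for i in range(len(points))]

    def cluster_distance(a, b):
        pair_dists = [math.dist(points[i], points[j]) for i in a for j in b]
        if linkage == "single":
            return min(pair_dists)  # closest pair of members
        if linkage == "complete":
            return max(pair_dists)  # farthest pair of members
        raise ValueError(f"unsupported linkage: {linkage}")

    while len(clusters) > n_clusters:
        # Search all cluster pairs for the closest one.
        i, j = min(
            ((a, b) for a in range(len(clusters)) for b in range(a + 1, len(clusters))),
            key=lambda pair: cluster_distance(clusters[pair[0]], clusters[pair[1]]),
        )
        clusters[i].extend(clusters[j])  # merge cluster j into cluster i
        del clusters[j]                  # j > i, so index i is still valid

    labels = [0] * len(points)
    for label, members in enumerate(clusters):
        for m in members:
            labels[m] = label
    return labels
```

Recording the sequence of merges instead of stopping at `n_clusters` would give the full hierarchy (dendrogram) that hierarchical methods are usually inspected with.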
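The DBSCAN entry can be made concrete with a short sketch that follows the usual definitions. A point is a core point if at least `min_pts` points (including itself) lie within distance `eps`. Clusters grow outward from core points, and points reachable from no core point are labeled noise. The function name, the `-1` noise label, and the brute-force neighbor search are assumptions made for brevity.

```python
import math

NOISE = -1  # label for points that belong to no cluster


def dbscan(points, eps, min_pts):
    """Brute-force DBSCAN. Returns a cluster id (0, 1, ...) or NOISE for each point."""

    def region_query(i):
        # Indices of all points within eps of point i, including i itself.
        return [j for j in range(len(points)) if math.dist(points[i], points[j]) <= eps]

    labels = [None] * len(points)  # None means "not visited yet"
    cluster_id = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(i)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # not a core point; may later become a border point
            continue
        # i is a core point: start a new cluster and expand it.
        labels[i] = cluster_id
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point: reachable, but not expanded
            if labels[j] is not None:
                continue  # already assigned (or just claimed as a border point)
            labels[j] = cluster_id
            j_neighbors = region_query(j)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is also core: keep growing the cluster
        cluster_id += 1
    return labels
```

Production implementations replace the linear `region_query` scan with a spatial index (for example a k-d tree), since the brute-force version costs O(n²) distance computations.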
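The GMM entry can be illustrated with a one-dimensional mixture fitted by Expectation-Maximization, a common (but not the only) way to fit GMMs. The E-step computes each point's probability of belonging to each component; the M-step re-estimates the mixing weights, means, and variances from those probabilities. The 1-D restriction, unit initial variances, caller-supplied initial means, the `min_var` floor, and working in plain probability space rather than log space are simplifying assumptions suited only to small, well-behaved data.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a 1-D normal distribution with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def gmm_1d(xs, init_means, iters=50, min_var=1e-6):
    """Fit a 1-D Gaussian mixture with EM.

    Returns (weights, means, variances, resp), where resp[n][j] is the
    probability that xs[n] was generated by component j.
    """
    k = len(init_means)
    means = list(init_means)
    variances = [1.0] * k    # assumed starting variances
    weights = [1.0 / k] * k  # equal starting mixing weights
    resp = []
    for _ in range(iters):
        # E-step: responsibilities via Bayes' rule, normalized per point.
        resp = []
        for x in xs:
            p = [w * gaussian_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            total = sum(p)
            resp.append([pj / total for pj in p])
        # M-step: re-estimate each component from its responsibility-weighted points.
        for j in range(k):
            nj = sum(r[j] for r in resp)  # effective number of points in component j
            weights[j] = nj / len(xs)
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj
            variances[j] = max(var, min_var)  # floor guards against a collapsing component
    return weights, means, variances, resp
```

Taking the component with the highest responsibility for each point gives a hard clustering; the responsibilities themselves are soft (probabilistic) assignments.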

Key Applications

  • Customer segmentation
  • Anomaly detection
  • Image segmentation
  • Topic modeling in NLP
  • Social network analysis