Entry 8 of 11
ML Fundamentals Series

Clustering Without Labels: K-Means, Hierarchical, and How They See the World Differently

Everything up to now has been supervised: models that learn from labeled data. Today I hit the first unsupervised algorithms: clustering. The task is to find groups in data that nobody labeled. No answer key.

K-Means is the most common clustering algorithm. You pick K (the number of clusters) upfront. The algorithm:

  1. Places K centroids randomly
  2. Assigns each point to its nearest centroid
  3. Moves each centroid to the average of its assigned points
  4. Repeats until centroids stop moving
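
To make the four steps concrete, here is a minimal sketch in plain Python. It is not a production implementation: the names `kmeans`, `assign`, and `sq_dist` are my own, I am assuming step 1 means "pick K of the data points at random" (one common choice among several), and I keep an empty cluster's centroid where it was rather than doing anything smarter.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(points, centroids):
    """Step 2: index of the nearest centroid for every point."""
    return [
        min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))
        for p in points
    ]


def kmeans(points, k, max_iters=100, seed=0):
    rng = random.Random(seed)
    # Step 1 (assumption): use k data points, sampled without replacement.
    centroids = rng.sample(points, k)
    for _ in range(max_iters):
        labels = assign(points, centroids)
        # Step 3: move each centroid to the mean of its assigned points.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                mean = tuple(sum(c) / len(members) for c in zip(*members))
                new_centroids.append(mean)
            else:
                # Assumption: an empty cluster keeps its old centroid.
                new_centroids.append(centroids[j])
        # Step 4: stop once no centroid moved.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, assign(points, centroids)


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, labels = kmeans(points, k=2)
# The first three points end up sharing one label, the last three the other.
print(labels)
```

Which group gets label 0 and which gets label 1 depends on the random start, so only the grouping itself is meaningful.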

The problems: K-Means is sensitive to initial centroid placement (hence K-Means++ for smarter initialization), only finds roughly spherical clusters, and requires you to know K in advance. A common heuristic for picking K is the elbow method: plot within-cluster sum of squares against K and pick the point where the curve bends.
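
As I understand it, the "smarter initialization" in K-Means++ means spreading the starting centroids out: the first one is a random data point, and each later one is sampled with probability proportional to its squared distance from the nearest centroid chosen so far. A rough sketch of just that seeding step, with my own function names and assuming the data contains at least K distinct points:

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_pp_init(points, k, seed=0):
    rng = random.Random(seed)
    # First centroid: a uniformly random data point.
    centroids = [rng.choice(points)]
    while len(centroids) < k:
        # Weight of each point = squared distance to its nearest chosen centroid.
        # Points that coincide with a chosen centroid get weight 0.
        weights = [min(sq_dist(p, c) for c in centroids) for p in points]
        # Far-away points are proportionally more likely to be picked next.
        centroids.append(rng.choices(points, weights=weights, k=1)[0])
    return centroids


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
print(kmeans_pp_init(points, k=2))
```

The returned list would replace the purely random starting centroids; the assign-and-update loop itself stays the same.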

Hierarchical Clustering doesn't need K upfront. It builds a dendrogram: a tree showing how points merge into clusters step by step:

  • Agglomerative (bottom-up): Start with every point as its own cluster. Merge the two closest. Repeat until one cluster remains.
  • Divisive (top-down): Start with everything in one cluster. Split recursively.

You then "cut" the dendrogram at a height that gives you the number of clusters you want.
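
Here is a small sketch of the agglomerative version, mostly to convince myself how the merging works. Two assumptions that are mine: I measure the distance between clusters by their closest pair of points (single linkage, one of several options), and instead of building the full tree and cutting it afterwards, I stop merging once the requested number of clusters is left, which for this linkage gives the same groups as cutting the dendrogram at that level. The function name `agglomerative` is made up.

```python
import math


def agglomerative(points, n_clusters):
    # Start with every point (stored by index) as its own cluster.
    clusters = [[i] for i in range(len(points))]
    merges = []  # (cluster_a, cluster_b, distance), in merge order
    while len(clusters) > n_clusters:
        # Find the closest pair of clusters under single linkage.
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = min(
                    math.dist(points[i], points[j])
                    for i in clusters[a]
                    for j in clusters[b]
                )
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((clusters[a], clusters[b], d))
        merged = clusters[a] + clusters[b]
        # Delete b first (b > a), so index a is still valid.
        del clusters[b]
        clusters[a] = merged
    return clusters, merges


points = [(0,), (1,), (10,), (11,), (50,)]
clusters, merges = agglomerative(points, n_clusters=3)
print(clusters)  # lists of point indices, one list per cluster
```

The `merges` list is the dendrogram in flat form: each entry records which two clusters were joined and at what distance, which is the "height" you would cut at. The nested loops also make the scaling problem visible, since every round compares all pairs of clusters.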

What clicked

K-Means is faster and scales better but assumes you know K and assumes roughly spherical clusters. Hierarchical gives more flexibility and a visual picture of structure, but standard agglomerative implementations need O(n^2) memory for the pairwise distances and at least O(n^2) time, so it doesn't scale to large datasets.

Still shaky on

How do you evaluate clustering quality when there are no labels? I know about silhouette score and within-cluster sum of squares but haven't worked through what "good" looks like in practice.
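
A first step toward an answer, as a sketch rather than a settled method: the silhouette score compares, for each point, its mean distance to its own cluster (a) with its mean distance to the nearest other cluster (b), as (b - a) / max(a, b). It ranges from -1 to 1, and values near 1 mean points sit much closer to their own cluster than to any other. The function below is my own hand-rolled version; it assumes at least two clusters and that the points are not all identical, and it scores single-point clusters as 0 by convention.

```python
import math


def silhouette_score(points, labels):
    # Group points by cluster label.
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)

    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # convention for single-point clusters
            continue
        # a: mean distance to the other members of p's own cluster
        # (p's distance to itself is 0, so divide by len(own) - 1).
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: mean distance to the members of the nearest other cluster.
        b = min(
            sum(math.dist(p, q) for q in other) / len(other)
            for other_lab, other in clusters.items()
            if other_lab != lab
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


points = [(0,), (1,), (10,), (11,)]
print(silhouette_score(points, [0, 0, 1, 1]))  # sensible grouping: close to 1
print(silhouette_score(points, [0, 1, 0, 1]))  # scrambled grouping: negative
```

Comparing the score for a sensible grouping against a deliberately scrambled one is the kind of sanity check I want to run on real data before trusting the number.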

What's next

What if the problem isn't grouping but compression: reducing 50 features to 3 while keeping the most important signal? That's PCA.