Entry 1 of 11
ML Fundamentals Series

The Map Before the Territory: How ML Splits Into Supervised and Unsupervised

Before you can understand any specific ML algorithm, you need to understand why ML splits into two completely different philosophies, and what that distinction actually means for how models learn.

Supervised learning is the simpler mental model: you have data where every input already has a correct answer (a label) attached. The model learns by looking at those (input, answer) pairs and figuring out the mapping. There are two types: classification (the output is a category: spam/not spam, cat/dog) and regression (the output is a number: house price, tomorrow's temperature). The algorithm sees your labels and optimizes toward them.
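To make the two supervised types concrete, here is a minimal stdlib-only Python sketch. The toy datasets (`clf_train`, `reg_train`) and the function names are hypothetical, invented for illustration. Classification uses 1-nearest-neighbor (KNN with k = 1) and regression uses an ordinary least-squares line (simple linear regression). Both algorithms appear in the supervised list, but these are bare-bones versions, not how any library implements them.

```python
# Supervised learning: every training input comes with its correct answer (label).
# All data below is hypothetical toy data, invented for illustration.

# Classification: the label is a category. Pairs are (hours_studied, outcome).
clf_train = [(1.0, "fail"), (2.0, "fail"), (4.0, "pass"), (5.0, "pass")]

def nearest_neighbor_predict(train, x):
    """1-nearest-neighbor: return the label of the closest training input."""
    _, closest_label = min(train, key=lambda pair: abs(pair[0] - x))
    return closest_label

# Regression: the label is a number. Pairs are (size_m2, price).
reg_train = [(50.0, 150.0), (70.0, 210.0), (90.0, 270.0)]

def fit_line(train):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(train)
    mean_x = sum(x for x, _ in train) / n
    mean_y = sum(y for _, y in train) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in train)
    var = sum((x - mean_x) ** 2 for x, _ in train)
    slope = cov / var
    intercept = mean_y - slope * mean_x
    return slope, intercept

print(nearest_neighbor_predict(clf_train, 3.6))  # pass
slope, intercept = fit_line(reg_train)
print(slope * 80.0 + intercept)  # 240.0
```

In both cases the labels drive the fit: change an answer in the training data and the predictions change with it.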

Unsupervised learning is different in a fundamental way: there are no labels, no correct answers. The model has to find structure in the data on its own. There are three types: clustering (group similar things together), association rule learning (find patterns like "people who buy X also buy Y"), and dimensionality reduction (compress many features into fewer while keeping the important signal).
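Here is a minimal sketch of two of the three unsupervised types, again in plain Python with invented toy data (`points`, `baskets` and all function names are hypothetical). Neither function ever sees a label. The clustering part is a bare 1-D k-means; the association part only computes support and confidence, the two measures that rule miners such as Apriori filter on, not Apriori's candidate-generation search itself. Dimensionality reduction is left out to keep it short.

```python
# Unsupervised learning: only raw inputs, no labels.
# All data below is hypothetical toy data, invented for illustration.

# Clustering: k-means in 1-D for readability.
points = [1.0, 1.5, 2.0, 8.0, 8.5, 9.0]

def kmeans_1d(data, centers, iterations=10):
    """Alternate two steps: assign each point to its nearest center,
    then move each center to the mean of the points assigned to it."""
    centers = list(centers)
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for p in data:
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # Keep a center where it is if no point was assigned to it.
        centers = [sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)]
    return centers, clusters

# Association rules: how often does buying X come with buying Y?
baskets = [
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"bread", "jam"},
    {"milk"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Estimated P(consequent | antecedent): support of both together / support of antecedent."""
    both = support(antecedent | consequent, transactions)
    return both / support(antecedent, transactions)

centers, clusters = kmeans_1d(points, [0.0, 5.0])
print(centers)   # [1.5, 8.5]
print(clusters)  # [[1.0, 1.5, 2.0], [8.0, 8.5, 9.0]]
print(support({"bread"}, baskets))  # 0.75
print(round(confidence({"bread"}, {"butter"}, baskets), 2))  # 0.67
```

With starting centers 0.0 and 5.0, the loop separates the two obvious groups even though nothing told it those groups existed; the rule "bread → butter" holds in two of the three bread baskets.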

The algorithm zoo maps cleanly onto this split. For supervised: Linear Regression, Logistic Regression, Decision Trees, Random Forests, SVM, KNN, Naive Bayes, Gradient Boosting. For unsupervised: K-Means, Hierarchical Clustering, DBSCAN, Apriori, PCA, NMF.

What clicked

Supervised learning is like studying with an answer key. Unsupervised is like being given a pile of documents in a language you don't know and being asked to sort them into topics. You might find meaningful groups. You might not. There's no external truth to check against.

Still shaky on

"Unsupervised" sounds like the model isn't doing real work. Actually, it's often harder: the model has no feedback signal to optimize against, so the quality of what it finds is harder even to define, let alone measure.
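One way to see the missing feedback signal, as a sketch under assumptions: supervised quality can be scored directly against known answers (accuracy here), while for clustering this example falls back on an internal score, the within-cluster sum of squared errors that k-means minimizes. The toy data and function names are hypothetical.

```python
# Hypothetical toy data and helper names, invented for illustration.

def accuracy(predictions, labels):
    """Supervised: compare predictions against the known correct answers."""
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)

def within_cluster_sse(clusters):
    """Unsupervised: no answers exist, so score the grouping internally by
    the summed squared distance of each point to its cluster's mean."""
    total = 0.0
    for cluster in clusters:
        mean = sum(cluster) / len(cluster)
        total += sum((p - mean) ** 2 for p in cluster)
    return total

points = [1.0, 1.5, 2.0, 8.0, 8.5, 9.0]
two_groups = [[1.0, 1.5, 2.0], [8.0, 8.5, 9.0]]
one_per_point = [[p] for p in points]

print(accuracy(["spam", "ham", "spam"], ["spam", "ham", "ham"]))  # 2 of 3 correct
print(within_cluster_sse(two_groups))     # 1.0
print(within_cluster_sse(one_per_point))  # 0.0
```

Putting every point in its own cluster scores a "perfect" 0.0 while telling you nothing, so a low internal score alone cannot say whether a grouping is meaningful.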

What's next

Over the next few days I'm going through most of these algorithms one by one, starting with the simplest supervised model: Linear Regression.