Entry 7 of 11
ML Fundamentals Series

K-Nearest Neighbors Has No Training Phase: And That's the Whole Point

Every algorithm I've studied so far learns during training: it adjusts weights, builds trees, or finds hyperplanes. KNN (K-Nearest Neighbors) doesn't. There is no training phase. The entire model is just this: store all the data, and when a new point comes in, find its K nearest neighbors and let them vote.

KNN is called a lazy learner because it defers all computation to prediction time. When you ask it to classify a new point, it measures the distance from that point to every training example, finds the K closest ones, and returns whichever class appears most often among them.
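
A minimal from-scratch sketch of the lazy learner described above, assuming numeric feature tuples, Euclidean distance (via the standard library's `math.dist`, Python 3.8+), and plain majority voting. `LazyKNN`, `fit`, and `predict` are hypothetical names chosen for illustration; they echo the common fit/predict convention but are not any particular library's API. Note that `fit` only stores the data; every distance is computed inside `predict_one`.

```python
import math
from collections import Counter


class LazyKNN:
    """Hypothetical minimal KNN classifier: fit() only memorizes the data."""

    def __init__(self, k=3):
        self.k = k
        self.points = []
        self.labels = []

    def fit(self, points, labels):
        # "Training" is just storing the dataset; no parameters are learned.
        self.points = list(points)
        self.labels = list(labels)
        return self

    def predict_one(self, query):
        # Distance from the query to every stored training point.
        dists = [
            (math.dist(query, p), label)
            for p, label in zip(self.points, self.labels)
        ]
        # Keep the k closest points.
        nearest = sorted(dists, key=lambda pair: pair[0])[: self.k]
        # Majority vote; on equal counts, Counter.most_common keeps the
        # label encountered first, i.e. the one held by the closer point.
        votes = Counter(label for _, label in nearest)
        return votes.most_common(1)[0][0]

    def predict(self, queries):
        return [self.predict_one(q) for q in queries]


# Usage on a tiny made-up 2-D dataset with two well-separated groups.
X = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5), (6.0, 6.0), (6.5, 7.0), (7.0, 6.5)]
y = ["red", "red", "red", "blue", "blue", "blue"]
model = LazyKNN(k=3).fit(X, y)
print(model.predict([(1.2, 1.4), (6.8, 6.1)]))  # ['red', 'blue']
```

Swapping `math.dist` for another metric is the only change needed to use a different distance.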

Distance is usually Euclidean:

$$d(\mathbf{p}, \mathbf{q}) = \sqrt{\sum_i (p_i - q_i)^2}$$

But you can use other metrics depending on the data type: Manhattan distance, for example, or cosine distance (one minus cosine similarity) for text.
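
A small sketch of the three metrics, assuming points are equal-length sequences of floats. Because cosine similarity measures how alike two vectors are (1 means same direction), the code turns it into a distance as 1 minus the similarity so that "smaller means closer" holds for all three. The function names are illustrative, not taken from any library.

```python
import math


def euclidean(p, q):
    # Square root of the sum of squared coordinate differences.
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))


def manhattan(p, q):
    # Sum of absolute coordinate differences ("city block" distance).
    return sum(abs(pi - qi) for pi, qi in zip(p, q))


def cosine_distance(p, q):
    # 1 - cosine similarity: 0 for vectors pointing the same way, 1 for
    # orthogonal vectors. Ignores vector length; undefined for zero vectors.
    dot = sum(pi * qi for pi, qi in zip(p, q))
    norm_p = math.sqrt(sum(pi * pi for pi in p))
    norm_q = math.sqrt(sum(qi * qi for qi in q))
    return 1.0 - dot / (norm_p * norm_q)


a, b = (0.0, 0.0), (3.0, 4.0)
print(euclidean(a, b))  # 5.0
print(manhattan(a, b))  # 7.0
print(cosine_distance((1.0, 0.0), (0.0, 2.0)))  # 1.0
print(cosine_distance((1.0, 2.0), (2.0, 4.0)))  # ~0.0 (same direction)
```

The last line shows why cosine distance suits text vectors: two documents with the same word proportions count as identical even if one is twice as long.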

The key hyperparameter is K. With K = 1, a new point simply copies the label of its single closest neighbor, which tends to overfit: one noisy training point is enough to flip a prediction. A large K means voting over many neighbors, which can blur important distinctions (underfitting). The right K is usually found with cross-validation or the elbow method: plot the validation error rate against K and pick the value where the error stops dropping sharply.
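
A minimal sketch of choosing K by leave-one-out cross-validation (each point is classified using all the others), assuming a tiny 1-D toy dataset invented here purely for illustration, with one deliberately mislabeled point. The helpers `knn_predict` and `loo_error` are hypothetical names.

```python
from collections import Counter

# Toy 1-D dataset invented for illustration: two separated classes,
# plus one noisy point (x=2.3 labeled "B" inside the "A" region).
X = [0.0, 1.1, 2.3, 3.2, 4.6, 10.0, 11.2, 12.1, 13.5, 14.3]
y = ["A", "A", "B", "A", "A", "B", "B", "B", "B", "B"]


def knn_predict(train_x, train_y, query, k):
    # Rank training points by distance to the query (1-D, so |a - b|)
    # and take a majority vote over the k closest labels.
    ranked = sorted(zip(train_x, train_y), key=lambda pair: abs(pair[0] - query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def loo_error(xs, ys, k):
    # Leave-one-out cross-validation: predict each point from all the others.
    mistakes = 0
    for i in range(len(xs)):
        rest_x = xs[:i] + xs[i + 1:]
        rest_y = ys[:i] + ys[i + 1:]
        if knn_predict(rest_x, rest_y, xs[i], k) != ys[i]:
            mistakes += 1
    return mistakes / len(xs)


# Odd K values avoid exact vote ties in a two-class problem.
errors = {k: loo_error(X, y, k) for k in (1, 3, 5, 7)}
best_k = min(errors, key=errors.get)  # on ties, the smaller K wins
print(errors)  # {1: 0.2, 3: 0.1, 5: 0.1, 7: 0.5}
print(best_k)  # 3
```

On this toy data K = 1 is fooled by the noisy point (it also misclassifies the "A" point at 3.2 right next to it), K = 3 and K = 5 vote the noise down, and K = 7 is large enough that the bigger "B" group outvotes every "A" point's local neighbors.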

What clicked

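KNN makes no assumptions about the underlying data distribution. Linear regression assumes a linear relationship between features and target. Logistic regression assumes a linear decision boundary (the log-odds are linear in the features). KNN makes no such assumption; it relies only on the local structure of the data, the idea that nearby points tend to share a label.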
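
To illustrate the point, a sketch assuming an XOR-style toy dataset invented for this example: a small cluster in each corner of the unit square, with opposite corners sharing a label. No single straight line separates the two classes, so any linear decision boundary misclassifies some points, yet KNN handles it with no changes because it only looks at each query's neighborhood. `knn_classify` is a hypothetical helper name.

```python
import math
from collections import Counter


def knn_classify(points, labels, query, k=3):
    # Majority vote among the k training points closest to `query`.
    nearest = sorted(zip(points, labels), key=lambda pl: math.dist(pl[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


# XOR-style toy data (invented for illustration): three points near each
# corner; (0,0) and (1,1) are class 0, (0,1) and (1,0) are class 1.
points, labels = [], []
for cx, cy, label in [(0, 0, 0), (1, 1, 0), (0, 1, 1), (1, 0, 1)]:
    for dx, dy in [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1)]:
        points.append((cx + dx, cy + dy))
        labels.append(label)

queries = [(0.05, 0.05), (0.95, 0.95), (0.05, 0.95), (0.95, 0.05)]
print([knn_classify(points, labels, q) for q in queries])  # [0, 0, 1, 1]
```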

Still shaky on

Prediction is expensive: with N training points, each prediction requires computing N distances. For large datasets this gets slow, and KNN also needs all of the training data in memory at inference time. That makes it impractical at scale, but a strong baseline for smaller datasets.

What's next

Shifting gears into unsupervised territory: algorithms that find structure without any labels at all. Starting with clustering.