K-Nearest Neighbors Has No Training Phase: And That's the Whole Point

knn k-nearest-neighbors classification supervised-learning distance-based

Every algorithm I've studied so far learns during training, it adjusts weights, builds trees, finds hyperplanes. KNN (K-Nearest Neighbors) doesn't. There is no training phase. The entire model is just: store all the data, and when a new point comes in, find its $K$ nearest neighbors and vote.

KNN is called a lazy learner because it defers all computation to prediction time. When you ask it to classify a new point, it measures the distance from that point to every training example, finds the $K$ closest ones, and returns whichever class appears most among them.

Distance is usually Euclidean:

$d(\mathbf{p}, \mathbf{q}) = \sqrt{\sum_i (p_i - q_i)^2}$

But you can use other metrics depending on the data type (Manhattan distance, cosine similarity for text, etc.).

The key hyperparameter is $K$ . $K = 1$ means the new point just copies its closest neighbor, which overfits badly. Large $K$ means you're averaging over many neighbors, which can blur important distinctions (underfitting). The right $K$ is found via cross-validation or the elbow method: plot error rate against $K$ , pick where error stops dropping sharply.

What clicked

KNN has no assumptions about the underlying data distribution. Linear regression assumes linearity. Logistic regression assumes linear separability (loosely). KNN makes no such assumption, it just relies on the local structure of the data.

Still shaky on

Prediction is expensive: for $N$ training points, each prediction requires computing $N$ distances. For large datasets this gets slow, and KNN needs all training data in memory at inference time. Makes it impractical at scale, but a strong baseline for smaller datasets.

What's next

Shifting gears into unsupervised territory: algorithms that find structure without any labels at all. Starting with clustering.