MSE Penalizes Big Mistakes Harder: R² Tells You If Your Model Even Learned Anything

mse r-squared regression evaluation-metrics math

You've trained a regression model. It outputs numbers. How do you know if those numbers are any good? Two metrics cover this from different angles: MSE measures the size of your errors, and $R^2$ measures whether your model learned anything at all.

Mean Squared Error: for each prediction, compute the difference between actual and predicted value, square it, average across all predictions:

$\text{MSE} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$

The squaring does two things: negative and positive errors don't cancel, and large errors get penalized disproportionately (error of 10 → squared error of 100; error of 1 → 1). This makes MSE sensitive to outliers. RMSE (the square root of MSE) keeps that sensitivity, but its units match your original variable, which makes it easier to interpret.
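A minimal sketch of the formula above in plain Python. The function names `mse` and `rmse` and the toy numbers are made up for illustration. Both sets of predictions are deliberately lopsided: one is off by 1 everywhere, the other is exact except for a single miss of 10.

```python
import math

def mse(y_true, y_pred):
    """Mean of the squared differences between actual and predicted values."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and the same length")
    return sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Square root of MSE, in the same units as the target."""
    return math.sqrt(mse(y_true, y_pred))

# Illustrative data (not from any real model)
y_true = [3.0, 5.0, 7.0, 9.0]
small_errors = [4.0, 6.0, 8.0, 10.0]    # every prediction off by 1
one_big_error = [3.0, 5.0, 7.0, 19.0]   # three exact, one off by 10

print(mse(y_true, small_errors))     # 1.0
print(mse(y_true, one_big_error))    # 25.0
print(rmse(y_true, one_big_error))   # 5.0
```

The total absolute error only grows from 4 to 10 between the two cases, but MSE jumps from 1 to 25. That is the disproportionate penalty in action.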

$R^2$ (coefficient of determination) measures goodness of fit. Its maximum is 1, and for a least-squares fit with an intercept, scored on its own training data, it lands between 0 and 1:

$R^2 = 1 - \frac{SS_{\text{res}}}{SS_{\text{total}}} = 1 - \frac{\sum(y_i - \hat{y}_i)^2}{\sum(y_i - \bar{y})^2}$

$R^2 = 1$ means perfect fit. $R^2 = 0$ means the model does no better than always predicting the mean. $R^2 < 0$ is possible: it means the model is actively worse than the mean baseline, which typically shows up when you score on held-out data or the model has no intercept.
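The three cases can be checked by hand with a small sketch. The helper name `r_squared` and the toy data are assumptions for illustration; the function just evaluates the $SS_{\text{res}}$ and $SS_{\text{total}}$ ratio from the formula.

```python
def r_squared(y_true, y_pred):
    """1 - SS_res / SS_total, computed directly from the definition."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_total = sum((y - mean_y) ** 2 for y in y_true)
    if ss_total == 0:
        raise ValueError("R^2 is undefined when all targets are identical")
    return 1 - ss_res / ss_total

# Illustrative data: mean is 3.0, SS_total is 10.0
y_true = [1.0, 2.0, 3.0, 4.0, 5.0]

perfect = [1.0, 2.0, 3.0, 4.0, 5.0]         # predictions equal the targets
mean_baseline = [3.0, 3.0, 3.0, 3.0, 3.0]   # always predict the mean
backwards = [5.0, 4.0, 3.0, 2.0, 1.0]       # trend is exactly reversed

print(r_squared(y_true, perfect))         # 1.0
print(r_squared(y_true, mean_baseline))   # 0.0
print(r_squared(y_true, backwards))       # -3.0
```

The reversed predictions have $SS_{\text{res}} = 40$ against $SS_{\text{total}} = 10$, so $R^2 = 1 - 4 = -3$: four times the squared error of the mean baseline.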

What clicked

$R^2$ is a relative measure: it tells you how much better your model is than the laziest possible baseline (always predict the mean). MSE tells you the absolute magnitude of errors, in squared units of the target. You need both.
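One way to see the relative versus absolute split, as a sketch with made-up numbers: rescale the target (say, from metres to centimetres) and compare. The helpers `mse` and `r_squared` are hypothetical names that follow the two formulas in this post.

```python
def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true)

def r_squared(y_true, y_pred):
    """1 - SS_res / SS_total."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_total = sum((y - mean_y) ** 2 for y in y_true)
    return 1 - ss_res / ss_total

# Illustrative targets and predictions in the original units
y_true = [1.0, 2.0, 3.0, 4.0, 5.0]
y_pred = [1.5, 2.0, 3.0, 4.0, 4.5]

# The same data expressed in units 100 times smaller
y_true_scaled = [100 * y for y in y_true]
y_pred_scaled = [100 * p for p in y_pred]

print(mse(y_true, y_pred), r_squared(y_true, y_pred))
print(mse(y_true_scaled, y_pred_scaled), r_squared(y_true_scaled, y_pred_scaled))
```

MSE grows by a factor of $100^2$ while $R^2$ stays the same, so an MSE value means nothing without knowing the target's scale, and an $R^2$ value says nothing about how large the errors are in real units.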

Still shaky on

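On the training data, $R^2$ of a least-squares model never decreases when you add more features, even useless ones. Add 10 random noise columns and training $R^2$ goes up (or at worst stays flat), because the fit can always use the extra columns to chase noise. That's why Adjusted $R^2$ exists: it charges a penalty for every added feature, so it only rises when a feature improves the fit by more than chance would. I haven't gone deep on this yet.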
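A rough experiment to poke at this, assuming NumPy is available. Everything here is synthetic: the data, the seed, and the helper names `fit_r2` and `adjusted_r2` are made up for illustration. The adjustment used is the common textbook form $1 - (1 - R^2)\frac{n - 1}{n - p - 1}$, with $n$ samples and $p$ features.

```python
import numpy as np

def fit_r2(X, y):
    """Training-set R^2 of an ordinary least-squares fit with an intercept."""
    X1 = np.column_stack([np.ones(len(y)), X])      # prepend intercept column
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)   # least-squares solution
    resid = y - X1 @ coef
    ss_res = resid @ resid
    ss_total = np.sum((y - y.mean()) ** 2)
    return 1 - ss_res / ss_total

def adjusted_r2(r2, n, p):
    """Textbook adjusted R^2 for n samples and p features (needs n > p + 1)."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Synthetic data: one real feature plus 10 columns of pure noise
rng = np.random.default_rng(0)
n = 50
x = rng.normal(size=(n, 1))
y = 2.0 * x[:, 0] + rng.normal(size=n)
noise = rng.normal(size=(n, 10))

r2_base = fit_r2(x, y)
r2_noisy = fit_r2(np.hstack([x, noise]), y)

print("R^2      :", r2_base, "->", r2_noisy)
print("adjusted :", adjusted_r2(r2_base, n, 1), "->", adjusted_r2(r2_noisy, n, 11))
```

The plain $R^2$ for the noisy model cannot come out lower than the base model on the same training data. The adjusted value is not guaranteed to move in either direction, so it is worth rerunning with a few seeds to see how it behaves.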

What's next

The equivalent story for classification: the confusion matrix, and why accuracy alone will mislead you.