Chapter 4: Underfitting/Overfitting and Bias-Variance Tradeoff

In previous chapters we learned how to train machine learning models by minimizing a loss function. But a trained model is not automatically a good one: just because it performs well on the training data does not mean it will perform well on new data it encounters. In practice, many models that look excellent during training fail once they are deployed. Why? There are two main culprits: underfitting and overfitting.

4.1 What is Underfitting and Overfitting?

Underfitting means the model is too simple: it lacks the capacity (for example, too few features or too rigid a functional form) to capture the patterns in the data.

Overfitting means the model is too complex: it has so much capacity (for example, too many features or parameters relative to the amount of data) that it fits the noise along with the signal.

Finding the right balance between these two extremes is one of the most important challenges in machine learning, and it ties into a fundamental concept called the bias-variance tradeoff.

4.2 Underfitting and Overfitting: The Two Extremes

Underfitting

  • Occurs when a model is too simple to capture the underlying structure of the data.
  • High training error
  • High test error
  • Example: Fitting a straight line to data generated by a quadratic curve
  • Symptoms: poor performance on both the training and test sets, because the model is not expressive enough

Overfitting

  • Occurs when a model is too complex and learns not only the underlying structure but also the noise in the training data.
  • Low training error
  • High test error
  • Example: Fitting a 10th-degree polynomial to 5 data points (see the sketch after this list)
  • Symptoms:
    • Excellent performance on training data
    • Poor generalization to new data
    • Sensitive to small fluctuations or outliers
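
The following minimal sketch makes both failure modes concrete. It is an illustrative example, not a prescribed recipe: the synthetic quadratic dataset, the degree choices (1 and 10), and the use of scikit-learn are assumptions made for demonstration.

```python
# A minimal sketch contrasting underfitting and overfitting on synthetic data.
# The quadratic dataset, the degree choices, and the 50/50 split are
# illustrative assumptions, not a prescribed recipe.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(60, 1))
y = X[:, 0] ** 2 + rng.normal(scale=1.0, size=60)  # quadratic signal + noise

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0
)

for degree in (1, 10):
    # degree 1: a straight line (too simple); degree 10: a wiggly polynomial (too flexible)
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_train, y_train)
    train_mse = mean_squared_error(y_train, model.predict(X_train))
    test_mse = mean_squared_error(y_test, model.predict(X_test))
    print(f"degree={degree:2d}  train MSE={train_mse:.2f}  test MSE={test_mse:.2f}")
```

The degree-1 model typically shows high error on both splits (underfitting), while the degree-10 model drives the training error down without a matching drop in test error (overfitting).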

4.3 Bias-Variance Tradeoff

The goal is always to find the sweet spot of complexity: a model expressive enough to capture the data's structure without memorizing its noise.

The bias-variance tradeoff is the central concept that explains why models underfit or overfit.

Bias: the error due to wrong assumptions built into the learning algorithm, such as using a linear model for nonlinear data.

Variance: the error due to the model's sensitivity to small fluctuations in the training set.

  • High bias leads to underfitting because the model is too rigid to capture the underlying patterns.
  • High variance leads to overfitting because the model is too flexible, fitting noise and anomalies in the data.
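
For squared error, these two terms (plus irreducible noise) are exactly the pieces the expected test error decomposes into. The decomposition below is the standard one, stated under the assumption that the data are generated as y = f(x) + ε with zero-mean noise of variance σ²:

```latex
% Bias-variance decomposition of the expected squared error at a point x,
% where \hat{f} is the model fitted on a random training set:
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
  = \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{Bias}^2}
  + \underbrace{\mathbb{E}\Big[\big(\hat{f}(x) - \mathbb{E}[\hat{f}(x)]\big)^2\Big]}_{\text{Variance}}
  + \underbrace{\sigma^2}_{\text{irreducible noise}}
```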

4.4 Reducing Bias and Variance

To reduce bias

  • Use a more complex model
  • Add more features
  • Reduce regularization
  • Train longer

To reduce variance

  • Use a simpler model
  • Add more training data
  • Use regularization
  • Use cross-validation

Note: You can’t minimize both at once—improving one usually worsens the other.

4.5 Diagnosing the Issue

  • If your training error is high, you are underfitting.
  • If your training error is low but your test error is much higher, you are likely overfitting.
  • A high test error alone is ambiguous: it can come from either underfitting or overfitting, so look at the training error to tell them apart.
  • If your training and test errors are similar and both high, that points to underfitting; if they are similar and both low, the model is generalizing well.
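
These rules of thumb can be captured in a small helper. This is a heuristic sketch only; the function name and the acceptable_error threshold are illustrative assumptions, and in practice you would judge errors against a baseline and a target for your specific problem.

```python
# A heuristic sketch of the diagnosis rules above. The thresholds are
# illustrative assumptions, not universal cutoffs.
def diagnose(train_error: float, test_error: float, acceptable_error: float) -> str:
    """Classify the fit from training/test error relative to an acceptable level."""
    gap = test_error - train_error
    if train_error > acceptable_error:
        return "underfitting: the training error itself is too high"
    if gap > acceptable_error:
        return "overfitting: large gap between training and test error"
    return "reasonable fit: low training error and a small train/test gap"

# Example usage with made-up error values:
print(diagnose(train_error=0.40, test_error=0.42, acceptable_error=0.10))  # underfitting
print(diagnose(train_error=0.02, test_error=0.35, acceptable_error=0.10))  # overfitting
print(diagnose(train_error=0.05, test_error=0.08, acceptable_error=0.10))  # reasonable fit
```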

4.6 Regularization: A Sneak Peek

One common way to combat overfitting is regularization, a method that penalizes model complexity. You'll learn about it in detail in the next chapter.
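
As a preview, one standard form is L2 (ridge) regularization for linear regression: a penalty on the size of the weights is added to the usual training loss, with λ controlling how strongly complexity is punished.

```latex
% L2-regularized (ridge) loss for linear regression with weights w:
L(\mathbf{w}) = \frac{1}{n}\sum_{i=1}^{n}\big(y_i - \mathbf{w}^\top \mathbf{x}_i\big)^2
              + \lambda \lVert \mathbf{w} \rVert_2^2
```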

4.7 Real-World Examples

Underfitting:

  • Predicting housing prices using only the number of bedrooms.
  • Using logistic regression for image classification on handwritten digits.

Overfitting:

  • A model that memorizes all user ratings but fails to recommend movies for new users.
  • A deep neural network trained on only 500 images.