Chapter 11: Neural Networks

Neural networks power many of the modern breakthroughs in AI, from chatbots like ChatGPT to image recognition software like Google Photos, and countless other applications where a computer learns from the inputs it is given. Although these tools sound complicated, they are built on a surprisingly simple foundation: structured systems of simple mathematical operations that loosely mimic how we believe the brain processes information. The basic idea is to take in numbers (features), apply mathematical transformations (layers), and output a prediction. In this chapter we walk through how neural networks work step by step, in this order: the perceptron, multi-layer networks, activation functions, forward and backward propagation, and finally how a neural network is trained.

11.1 The Perceptron: A Single Neuron

What it is: The perceptron is the simplest form of a neural network. It is essentially a function that takes in several inputs, multiplies each by a corresponding weight, adds the results together, and decides whether the sum is big enough to fire (output 1) or not (output 0).

Why it works: It models basic decision-making. For instance, imagine deciding whether to go hiking:

  • Is it sunny? (1 if yes, 0 if no)
  • Do I have time? (1 or 0)
  • Do I have gear? (1 or 0)

A perceptron can weigh each input based on its importance and make a decision using a simple threshold.
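Below is a minimal sketch of that decision in Python. The specific weights and threshold are invented purely for illustration; a trained perceptron would learn its own values.

    # A perceptron for the hiking decision above. The weights and threshold
    # here are made-up values chosen only to illustrate the idea.
    def perceptron(inputs, weights, threshold):
        """Fire (return 1) if the weighted sum of the inputs clears the threshold."""
        weighted_sum = sum(x * w for x, w in zip(inputs, weights))
        return 1 if weighted_sum >= threshold else 0

    # Inputs: [sunny, have_time, have_gear], each 1 (yes) or 0 (no)
    weights = [0.5, 0.3, 0.2]   # sunshine matters most in this example
    threshold = 0.6

    print(perceptron([1, 1, 0], weights, threshold))  # 0.8 >= 0.6 -> 1 (go hiking)
    print(perceptron([0, 1, 1], weights, threshold))  # 0.5 <  0.6 -> 0 (stay home)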

Limitations: A single perceptron can only draw a straight-line boundary between classes. It can't handle problems that aren't linearly separable, such as XOR, or data that needs curved decision boundaries.

11.2 Multi-layer Neural Networks: Learning Complexity

What it is: When we stack perceptrons into layers, we get what's called a feedforward neural network or multi-layer perceptron (MLP).

  • Input layer: Takes in raw features.
  • Hidden layers: Intermediate layers that transform data. The more hidden layers, the deeper the network.
  • Output layer: Produces the final prediction (class probability, number, etc.).

Each layer transforms the input space in increasingly abstract ways.

Why it works: Real-world data is messy and non-linear. Multi-layer networks allow the model to learn not just raw features, but combinations and interactions between features, which is essential for tasks like image recognition or language understanding.
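The sketch below shows a single forward pass through a small MLP, assuming NumPy is available. The layer sizes and random weights are arbitrary illustrative choices, not values from a trained model.

    import numpy as np

    # A toy forward pass through a two-layer network: input -> hidden -> output.
    rng = np.random.default_rng(0)

    def relu(x):
        return np.maximum(0, x)

    def sigmoid(z):
        return 1 / (1 + np.exp(-z))

    def layer(x, W, b, activation):
        """One fully connected layer: weighted sum plus bias, then an activation."""
        return activation(x @ W + b)

    x = np.array([0.5, -1.2, 3.0])                  # 3 raw input features
    W1, b1 = rng.normal(size=(3, 4)), np.zeros(4)   # input layer -> 4 hidden units
    W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)   # hidden layer -> 1 output

    hidden = layer(x, W1, b1, relu)          # hidden layer re-represents the input
    output = layer(hidden, W2, b2, sigmoid)  # output squashed to a probability
    print(hidden, output)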

11.3 Activation Functions: Adding Non-Linearity

What it is: Activation functions decide how much signal each neuron passes along. Without them, a neural network would collapse into one big linear function no matter how many layers it had.

Types:

  • ReLU (Rectified Linear Unit): Passes through positive values, outputs 0 for negatives. Very efficient.
  • Sigmoid: Squashes inputs to a 0–1 range. Useful for probabilities.
  • Tanh: Squashes inputs to the –1 to 1 range. Used less often today.

Why it works: They let the network learn curves and boundaries in data that can’t be captured with straight lines.
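The three functions above are short enough to write out directly; this small sketch just evaluates each on a few sample values so you can see how they squash their inputs.

    import numpy as np

    # The three activation functions described above, evaluated on sample values.
    def relu(x):
        return np.maximum(0, x)

    def sigmoid(x):
        return 1 / (1 + np.exp(-x))

    z = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
    print("ReLU:   ", relu(z))      # negatives clipped to 0, positives pass through
    print("Sigmoid:", sigmoid(z))   # squashed into the (0, 1) range
    print("Tanh:   ", np.tanh(z))   # squashed into the (-1, 1) range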

11.4 Forward and Backward Propagation: Learning in Action

Forward propagation is the process of feeding input data through the network to get an output prediction.

Backward propagation (backprop) is how the network learns:

  • It measures how wrong the output was using a loss function.
  • It works backwards from the output to each layer to calculate how much each weight contributed to the error.
  • It uses this information to make small adjustments to the weights (via gradient descent).

Why it works: This process lets the model improve gradually by taking tiny steps in the direction that most reduces the error.
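Here is a minimal sketch of one forward and one backward pass for a single sigmoid neuron with binary cross-entropy loss. The training example, starting weights, and learning rate are made up for illustration.

    import numpy as np

    def sigmoid(z):
        return 1 / (1 + np.exp(-z))

    x = np.array([1.0, 0.0, 1.0])    # one training example with 3 features
    y = 1.0                          # true label
    w = np.array([0.1, -0.2, 0.05])  # current weights
    b = 0.0
    lr = 0.1                         # learning rate

    # Forward propagation: compute the prediction and measure how wrong it is.
    z = w @ x + b
    y_hat = sigmoid(z)
    loss = -(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

    # Backward propagation: how much did each weight contribute to the error?
    dz = y_hat - y      # gradient of the loss with respect to z
    dw = dz * x         # gradient with respect to each weight
    db = dz

    # Gradient descent: nudge the weights in the direction that reduces the loss.
    w = w - lr * dw
    b = b - lr * db
    print(f"loss={loss:.4f}, updated weights={w}")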

11.5 Training a Neural Network: Putting It Together

Process (a code sketch follows the list):

  • Initialize weights randomly
  • Perform forward propagation on a batch of data
  • Compute the error using a loss function (e.g., binary cross-entropy)
  • Perform backpropagation to compute gradients
  • Update weights using gradient descent
  • Repeat over many epochs until loss stabilizes
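The sketch below follows these steps on the XOR problem from Section 11.1, which a single perceptron cannot solve but a small multi-layer network can. The layer sizes, learning rate, and epoch count are illustrative choices, not recommendations.

    import numpy as np

    rng = np.random.default_rng(0)

    def sigmoid(z):
        return 1 / (1 + np.exp(-z))

    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    y = np.array([[0], [1], [1], [0]], dtype=float)   # XOR labels

    # Step 1: initialize weights randomly (2 inputs -> 4 hidden units -> 1 output).
    W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)
    W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)
    lr, epochs = 1.0, 5000

    for epoch in range(epochs):
        # Step 2: forward propagation on the (tiny) batch.
        h = sigmoid(X @ W1 + b1)
        y_hat = sigmoid(h @ W2 + b2)

        # Step 3: compute the error with binary cross-entropy.
        loss = -np.mean(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

        # Step 4: backpropagation to compute gradients, layer by layer.
        dz2 = (y_hat - y) / len(X)
        dW2, db2 = h.T @ dz2, dz2.sum(axis=0)
        dz1 = (dz2 @ W2.T) * h * (1 - h)
        dW1, db1 = X.T @ dz1, dz1.sum(axis=0)

        # Step 5: update weights with gradient descent.
        W1, b1 = W1 - lr * dW1, b1 - lr * db1
        W2, b2 = W2 - lr * dW2, b2 - lr * db2

    # Step 6: after many epochs the loss stabilizes and predictions match XOR.
    print(f"final loss: {loss:.4f}")
    print(np.round(y_hat).ravel())   # typically [0. 1. 1. 0.] once training succeeds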

11.6 Summary: Why Neural Networks Matter

  • Neural networks are flexible, layered models capable of learning complex patterns in data.
  • They build from basic decision-making units (perceptrons) into deep architectures.
  • Non-linear activation functions and backpropagation enable deep learning.
  • With the right data and tuning, they can solve tasks from spam filtering to voice assistants to self-driving cars.