Chapter 8: Core Classification Algorithms
Now that we have covered the core background of machine learning and know how to evaluate classifiers, we can turn to the exciting part: the models themselves. In this chapter we walk through four of the most commonly used classification algorithms: Logistic Regression, K-Nearest Neighbors (KNN), Decision Trees, and Random Forests. Each algorithm brings its own strengths, limitations, and ideal use cases. We’ll focus on their logic, mechanics, and when to apply them effectively.
8.1 Logistic Regression What It Is and Why It Works:
Logistic regression is one of the most fundamental classification algorithms. Despite its name, it is used for classification, not regression. At its core, logistic regression answers a simple question: how likely is it that this input belongs to class 1 (the positive class)?
It works by:
- Taking a weighted sum of the input features
- Passing that sum through a sigmoid function, which converts the result into a value between 0 and 1 (i.e., a probability)
- Applying a decision threshold (typically 0.5) to determine the predicted class
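To make these three steps concrete, here is a minimal NumPy sketch; the weights, bias, and input below are made up for illustration, not learned from data:

```python
import numpy as np

w = np.array([0.8, -1.2, 0.5])   # one illustrative weight per feature
b = 0.1                          # illustrative bias term
x = np.array([1.0, 2.0, 0.5])    # a single input example

z = w @ x + b                    # 1. weighted sum of the input features
p = 1 / (1 + np.exp(-z))         # 2. sigmoid squashes z into (0, 1)
label = int(p >= 0.5)            # 3. apply the 0.5 decision threshold
print(p, label)
```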
Why it works:
- The weighted sum defines a (roughly) linear boundary between classes, which is often a good approximation
- The sigmoid output gives a probabilistic interpretation, not just a “yes/no” prediction
Where it shines:
- Interpretable models for binary classification
- Baseline models in healthcare, finance, and social science
- Fast and robust when the number of features is moderate
Python Implementation Example:
```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.2, random_state=42
)

scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

model = LogisticRegression()
model.fit(X_train, y_train)
print("Accuracy:", model.score(X_test, y_test))  # Accuracy: 0.9737
```
Bonus Mathematical Equation/Context:
Weighted Sum of inputs Equation:
\\[ z = \mathbf{w}^T \mathbf{x} + b \\]
Sigmoid maps the sum to probability:
\\[ \sigma(z) = \frac{1}{1 + e^{-z}} \\]
Loss function (cross-entropy):
\\[ L = -[y \log(\hat{y}) + (1 - y)\log(1 - \hat{y})] \\]
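These pieces map directly onto the fitted scikit-learn model from the example above. Assuming that code has run, the brief sketch below recomputes the class-1 probability for one test point by hand from `model.coef_` and `model.intercept_` and compares it with `predict_proba`:

```python
import numpy as np

# Recompute the class-1 probability for one test point by hand
x = X_test[0]
z = x @ model.coef_[0] + model.intercept_[0]     # z = w^T x + b
p = 1 / (1 + np.exp(-z))                         # sigmoid(z)
print(p, model.predict_proba(X_test[:1])[0, 1])  # the two values should match
```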
8.2 K-Nearest Neighbors (KNN) What It Is and Why It Works:
KNN is a non-parametric, instance-based classifier. Instead of learning weights, it memorizes the training data and makes predictions by comparing new points directly against it.
It works by:
- Searching the training set for the k points most similar to the new data point
- Assigning the majority class among those k neighbors as the prediction
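As a rough illustration of this procedure, here is a from-scratch sketch; it assumes NumPy arrays such as the scaled `X_train`, `y_train`, and `X_test` from Section 8.1, and the scikit-learn example later in this section is what you would use in practice:

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_new, k=5):
    # 1. Euclidean distance from the new point to every training point
    distances = np.linalg.norm(X_train - x_new, axis=1)
    # 2. indices of the k closest training points
    nearest = np.argsort(distances)[:k]
    # 3. majority vote among their labels
    return Counter(y_train[nearest]).most_common(1)[0][0]

# Example: classify the first test point using the scaled data from Section 8.1
print(knn_predict(X_train, y_train, X_test[0]))
```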
Why it works:
- KNN relies on a simple and often powerful assumption: Similar things tend to have similar labels.
- KNN can model non-linear decision boundaries and adapt to the local shape of data.
Where it shines:
- When patterns are complex and local, not easily captured by global rules
- Problems with very few features and clear class groupings
- Prototype-based reasoning (e.g., medical diagnoses based on past patients)
Where it struggles:
- High dimensions (curse of dimensionality)
- Large datasets (slow at prediction time)
- Features with different scales or irrelevant features
Python Example:
```python
from sklearn.neighbors import KNeighborsClassifier

# Uses the scaled X_train/X_test from the logistic regression example above
model = KNeighborsClassifier(n_neighbors=5)
model.fit(X_train, y_train)
print("Accuracy:", model.score(X_test, y_test))  # Accuracy: 0.9561
```
Mathematical Bonus:
To classify a new point, KNN computes the distance (typically Euclidean) from that point to every training point, keeps the k nearest ones, and assigns the class that wins the majority vote among those neighbors. The Euclidean distance between two points is:
\\[ d(\mathbf{x}, \mathbf{x}') = \sqrt{\sum_i (x_i - x'_i)^2} \\]
8.3 Decision Trees What It Is and Why It Works:
A decision tree builds a series of if-then rules based on the input features to predict the class of a data point.
How it works:
- At each node, it chooses the feature and threshold that best splits the data into pure subgroups
- The process repeats recursively, building branches until:
  - Each leaf is “pure” (all one class), or
  - A stopping condition is met (max depth, min samples, etc.)
Why it works:
- Splitting on one feature at a time handles categorical, ordinal, and numeric features naturally
- The resulting if-then rules are easy to interpret
- Splits depend only on the ordering of values, so trees tolerate outliers well, and some implementations handle missing values directly
Where it shines:
- Clear, human-interpretable logic
- Real-world decision making (credit approval, eligibility)
- Datasets with mixed feature types
Where it struggles:
- High variance / overfitting
- Small training sets (unstable splits)
Python Example:
```python
from sklearn.tree import DecisionTreeClassifier

model = DecisionTreeClassifier(max_depth=4)
model.fit(X_train, y_train)
print("Accuracy:", model.score(X_test, y_test))  # Accuracy: 0.9561
```
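To see the learned if-then rules directly, scikit-learn’s `export_text` can print the fitted tree; this brief sketch assumes the `model` and `data` objects from the examples above:

```python
from sklearn.tree import export_text

# Print the tree's if-then rules using the original feature names
print(export_text(model, feature_names=list(data.feature_names)))
```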
Mathematical Intuition Bonus:
The most common split criteria are Gini impurity and entropy (used for information gain).
Gini impurity is defined as:
\\[ G = 1 - \sum p_i^2 \\]
Entropy, used to compute information gain:
\\[ H = - \sum p_i \log_2 p_i \\]
At each node you choose the split that minimizes impurity, or equivalently maximizes information gain.
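As a quick numeric illustration of these two criteria, the sketch below computes Gini impurity and entropy for a small toy node (the label counts are made up for the example):

```python
import numpy as np

def gini(labels):
    # G = 1 - sum(p_i^2) over the class proportions p_i
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1 - np.sum(p ** 2)

def entropy(labels):
    # H = -sum(p_i * log2(p_i)) over the class proportions p_i
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

labels = np.array([0, 0, 0, 1, 1, 1, 1, 1])  # a toy node: 3 of class 0, 5 of class 1
print(gini(labels), entropy(labels))         # ≈ 0.469 and ≈ 0.954
```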
8.4 Random Forest What It Is and Why It Works:
A random forest is an ensemble of decision trees. Instead of relying on one tree (which might overfit), it grows many trees using different random samples and combines their predictions.
How it works:
- Bootstrapping: Each tree is trained on a different random subset of data (with replacement)
- Random feature selection: Each split considers a random subset of features
- Final prediction: majority vote (classification) or average (regression)
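These three ideas can be imitated by hand with plain decision trees. The sketch below is a simplified illustration only (the tree count, seed, and `max_features="sqrt"` are arbitrary choices here), not a substitute for the `RandomForestClassifier` example later in this section; it assumes the scaled data from Section 8.1:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
trees = []
for _ in range(25):
    # Bootstrapping: sample the training set with replacement
    idx = rng.integers(0, len(X_train), size=len(X_train))
    # max_features limits each split to a random subset of features
    tree = DecisionTreeClassifier(max_features="sqrt")
    trees.append(tree.fit(X_train[idx], y_train[idx]))

# Final prediction: majority vote across the trees
votes = np.array([t.predict(X_test) for t in trees])
y_pred = (votes.mean(axis=0) >= 0.5).astype(int)
print("Accuracy:", (y_pred == y_test).mean())
```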
Why it works:
- Averaging many diverse trees reduces variance and overfitting
- The ensemble is typically more accurate and robust than any single tree
- It can capture complex, non-linear relationships in the data
- Its predictions are more stable than those of a single decision tree
Where it shines:
- Large datasets with many features
- Data with noise or complex interactions
- Feature importance analysis
Where it struggles:
- Interpretability
- Training time for very large datasets
Python Example:
```python
from sklearn.ensemble import RandomForestClassifier

model = RandomForestClassifier(n_estimators=100, max_depth=5)
model.fit(X_train, y_train)
print("Accuracy:", model.score(X_test, y_test))  # Accuracy: 0.9737
```
Mathematical Background:
Bagging: the final prediction is the majority vote of the individual trees:
\\[ \hat{y} = \text{majority vote}\{ h_1(\mathbf{x}), h_2(\mathbf{x}), \ldots, h_T(\mathbf{x}) \} \\]
Feature importance is computed by averaging each feature’s impurity reduction across all trees in the forest.
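Assuming the fitted `model` from the code above and the `data` object loaded in Section 8.1, these impurity-based importances are exposed directly as `feature_importances_`:

```python
import numpy as np

# Show the five most important features according to the forest
importances = model.feature_importances_
for i in np.argsort(importances)[::-1][:5]:
    print(f"{data.feature_names[i]}: {importances[i]:.3f}")
```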
8.5 Final Summary Table:
| Model | Test Accuracy | Strengths | Weaknesses |
|---|---|---|---|
| Logistic Regression | 0.9737 | Fast, interpretable, probabilistic output | Linear boundaries only |
| K-Nearest Neighbors | 0.9561 | Simple, flexible, no training time | Slow predictions, sensitive to scale |
| Decision Tree | 0.9561 | Interpretable, handles missing values | Prone to overfitting |
| Random Forest | 0.9737 | Robust, handles non-linear data, high accuracy | Less interpretable, longer training time |