Chapter 8: Core Classification Algorithms
Now that we have covered the core background of machine learning and know how to evaluate classifiers, we can turn to the exciting part: the models themselves. In this chapter we walk through four of the most commonly used classification algorithms: Logistic Regression, K-Nearest Neighbors (KNN), Decision Trees, and Random Forests. Each algorithm brings its own strengths, limitations, and ideal use cases. We’ll focus on their logic, mechanics, and when to apply them effectively.
8.1 Logistic Regression What It Is and Why It Works:
Logistic regression is one of the most fundamental classification algorithms. Despite its name, it is used for classification, not regression. At its core, logistic regression answers a simple question: how likely is it that this input belongs to class 1 (the positive class)?
It works by:
- Taking a weighted sum of the input features
- Passing that sum through a sigmoid function, which converts the result into a value between 0 and 1 (i.e., a probability)
- Applying a decision threshold (typically 0.5) to determine the predicted class
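To make these three steps concrete, here is a minimal NumPy sketch; the weights, bias, and input below are made up for illustration, not learned from data:

```python
import numpy as np

w = np.array([0.8, -1.2, 0.5])   # one illustrative weight per feature
b = 0.1                          # illustrative bias term
x = np.array([1.0, 2.0, 0.5])    # a single input example

z = w @ x + b                    # 1. weighted sum of the input features
p = 1 / (1 + np.exp(-z))         # 2. sigmoid squashes z into (0, 1)
label = int(p >= 0.5)            # 3. apply the 0.5 decision threshold
print(p, label)
```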
Why it works:
- The weighted sum defines a (roughly) linear boundary between classes, which is often a good approximation
- The sigmoid output gives a probabilistic interpretation, not just a “yes/no” prediction
Where it shines:
- Interpretable models for binary classification
- Baseline models in healthcare, finance, and social science
- Fast and robust when the number of features is moderate
Python Implementation Example:
```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.2, random_state=42
)

scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

model = LogisticRegression()
model.fit(X_train, y_train)
print("Accuracy:", model.score(X_test, y_test))  # Accuracy: 0.9737
```
Bonus Mathematical Equation/Context:
Weighted Sum of inputs Equation:
\\[ z = \mathbf{w}^T \mathbf{x} + b \\]
Sigmoid maps the sum to probability:
\\[ \sigma(z) = \frac{1}{1 + e^{-z}} \\]
Loss function (cross-entropy):
\\[ L = -[y \log(\hat{y}) + (1 - y)\log(1 - \hat{y})] \\]
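These pieces map directly onto the fitted scikit-learn model from the example above. Assuming that code has run, the brief sketch below recomputes the class-1 probability for one test point by hand from `model.coef_` and `model.intercept_` and compares it with `predict_proba`:

```python
import numpy as np

# Recompute the class-1 probability for one test point by hand
x = X_test[0]
z = x @ model.coef_[0] + model.intercept_[0]     # z = w^T x + b
p = 1 / (1 + np.exp(-z))                         # sigmoid(z)
print(p, model.predict_proba(X_test[:1])[0, 1])  # the two values should match
```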
8.2 K-Nearest Neighbors (KNN) What It Is and Why It Works:
KNN is a non-parametric, instance-based classifier. Instead of learning weights, it memorizes the training data and makes predictions by comparing new points directly against it.
It works by:
- Searching the training set for the k points most similar to the new data point
- Assigning the majority class among those k neighbors as the prediction
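As a rough illustration of this procedure, here is a from-scratch sketch; it assumes NumPy arrays such as the scaled `X_train`, `y_train`, and `X_test` from Section 8.1, and the scikit-learn example later in this section is what you would use in practice:

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_new, k=5):
    # 1. Euclidean distance from the new point to every training point
    distances = np.linalg.norm(X_train - x_new, axis=1)
    # 2. indices of the k closest training points
    nearest = np.argsort(distances)[:k]
    # 3. majority vote among their labels
    return Counter(y_train[nearest]).most_common(1)[0][0]

# Example: classify the first test point using the scaled data from Section 8.1
print(knn_predict(X_train, y_train, X_test[0]))
```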
Why it works:
- KNN relies on a simple and often powerful assumption: Similar things tend to have similar labels.
- KNN can model non-linear decision boundaries and adapt to the local shape of data.
Where it shines:
- When patterns are complex and local, not easily captured by global rules
- Problems with very few features and clear class groupings
- Prototype-based reasoning (e.g., medical diagnoses based on past patients)
Where it struggles:
- High dimensions (curse of dimensionality)
- Large datasets (slow at prediction time)
- Features with different scales or irrelevant features
Python Example:
```python
from sklearn.neighbors import KNeighborsClassifier

# Uses the scaled X_train/X_test from the logistic regression example above
model = KNeighborsClassifier(n_neighbors=5)
model.fit(X_train, y_train)
print("Accuracy:", model.score(X_test, y_test))  # Accuracy: 0.9561
```
Mathematical Bonus:
To classify a new point, KNN computes the distance (typically Euclidean) from that point to every training point, keeps the k nearest ones, and assigns the class that wins the majority vote among those neighbors. The Euclidean distance between two points is:
\\[ d(\mathbf{x}, \mathbf{x}') = \sqrt{\sum_i (x_i - x'_i)^2} \\]
8.3 Decision Trees What It Is and Why It Works:
A decision tree builds a series of if-then rules based on the input features to predict the class of a data point.
How it works:
- At each node, it chooses the feature and threshold that best splits the data into pure subgroups
- The process repeats recursively, building branches until:
  - Each leaf is “pure” (all one class), or
  - A stopping condition is met (max depth, min samples, etc.)
Why it works:
- Splitting on one feature at a time handles categorical, ordinal, and numeric features naturally
- The resulting if-then rules are easy to interpret
- Splits depend only on the ordering of values, so trees tolerate outliers well, and some implementations handle missing values directly
Where it shines:
- Clear, human-interpretable logic
- Real-world decision making (credit approval, eligibility)
- Datasets with mixed feature types
Where it struggles:
- High variance / overfitting
- Small training sets (unstable splits)
Python Example:
```python
from sklearn.tree import DecisionTreeClassifier

model = DecisionTreeClassifier(max_depth=4)
model.fit(X_train, y_train)
print("Accuracy:", model.score(X_test, y_test))  # Accuracy: 0.9561
```
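To see the learned if-then rules directly, scikit-learn’s `export_text` can print the fitted tree; this brief sketch assumes the `model` and `data` objects from the examples above:

```python
from sklearn.tree import export_text

# Print the tree's if-then rules using the original feature names
print(export_text(model, feature_names=list(data.feature_names)))
```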
Mathematical Intuition Bonus:
The most common split criteria are Gini impurity and entropy (used for information gain).
Gini impurity is defined as:
\\[ G = 1 - \sum p_i^2 \\]
Entropy, used to compute information gain:
\\[ H = - \sum p_i \log_2 p_i \\]
At each node you choose the split that minimizes impurity, or equivalently maximizes information gain.
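As a quick numeric illustration of these two criteria, the sketch below computes Gini impurity and entropy for a small toy node (the label counts are made up for the example):

```python
import numpy as np

def gini(labels):
    # G = 1 - sum(p_i^2) over the class proportions p_i
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1 - np.sum(p ** 2)

def entropy(labels):
    # H = -sum(p_i * log2(p_i)) over the class proportions p_i
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

labels = np.array([0, 0, 0, 1, 1, 1, 1, 1])  # a toy node: 3 of class 0, 5 of class 1
print(gini(labels), entropy(labels))         # ≈ 0.469 and ≈ 0.954
```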
8.4 Random Forest What It Is and Why It Works:
A random forest is an ensemble of decision trees. Instead of relying on one tree (which might overfit), it grows many trees using different random samples and combines their predictions.
How it works:
- Bootstrapping: Each tree is trained on a different random subset of data (with replacement)
- Random feature selection: Each split considers a random subset of features
- Final prediction: majority vote (classification) or average (regression)
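These three ideas can be imitated by hand with plain decision trees. The sketch below is a simplified illustration only (the tree count, seed, and `max_features="sqrt"` are arbitrary choices here), not a substitute for the `RandomForestClassifier` example later in this section; it assumes the scaled data from Section 8.1:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
trees = []
for _ in range(25):
    # Bootstrapping: sample the training set with replacement
    idx = rng.integers(0, len(X_train), size=len(X_train))
    # max_features limits each split to a random subset of features
    tree = DecisionTreeClassifier(max_features="sqrt")
    trees.append(tree.fit(X_train[idx], y_train[idx]))

# Final prediction: majority vote across the trees
votes = np.array([t.predict(X_test) for t in trees])
y_pred = (votes.mean(axis=0) >= 0.5).astype(int)
print("Accuracy:", (y_pred == y_test).mean())
```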
Why it works:
- Averaging many diverse trees reduces variance and overfitting
- The ensemble is typically more accurate and robust than any single tree
- It can capture complex, non-linear relationships in the data
- Its predictions are more stable than those of a single decision tree
Where it shines:
- Large datasets with many features
- Data with noise or complex interactions
- Feature importance analysis
Where it struggles:
- Interpretability
- Training time for very large datasets
Python Example:
```python
from sklearn.ensemble import RandomForestClassifier

model = RandomForestClassifier(n_estimators=100, max_depth=5)
model.fit(X_train, y_train)
print("Accuracy:", model.score(X_test, y_test))  # Accuracy: 0.9737
```
Mathematical Background:
Bagging: the final prediction is the majority vote of the individual trees:
\\[ \hat{y} = \text{majority vote}\{ h_1(\mathbf{x}), h_2(\mathbf{x}), \ldots, h_T(\mathbf{x}) \} \\]
Feature importance is computed by averaging each feature’s impurity reduction across all trees in the forest.
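Assuming the fitted `model` from the code above and the `data` object loaded in Section 8.1, these impurity-based importances are exposed directly as `feature_importances_`:

```python
import numpy as np

# Show the five most important features according to the forest
importances = model.feature_importances_
for i in np.argsort(importances)[::-1][:5]:
    print(f"{data.feature_names[i]}: {importances[i]:.3f}")
```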
8.5 Final Summary Table:
| Model | Test Accuracy | Strengths | Weaknesses |
|---|---|---|---|
| Logistic Regression | 0.9737 | Fast, interpretable, probabilistic output | Linear boundaries only |
| K-Nearest Neighbors | 0.9561 | Simple, flexible, no training time | Slow predictions, sensitive to scale |
| Decision Tree | 0.9561 | Interpretable, handles missing values | Prone to overfitting |
| Random Forest | 0.9737 | Robust, handles non-linear data, high accuracy | Less interpretable, longer training time |