Chapter 9: Dimensionality Reduction with Principal Component Analysis

In machine learning, adding more features seems like it should give you more power, but as we learned with regularization, that is not always the case: a model that is too complex can actually hurt performance. This problem is closely related to what people call the curse of dimensionality. In this chapter we will explore why high-dimensional data is challenging, how Principal Component Analysis (PCA) reduces dimensions, when and how to use PCA in practice, and the core math behind PCA (an optional bonus section).

9.1 Curse of Dimensionality

What It Is and Why It Matters

The curse of dimensionality describes what happens as the number of features grows:

  • The space becomes sparse: most data points end up far apart from each other
  • Similarity metrics (like Euclidean distance) lose meaning
  • Algorithms like KNN, clustering, and even tree splits perform worse

Think of trying to find patterns in 2D; now imagine trying to find those same patterns in 200 dimensions. The space grows exponentially, and your data becomes scattered like dust.
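
To make this concrete, here is a small sketch (not from the chapter; it uses random data) that measures how the gap between a point's nearest and farthest neighbor shrinks as the number of dimensions grows:

import numpy as np

rng = np.random.default_rng(42)
for d in [2, 20, 200]:
    X = rng.random((1000, d))                     # 1000 random points in d dimensions
    dists = np.linalg.norm(X[1:] - X[0], axis=1)  # distances from the first point to all others
    print(f"d={d:3d}  nearest={dists.min():.2f}  farthest={dists.max():.2f}  "
          f"ratio={dists.max() / dists.min():.2f}")

In 2 dimensions the farthest point is many times farther away than the nearest one; as the dimension grows, the ratio shrinks toward 1, which is why distance-based methods struggle.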

Real Problems It Causes in Your Models:

  • Overfitting: Too many features + too little data (see the sketch after this list)
  • Slower training: More calculations, more memory
  • Decreased model performance: Signal-to-noise ratio drops
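
Here is a quick sketch of the overfitting point, using purely random data (the dataset and numbers are made up for illustration): with 100 noise features and only a handful of rows, a plain logistic regression can memorize the training set while doing no better than chance on held-out data.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 100))      # 60 samples, 100 features of pure noise
y = rng.integers(0, 2, size=60)     # labels unrelated to the features

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=0)
model = LogisticRegression(max_iter=5000).fit(X_train, y_train)

print("Train accuracy:", model.score(X_train, y_train))  # typically close to 1.0
print("Test accuracy:", model.score(X_test, y_test))     # typically around 0.5 (chance)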

9.2 Principal Component Analysis (PCA)

What It Is and Why It Works

PCA is a technique for reducing the number of features in your dataset while preserving as much of the variance (information) as possible. It works by:

  • Finding new axes (called principal components) that best describe the spread of your data
  • Rewriting the data in terms of these components
  • Keeping only the top components that explain most of the variance

Why it works:

Most real-world data lives in a lower-dimensional subspace. For example, although your data might have 50 features, maybe only 3 or 4 combinations of those features actually explain the important structure.
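
Here is a small synthetic sketch of that idea (not the NBA data): 50 observed features that are all mixtures of just 3 hidden factors. PCA's explained variance ratio shows that the first 3 components capture nearly everything.

import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 3))                        # 3 true underlying factors
mixing = rng.normal(size=(3, 50))                         # how they combine into 50 features
X = latent @ mixing + 0.05 * rng.normal(size=(500, 50))   # observed data plus a little noise

pca = PCA().fit(X)
print(pca.explained_variance_ratio_[:5].round(3))  # the first three values dominate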

PCA helps you:

  • Remove noise and redundancy
  • Visualize high-dimensional data in 2D or 3D
  • Speed up training and reduce overfitting

When to Use PCA:

  • You have many correlated features
  • You want to visualize data in 2D or 3D
  • You’re preparing for modeling and want to reduce dimensionality

9.3 Interpreting PCA Results

  • Explained variance ratio tells you how much information each component keeps. E.g., [0.92, 0.05] means the first component captures 92% of the variance.
  • Components are combinations of the original features: each one is a weighted sum of the columns you started with.
  • Use only as many components as needed to explain ~95% of the variance.
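
Rather than picking the number of components by hand, scikit-learn lets you pass a fraction to n_components and keeps just enough components to reach that much variance. A minimal sketch, assuming X_scaled is a standardized feature matrix like the one built in the walkthrough below:

import numpy as np
from sklearn.decomposition import PCA

pca = PCA(n_components=0.95)             # keep enough components for ~95% of the variance
X_reduced = pca.fit_transform(X_scaled)  # X_scaled: your standardized features (assumed)

print("Components kept:", pca.n_components_)
print("Cumulative variance:", np.cumsum(pca.explained_variance_ratio_).round(3))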

9.4 Mini PCA Walkthrough: NBA Dataset

We’ll walk through PCA using the NBA player dataset. We’ll focus on numerical columns like height, points, rebounds, assists, and shooting percentages.

from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
import pandas as pd

# Load the dataset and select the numerical columns
nba = pd.read_csv("nba_data.csv")
X = nba[['player_height', 'pts', 'reb', 'ast', 'fg_pct', 'fg3_pct', 'ft_pct']]

# Standardize
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Apply PCA
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X_scaled)

# Explained variance
print("Explained variance ratio:", pca.explained_variance_ratio_)

# Add to original df for plotting (optional)
nba['PC1'] = X_pca[:, 0]
nba['PC2'] = X_pca[:, 1]

This reduces our 7 original features to just 2 components while still keeping most of the original variance (often above 85%). This is helpful when plotting or clustering players.
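
As an optional follow-up (a sketch continuing from the code above; the plot styling is up to you), you can inspect how each component weights the original stats and plot players on the new axes:

import matplotlib.pyplot as plt

# Each row shows how a component mixes the original features
loadings = pd.DataFrame(pca.components_, columns=X.columns, index=['PC1', 'PC2'])
print(loadings.round(2))

plt.scatter(nba['PC1'], nba['PC2'], alpha=0.3)
plt.xlabel('PC1')
plt.ylabel('PC2')
plt.title('NBA players projected onto the first two principal components')
plt.show()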

9.5 Under the Hood of PCA

PCA relies heavily on linear algebra techniques such as eigenvectors and eigen decomposition in order to find the directions (vectors) in feature space along which the data varies the most. We will not go over the specific math here because it is very complicated and you will not need to know it unless you are searching for a job in the research scientist or quantitative finance field. As you can see above, as long as we have a conceptual understanding of PCA and know the syntax of the function we can utilize it without knowing the math because scikit-learn is there to do all the math while we extrapolate the results.