Principal Component Analysis (PCA)

Learn how PCA reduces dimensionality while preserving variance in data

Introduction

Principal Component Analysis (PCA) is a powerful dimensionality reduction technique that transforms high-dimensional data into a lower-dimensional space while preserving as much variance (information) as possible. It's one of the most widely used techniques in data science and machine learning.

Why Use PCA?

The Curse of Dimensionality

As the number of features increases:

  • Data becomes sparse in high-dimensional space
  • Distance metrics become less meaningful
  • Computational costs increase dramatically
  • Direct visualization becomes impractical beyond two or three dimensions

Benefits of PCA

  1. Dimensionality Reduction: Reduce hundreds or thousands of features to just a few
  2. Noise Reduction: Minor components often represent noise
  3. Visualization: Project high-dimensional data to 2D or 3D for plotting
  4. Feature Extraction: Create new features that capture most variance
  5. Computational Efficiency: Faster training with fewer features

How PCA Works

Figure: PCA finds principal components that capture maximum variance in the data.

Step 1: Standardization

First, standardize the features to have zero mean and unit variance:

x_standardized = (x - mean) / std

This ensures features with larger scales don't dominate the analysis.
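
A minimal sketch of this step with scikit-learn's StandardScaler (the data here is a hypothetical feature matrix made up for illustration):

import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical data: 100 samples, 5 features on very different scales
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5)) * [1, 10, 100, 1000, 10000]

# Standardize each feature to zero mean and unit variance
X_std = StandardScaler().fit_transform(X)

print(X_std.mean(axis=0).round(6))  # approximately 0 for every feature
print(X_std.std(axis=0).round(6))   # approximately 1 for every feature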

Step 2: Covariance Matrix

Compute the covariance matrix to understand how features vary together:

Cov(X, Y) = E[(X - μ_X)(Y - μ_Y)]
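
With NumPy this is a one-liner; the sketch below also spells out the formula explicitly (X_std is stand-in standardized data, not a real dataset):

import numpy as np

rng = np.random.default_rng(0)
X_std = rng.normal(size=(100, 5))      # stand-in for standardized data

# Rows are samples, so features vary along columns: use rowvar=False
cov = np.cov(X_std, rowvar=False)      # shape (5, 5)

# Equivalent explicit form: (X - mean)^T (X - mean) / (n - 1)
Xc = X_std - X_std.mean(axis=0)
cov_manual = Xc.T @ Xc / (len(X_std) - 1)

print(np.allclose(cov, cov_manual))    # True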

Step 3: Eigenvalue Decomposition

Find eigenvalues and eigenvectors of the covariance matrix:

  • Eigenvectors: Define the directions of principal components
  • Eigenvalues: Indicate the amount of variance along each component
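
A minimal sketch of this decomposition with NumPy (the covariance matrix comes from stand-in data; eigh is appropriate because a covariance matrix is symmetric):

import numpy as np

rng = np.random.default_rng(0)
X_std = rng.normal(size=(100, 5))
cov = np.cov(X_std, rowvar=False)

# eigh handles symmetric matrices and returns eigenvalues in ascending order
eigenvalues, eigenvectors = np.linalg.eigh(cov)

print(eigenvalues)           # variance along each principal direction
print(eigenvectors[:, -1])   # eigenvector (column) for the largest eigenvalue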

Step 4: Select Components

Sort the eigenvectors by their eigenvalues in descending order and keep the top k as the principal components.

Step 5: Transform Data

Project the standardized data onto the selected principal components to obtain the lower-dimensional representation, as sketched below.
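
Putting steps 3-5 together, a sketch that sorts the eigenpairs, keeps the top k, and projects the (stand-in) standardized data:

import numpy as np

rng = np.random.default_rng(0)
X_std = rng.normal(size=(100, 5))
cov = np.cov(X_std, rowvar=False)
eigenvalues, eigenvectors = np.linalg.eigh(cov)

# Step 4: sort eigenpairs by eigenvalue, largest first, and keep the top k
order = np.argsort(eigenvalues)[::-1]
k = 2
W = eigenvectors[:, order[:k]]    # projection matrix, shape (5, k)

# Step 5: project the standardized data onto the top-k components
X_pca = X_std @ W                 # shape (100, k)
print(X_pca.shape)                # (100, 2)

scikit-learn's PCA class wraps all of these steps; PCA(n_components=2).fit_transform(X_std) should give the same projection up to a sign flip per component.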

Interpreting Results

Figure: Scree plot showing the explained variance of each principal component.

Explained Variance Ratio

Each component explains a certain percentage of total variance:

  • PC1 typically explains the most (e.g., 40%)
  • PC2 explains the next most (e.g., 25%)
  • Later components explain progressively less
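
These ratios can be read directly off a fitted scikit-learn PCA; a minimal sketch on made-up correlated data (the actual percentages depend entirely on your dataset):

import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6)) @ rng.normal(size=(6, 6))   # correlated features

pca = PCA().fit(StandardScaler().fit_transform(X))

# Fraction of total variance captured by each component, in descending order
print(pca.explained_variance_ratio_)
print(pca.explained_variance_ratio_.cumsum())   # cumulative variance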

Scree Plot

A scree plot shows explained variance for each component:

  • Look for an "elbow" where variance drops sharply
  • Components before the elbow are usually kept
  • Aim for 80-95% cumulative variance
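
A sketch of such a plot with matplotlib, including the cumulative curve and a 95% reference line (the data is again a stand-in):

import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6)) @ rng.normal(size=(6, 6))

pca = PCA().fit(X - X.mean(axis=0))
ratios = pca.explained_variance_ratio_
components = np.arange(1, len(ratios) + 1)

# Scree plot: per-component and cumulative explained variance
plt.plot(components, ratios, "o-", label="per component")
plt.plot(components, ratios.cumsum(), "s--", label="cumulative")
plt.axhline(0.95, color="gray", linestyle=":", label="95% target")
plt.xlabel("Principal component")
plt.ylabel("Explained variance ratio")
plt.legend()
plt.show()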

Loadings

Figure: PCA biplot showing both data points and feature loadings.

Component loadings show how original features contribute to each PC:

  • High positive loading: feature increases with PC
  • High negative loading: feature decreases with PC
  • Near-zero loading: feature doesn't contribute much
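
In scikit-learn the component weights live in the components_ attribute (one row per component, one column per original feature); some texts additionally scale these by the square root of the eigenvalues to get loadings, but the raw weights already show each feature's sign and relative contribution. A sketch with hypothetical feature names:

import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
feature_names = ["height", "weight", "age", "income"]   # hypothetical features
X = rng.normal(size=(150, len(feature_names)))

pca = PCA(n_components=2).fit(StandardScaler().fit_transform(X))

# Rows are PCs, columns are the original features
loadings = pd.DataFrame(pca.components_, index=["PC1", "PC2"], columns=feature_names)
print(loadings.round(2))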

When to Use PCA

Good Use Cases

  • Visualization: Plot high-dimensional data in 2D/3D
  • Preprocessing: Before clustering or classification (see the pipeline sketch after this list)
  • Noise Reduction: Remove minor components
  • Feature Engineering: Create new composite features
  • Compression: Reduce storage requirements
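
A sketch of the preprocessing use case mentioned above: standardization, PCA, and a classifier chained in a scikit-learn Pipeline (the digits dataset and the choice of 20 components are just placeholders):

from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_digits(return_X_y=True)   # 8x8 digit images -> 64 pixel features

# Standardize -> keep 20 components -> classify
model = make_pipeline(StandardScaler(),
                      PCA(n_components=20),
                      LogisticRegression(max_iter=1000))

print(cross_val_score(model, X, y, cv=5).mean())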

Limitations

  • Linear Relationships: PCA only captures linear relationships
  • Interpretability: Principal components are combinations of original features
  • Variance ≠ Information: The directions of highest variance are not always the ones that matter for a downstream task
  • Outliers: Sensitive to outliers in the data

Practical Tips

  1. Always Standardize: Unless features are already on the same scale
  2. Check Scree Plot: Don't just pick an arbitrary number of components
  3. Validate Results: Check if downstream tasks improve
  4. Consider Alternatives: t-SNE for visualization, autoencoders for non-linear reduction
  5. Preserve Enough Variance: Typically 80-95% cumulative variance
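
One convenient way to combine tips 1 and 5: scikit-learn's PCA accepts a float between 0 and 1 for n_components and keeps just enough components to reach that fraction of variance. A minimal sketch on stand-in data:

import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 20)) @ rng.normal(size=(20, 20))

# Keep as many components as needed to explain 95% of the variance
pca = PCA(n_components=0.95)
X_reduced = pca.fit_transform(StandardScaler().fit_transform(X))

print(pca.n_components_)                     # number of components kept
print(pca.explained_variance_ratio_.sum())   # at least 0.95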

Example Applications

Image Compression

  • Original: 64x64 image = 4,096 features
  • PCA: Reduce to 50 components
  • Result: 98% compression with minimal quality loss
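
A sketch of this idea using scikit-learn's Olivetti faces dataset (400 grayscale 64x64 images, so 4,096 features each); the 50-component figure follows the example above:

import numpy as np
from sklearn.datasets import fetch_olivetti_faces
from sklearn.decomposition import PCA

X = fetch_olivetti_faces().data               # shape (400, 4096); downloads on first use

pca = PCA(n_components=50).fit(X)

# Compress: each image becomes 50 numbers; decompress: project back to pixels
codes = pca.transform(X)                      # shape (400, 50)
X_restored = pca.inverse_transform(codes)     # shape (400, 4096)

print(codes.shape[1] / X.shape[1])            # ~0.012, i.e. about 98.8% fewer values per image
print(pca.explained_variance_ratio_.sum())    # fraction of variance retained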

Face Recognition

  • Eigenfaces: PCA on face images
  • Each component captures facial features
  • Efficient face matching in low-dimensional space

Gene Expression Analysis

  • Thousands of genes measured
  • PCA reveals patterns and clusters
  • Identify key genes driving variation

Mathematical Foundation

Optimization Objective

PCA finds directions that maximize variance:

maximize: Var(Xw) = w^T C w
subject to: ||w|| = 1

Where w is a unit direction vector and C is the covariance matrix of the (standardized) data. Solving this constrained maximization with a Lagrange multiplier gives Cw = λw, so the optimal w is the eigenvector of C with the largest eigenvalue, and that eigenvalue equals the variance captured along w.

Relationship to SVD

PCA is closely related to Singular Value Decomposition (SVD):

X = UΣV^T

When X is the mean-centered data matrix, the columns of V (the rows of V^T) are the principal component directions, and the squared singular values divided by n - 1 give the corresponding eigenvalues. In practice, libraries such as scikit-learn compute PCA via the SVD because it is more numerically stable than forming the covariance matrix explicitly.
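
A quick numerical check of this relationship (a sketch: on centered stand-in data, the right singular vectors should match the covariance eigenvectors up to sign, and the squared singular values divided by n - 1 should match the eigenvalues):

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4)) @ rng.normal(size=(4, 4))
Xc = X - X.mean(axis=0)                     # the SVD route assumes centered data

# Route 1: eigendecomposition of the covariance matrix, sorted largest first
eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Route 2: SVD of the centered data matrix
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)

print(np.allclose(np.abs(Vt.T), np.abs(eigvecs)))    # same directions up to sign
print(np.allclose(S**2 / (len(Xc) - 1), eigvals))    # same variances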

Summary

PCA is a fundamental technique for:

  • Reducing dimensionality
  • Visualizing high-dimensional data
  • Extracting important features
  • Preprocessing for other algorithms

Understanding PCA provides a foundation for more advanced dimensionality reduction techniques like t-SNE, UMAP, and autoencoders.
