Linear Discriminant Analysis (LDA)
Linear Discriminant Analysis (LDA) is a powerful technique in statistics and machine learning used for both classification and dimensionality reduction. Unlike PCA, which focuses on maximizing variance without considering class labels, LDA seeks to find a linear combination of features that best separates two or more classes. This article delves into the mathematical foundations of LDA, providing detailed explanations and examples to help you understand how it works and why it’s effective.
1. Introduction to Linear Discriminant Analysis
1.1 What is LDA?
Linear Discriminant Analysis is a supervised learning algorithm primarily used for classification and dimensionality reduction. The goal of LDA is to project the data onto a lower-dimensional space where the separation between classes is maximized.
1.2 Comparison with PCA
While Principal Component Analysis (PCA) focuses on maximizing variance in the data without considering any class labels, LDA takes class labels into account and tries to find the axes that maximize the separation between multiple classes. In essence, LDA is a linear technique that seeks to find a projection that maximizes the distance between means of different classes while minimizing the scatter within each class.
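To see the difference concretely, the sketch below (assuming scikit-learn and NumPy are installed, and using a small synthetic two-class dataset) projects the same data onto a single dimension with both methods:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Small synthetic two-class dataset (illustrative values only).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal([0.0, 0.0], 0.5, size=(20, 2)),
               rng.normal([2.0, 1.0], 0.5, size=(20, 2))])
y = np.array([0] * 20 + [1] * 20)

# PCA ignores y and keeps the direction of maximum overall variance.
X_pca = PCA(n_components=1).fit_transform(X)

# LDA uses y and keeps the direction that best separates the two classes.
X_lda = LinearDiscriminantAnalysis(n_components=1).fit_transform(X, y)
```

On data like this, the two one-dimensional projections can differ noticeably, because only LDA is told which points belong to which class.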
2. Mathematical Foundation of LDA
2.1 The Concept of Discriminants
Given a dataset with $N$ samples, where each sample belongs to one of $C$ classes, LDA seeks to find a linear combination of the input features that separates the classes as much as possible. This can be achieved by finding the linear discriminants that maximize the ratio of between-class variance to within-class variance.
2.2 Between-Class and Within-Class Variance
2.2.1 Within-Class Scatter Matrix ($S_W$)
The within-class scatter matrix measures how much the samples of each class spread out around their own class mean. For a given class $c$, the per-class scatter matrix is defined as:
$$S_c = \sum_{i=1}^{N_c} (x_i - \mu_c)(x_i - \mu_c)^T$$
and the within-class scatter matrix is the sum of these over all $C$ classes:
$$S_W = \sum_{c=1}^{C} S_c$$
Where:
- $N_c$ is the number of samples in class $c$.
- $x_i$ is the $i$-th sample in class $c$.
- $\mu_c$ is the mean vector of class $c$.
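A minimal NumPy sketch of this computation (assuming each class's samples are stored as a rows-are-samples array) could look like:

```python
import numpy as np

def within_class_scatter(class_arrays):
    """Sum the per-class scatter matrices S_c over all classes to get S_W."""
    n_features = class_arrays[0].shape[1]
    S_W = np.zeros((n_features, n_features))
    for X_c in class_arrays:
        mu_c = X_c.mean(axis=0)        # class mean vector
        centered = X_c - mu_c          # deviations from the class mean
        S_W += centered.T @ centered   # equals the sum of outer products (x_i - mu_c)(x_i - mu_c)^T
    return S_W
```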
2.2.2 Between-Class Scatter Matrix ($S_B$)
The between-class scatter matrix measures how much the class means deviate from the overall mean. It is defined as:
$$S_B = \sum_{c=1}^{C} N_c (\mu_c - \mu)(\mu_c - \mu)^T$$
Where:
- $\mu$ is the overall mean vector of all samples across classes.
- $N_c$ and $\mu_c$ are the sample count and mean vector of class $c$, as above.
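A matching NumPy sketch (again assuming one rows-are-samples array per class) is:

```python
import numpy as np

def between_class_scatter(class_arrays):
    """S_B = sum over classes of N_c * (mu_c - mu)(mu_c - mu)^T."""
    all_samples = np.vstack(class_arrays)
    mu = all_samples.mean(axis=0)          # overall mean vector
    n_features = all_samples.shape[1]
    S_B = np.zeros((n_features, n_features))
    for X_c in class_arrays:
        N_c = X_c.shape[0]
        diff = X_c.mean(axis=0) - mu       # class mean minus overall mean
        S_B += N_c * np.outer(diff, diff)
    return S_B
```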
2.3 Objective Function of LDA
The objective of LDA is to maximize the ratio of the between-class scatter to the within-class scatter after projection onto a direction $w$:
$$J(w) = \frac{w^T S_B w}{w^T S_W w}$$
Where:
- $w$ is the vector that defines the linear combination of features.
This ratio is known as the Fisher criterion. The vector $w$ that maximizes this ratio gives us the best linear discriminant for separating the classes.
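To make the criterion concrete, a short sketch that evaluates $J(w)$ for a candidate direction, given scatter matrices computed as above, might be:

```python
import numpy as np

def fisher_criterion(w, S_B, S_W):
    """Ratio of projected between-class scatter to projected within-class scatter."""
    w = np.asarray(w, dtype=float)
    return (w @ S_B @ w) / (w @ S_W @ w)
```

Directions with larger $J(w)$ separate the projected class means more, relative to the spread within each class.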
2.4 Solving the LDA Optimization Problem
The optimization problem can be solved by finding the eigenvectors and eigenvalues of the matrix $S_W^{-1} S_B$. The eigenvectors corresponding to the largest eigenvalues form the axes that maximize class separability.
To find these eigenvectors, solve the following generalized eigenvalue problem:
$$S_B w = \lambda S_W w$$
Where $\lambda$ represents the eigenvalues.
The solution involves the following steps:
- Compute the scatter matrices $S_W$ and $S_B$.
- Solve the generalized eigenvalue problem $S_B w = \lambda S_W w$ (equivalently, find the eigenvectors of $S_W^{-1} S_B$) to obtain the eigenvectors and eigenvalues.
- Select the top $k$ eigenvectors corresponding to the largest eigenvalues to form the linear discriminants; for a $C$-class problem, at most $C - 1$ discriminants carry information (a short code sketch of these steps follows).
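A minimal NumPy sketch of these steps, assuming $S_W$ is invertible (in practice a pseudo-inverse or regularization is used when it is not):

```python
import numpy as np

def lda_directions(S_W, S_B, n_components):
    """Top eigenvectors of S_W^{-1} S_B, sorted by decreasing eigenvalue."""
    eigvals, eigvecs = np.linalg.eig(np.linalg.inv(S_W) @ S_B)
    order = np.argsort(eigvals.real)[::-1]      # largest eigenvalues first
    W = eigvecs.real[:, order[:n_components]]   # columns are the linear discriminants
    return W
```

Projecting the data with `X @ W` then yields the lower-dimensional representation used for classification or visualization.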
3. Example: Applying LDA to a Simple Dataset
3.1 Constructing the Dataset
Consider a simple dataset with two classes. Each class is represented by a set of points in a two-dimensional space.
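For concreteness, the sketch below uses an assumed, illustrative set of two-dimensional points for each class (any small, roughly separated two-class set would work):

```python
import numpy as np

# Illustrative points only (assumed for this example, not taken from real data).
X1 = np.array([[4.0, 2.0], [2.0, 4.0], [2.0, 3.0], [3.0, 6.0], [4.0, 4.0]])    # class 1
X2 = np.array([[9.0, 10.0], [6.0, 8.0], [9.0, 5.0], [8.0, 7.0], [10.0, 8.0]])  # class 2
```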
3.2 Calculating Class Means
Compute the mean vector of each class, $\mu_1$ and $\mu_2$, by averaging the points belonging to that class:
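Continuing with the illustrative arrays X1 and X2 defined above:

```python
mu1 = X1.mean(axis=0)   # mean vector of class 1
mu2 = X2.mean(axis=0)   # mean vector of class 2
print(mu1, mu2)         # [3.  3.8] [8.4 7.6] for the assumed points
```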
3.3 Calculating Scatter Matrices
Compute the within-class scatter matrix $S_W = S_1 + S_2$, where $S_1$ and $S_2$ are the per-class scatter matrices from Section 2.2.1.
Then compute the between-class scatter matrix $S_B$ from the class means and the overall mean, as defined in Section 2.2.2.
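Continuing the same sketch with the assumed points and the means computed above:

```python
# Within-class scatter: sum of the two per-class scatter matrices.
S1 = (X1 - mu1).T @ (X1 - mu1)
S2 = (X2 - mu2).T @ (X2 - mu2)
S_W = S1 + S2

# Between-class scatter: each class contributes N_c times the outer
# product of (class mean - overall mean) with itself.
mu = np.vstack([X1, X2]).mean(axis=0)
S_B = (len(X1) * np.outer(mu1 - mu, mu1 - mu)
       + len(X2) * np.outer(mu2 - mu, mu2 - mu))

print(S_W)
print(S_B)
```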