Naive Bayes Theory
Naive Bayes is a probabilistic classification algorithm based on Bayes' Theorem. It assumes that the features are conditionally independent given the class label, which greatly simplifies the computation of probabilities and makes the algorithm highly efficient. Despite this "naive" assumption, Naive Bayes performs surprisingly well in many real-world scenarios.
In this article, we will cover:
- How Bayes' Theorem works.
- The conditional independence assumption.
- Different types of Naive Bayes classifiers.
- Mathematical derivations and the training process.
1. Bayes' Theorem
Bayes' Theorem provides a way to calculate the posterior probability of a hypothesis (in this case, a class label) based on prior knowledge and evidence (the features). The theorem is expressed as:

$$P(C \mid X) = \frac{P(X \mid C) \, P(C)}{P(X)}$$

Where:
- $P(C \mid X)$ is the posterior probability of class $C$ given the feature set $X$.
- $P(X \mid C)$ is the likelihood: the probability of observing the feature set $X$ given the class $C$.
- $P(C)$ is the prior probability of class $C$: the probability of that class occurring before seeing the data.
- $P(X)$ is the evidence: the overall probability of observing the feature set $X$ across all classes.
The goal of Naive Bayes is to find the class with the highest posterior probability $P(C \mid X)$, which can be computed by applying Bayes' Theorem.
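To make the formula concrete, here is a minimal numeric sketch in Python. The spam/word probabilities below are hypothetical values chosen purely for illustration:

```python
# A minimal worked example of Bayes' Theorem with hypothetical numbers:
# suppose 30% of emails are spam (prior), the word "offer" appears in 60%
# of spam emails (likelihood), and in 24% of all emails (evidence).

prior = 0.30        # P(C): probability an email is spam
likelihood = 0.60   # P(X | C): probability of seeing "offer" given spam
evidence = 0.24     # P(X): probability of seeing "offer" in any email

# Bayes' Theorem: P(C | X) = P(X | C) * P(C) / P(X)
posterior = likelihood * prior / evidence
print(f"P(spam | 'offer') = {posterior:.3f}")  # 0.750
```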
2. The Naive Independence Assumption
The term "naive" in Naive Bayes comes from the assumption that all features in the dataset are conditionally independent of each other, given the class label. Mathematically, this means:
Where are the individual features of the dataset.
While this assumption is often unrealistic in practice (since features are usually correlated), it simplifies the computation of the likelihood significantly. This makes Naive Bayes highly scalable and fast, even for large datasets with many features.
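As a quick sketch of why the factorization helps computationally, the joint likelihood reduces to a product of per-feature terms. The likelihood values below are hypothetical; note that real implementations typically sum log-probabilities instead, since a product of many small probabilities underflows:

```python
import math

# Hypothetical per-feature likelihoods P(x_i | C) for a single class C.
# Under the naive assumption, the joint likelihood factorizes into a product.
feature_likelihoods = [0.60, 0.10, 0.85, 0.30]

# Direct product: P(x_1, ..., x_n | C) = prod_i P(x_i | C)
joint = math.prod(feature_likelihoods)

# Numerically stable alternative: sum the log-probabilities.
log_joint = sum(math.log(p) for p in feature_likelihoods)

print(joint)                 # ≈ 0.0153
print(math.exp(log_joint))   # same value, computed stably
```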
3. Naive Bayes Classifiers
There are three main types of Naive Bayes classifiers, each designed to handle different types of data:
3.1. Gaussian Naive Bayes
Gaussian Naive Bayes is used when the features are continuous and are assumed to follow a Gaussian (normal) distribution. The likelihood for a continuous feature $x_i$ given class $C$ is calculated using the Gaussian probability density function (PDF):

$$P(x_i \mid C) = \frac{1}{\sqrt{2\pi\sigma_C^2}} \exp\left(-\frac{(x_i - \mu_C)^2}{2\sigma_C^2}\right)$$

Where $\mu_C$ and $\sigma_C^2$ are the mean and variance of the feature $x_i$ for class $C$, estimated from the training data.
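As a rough sketch of what this looks like in practice (the sample values are hypothetical), training reduces to estimating the per-class mean and variance of each feature, after which the PDF above gives the likelihood of a new observation:

```python
import math

def gaussian_pdf(x, mu, var):
    """Gaussian likelihood P(x_i | C) with class mean mu and variance var."""
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

# Hypothetical training values of one continuous feature for class C.
samples = [4.8, 5.1, 5.4, 5.0, 5.2]

# "Training" amounts to estimating the per-class mean and variance.
mu = sum(samples) / len(samples)
var = sum((x - mu) ** 2 for x in samples) / len(samples)

# Likelihood of a new observation under the fitted Gaussian.
print(gaussian_pdf(5.0, mu, var))
```

Note that the PDF returns a density rather than a probability, so its value can exceed 1; what matters for classification is how the densities compare across classes.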