Multivariate Distributions

Understanding multivariate distributions is essential in data science, particularly when dealing with datasets involving multiple variables. This article delves deeply into the concepts of joint, marginal, and conditional distributions, covariance, correlation in a multivariate context, and the properties and applications of the multivariate normal distribution, with detailed explanations and practical examples.

Joint Distributions

In probability theory, the joint distribution of two or more random variables describes the probability that each of those random variables falls within a particular range or set of values. This concept is fundamental when we want to understand how variables interact with each other.

Joint Probability Density Function (PDF)

For continuous random variables $X$ and $Y$, the joint distribution is described by the joint probability density function $f_{X,Y}(x,y)$. This function represents the relative likelihood of $X$ taking a specific value $x$ and $Y$ taking a specific value $y$ simultaneously.

Example: Joint PDF of Two Continuous Variables

Consider two random variables, $X$ and $Y$, representing the height and weight of individuals in a population. Suppose their joint PDF is given by:

$$f_{X,Y}(x,y) = \frac{1}{2\pi \sigma_X \sigma_Y \sqrt{1-\rho^2}} \exp\left(-\frac{1}{2(1-\rho^2)}\left[\frac{(x-\mu_X)^2}{\sigma_X^2} - 2\rho\frac{(x-\mu_X)(y-\mu_Y)}{\sigma_X\sigma_Y} + \frac{(y-\mu_Y)^2}{\sigma_Y^2}\right]\right)$$

Where:

  • $\mu_X$ and $\mu_Y$ are the means of $X$ and $Y$.
  • $\sigma_X$ and $\sigma_Y$ are the standard deviations of $X$ and $Y$.
  • $\rho$ is the correlation coefficient between $X$ and $Y$.

Note: This joint PDF assumes that $X$ and $Y$ are jointly normally distributed, describing a bivariate normal distribution.

To find the probability that a person has a height between 160 cm and 170 cm and a weight between 60 kg and 70 kg, we would integrate the joint PDF over these intervals:

$$P(160 \leq X \leq 170,\ 60 \leq Y \leq 70) = \int_{160}^{170} \int_{60}^{70} f_{X,Y}(x,y)\,dy\,dx$$

This integral is often computed numerically, as closed-form solutions can be complex.
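As a sketch of this numerical computation, the rectangle probability can be evaluated in Python with scipy; the means, standard deviations, and correlation below are assumed purely for illustration, not taken from real data:

```python
import numpy as np
from scipy.integrate import dblquad

# Illustrative (assumed) parameters: height X in cm, weight Y in kg
mu_x, sigma_x = 170.0, 10.0
mu_y, sigma_y = 65.0, 12.0
rho = 0.6  # assumed correlation between height and weight

def joint_pdf(x, y):
    """Bivariate normal density f_{X,Y}(x, y)."""
    q = ((x - mu_x) ** 2 / sigma_x ** 2
         - 2 * rho * (x - mu_x) * (y - mu_y) / (sigma_x * sigma_y)
         + (y - mu_y) ** 2 / sigma_y ** 2)
    c = 2 * np.pi * sigma_x * sigma_y * np.sqrt(1 - rho ** 2)
    return np.exp(-q / (2 * (1 - rho ** 2))) / c

# dblquad integrates the inner variable (y) first: func(y, x), x-limits, y-limits
prob, _ = dblquad(lambda y, x: joint_pdf(x, y), 160, 170, 60, 70)
print(f"P(160 <= X <= 170, 60 <= Y <= 70) ~ {prob:.4f}")
```

Direct quadrature keeps the connection to the integral explicit; `scipy.stats.multivariate_normal` offers a `cdf` that could compute the same rectangle probability via inclusion-exclusion.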

Joint Probability Mass Function (PMF)

For discrete random variables, the joint distribution is represented by the joint probability mass function (PMF). It gives the probability that each random variable takes a specific value.

Example: Joint PMF of Two Discrete Variables

Consider a scenario where we roll two fair dice, and let $X$ and $Y$ represent the outcomes of the first and second die, respectively. The joint PMF is given by:

$$P(X = x, Y = y) = \frac{1}{36}, \quad x, y \in \{1, 2, 3, 4, 5, 6\}$$

This uniform distribution reflects that each pair of outcomes (e.g., $(1,1)$, $(1,2)$, ..., $(6,6)$) is equally likely.

To find the probability that the sum of the two dice equals 7, we sum the probabilities of the relevant pairs:

$$P(X + Y = 7) = P(X=1, Y=6) + P(X=2, Y=5) + P(X=3, Y=4) + P(X=4, Y=3) + P(X=5, Y=2) + P(X=6, Y=1)$$

Since each pair has a probability of $\frac{1}{36}$, the total probability is:

$$P(X + Y = 7) = 6 \times \frac{1}{36} = \frac{1}{6}$$
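This enumeration is easy to check directly; a minimal sketch using exact fractions:

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes of two fair dice
outcomes = list(product(range(1, 7), repeat=2))
p_pair = Fraction(1, 36)

# Sum the joint PMF over the six pairs with x + y == 7
p_sum_is_7 = sum(p_pair for x, y in outcomes if x + y == 7)
print(p_sum_is_7)  # 1/6
```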

Marginal Distributions

The marginal distribution of a variable in a multivariate distribution is the distribution of that variable ignoring the others. It is obtained by summing or integrating out the other variables from the joint distribution.

Marginal PDF

For continuous variables, the marginal PDF of $X$ is obtained by integrating the joint PDF over all values of $Y$:

$$f_X(x) = \int_{-\infty}^{\infty} f_{X,Y}(x,y)\,dy$$

Example: Marginal Distribution from a Joint PDF

Continuing with the height and weight example, suppose we are only interested in the distribution of height $X$. To find the marginal PDF of height, we integrate out the weight:

$$f_X(x) = \int_{-\infty}^{\infty} \frac{1}{2\pi \sigma_X \sigma_Y \sqrt{1-\rho^2}} \exp\left(-\frac{1}{2(1-\rho^2)}\left[\frac{(x-\mu_X)^2}{\sigma_X^2} - 2\rho\frac{(x-\mu_X)(y-\mu_Y)}{\sigma_X\sigma_Y} + \frac{(y-\mu_Y)^2}{\sigma_Y^2}\right]\right) dy$$

This integral simplifies to give us the marginal distribution of height $X$, which is normally distributed with mean $\mu_X$ and variance $\sigma_X^2$.
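The simplification can be verified numerically: integrating the joint PDF over $y$ at a fixed $x$ should reproduce the univariate normal density at that $x$. A sketch with assumed illustrative parameters:

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

# Assumed illustrative parameters (hypothetical height/weight setup)
mu_x, sigma_x = 170.0, 10.0
mu_y, sigma_y = 65.0, 12.0
rho = 0.6

def joint_pdf(x, y):
    """Bivariate normal density f_{X,Y}(x, y)."""
    q = ((x - mu_x) ** 2 / sigma_x ** 2
         - 2 * rho * (x - mu_x) * (y - mu_y) / (sigma_x * sigma_y)
         + (y - mu_y) ** 2 / sigma_y ** 2)
    c = 2 * np.pi * sigma_x * sigma_y * np.sqrt(1 - rho ** 2)
    return np.exp(-q / (2 * (1 - rho ** 2))) / c

# Integrate out y at a fixed x; the result should equal the N(mu_x, sigma_x^2) density
x0 = 175.0
marginal, _ = quad(lambda y: joint_pdf(x0, y), -np.inf, np.inf)
print(marginal, norm.pdf(x0, loc=mu_x, scale=sigma_x))
```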

Marginal PMF

For discrete variables, the marginal PMF is obtained by summing the joint PMF over all values of the other variable.

Example: Marginal PMF from a Joint PMF

In the dice example, the marginal PMF of $X$ (the outcome of the first die) is:

$$P(X = x) = \sum_{y=1}^{6} P(X = x, Y = y) = \sum_{y=1}^{6} \frac{1}{36} = \frac{6}{36} = \frac{1}{6}$$

This shows that each outcome of the first die is equally likely, as expected.

Conditional Distributions

A conditional distribution gives the distribution of one variable given that another variable takes on a specific value. This is important when we want to understand the dependency between variables.

Conditional PDF

For continuous variables, the conditional PDF of $X$ given $Y = y$ is defined as:

$$f_{X|Y}(x|y) = \frac{f_{X,Y}(x,y)}{f_Y(y)}$$

Where $f_Y(y)$ is the marginal PDF of $Y$. The conditional distribution tells us how $X$ behaves when $Y$ is fixed.

Example: Conditional Distribution from a Joint PDF

Suppose we know a person’s weight $Y = 65$ kg and want to know the distribution of their height $X$. Using the joint PDF:

$$f_{X|Y}(x|65) = \frac{f_{X,Y}(x,65)}{f_Y(65)}$$

This expression gives us the conditional distribution of height given a specific weight.
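For the bivariate normal case, this conditional is itself normal, with mean $\mu_X + \rho\frac{\sigma_X}{\sigma_Y}(y - \mu_Y)$ and variance $\sigma_X^2(1-\rho^2)$. The ratio definition can be checked against this closed form numerically; the parameters below are assumed for illustration:

```python
import numpy as np
from scipy.stats import norm

# Assumed illustrative parameters
mu_x, sigma_x = 170.0, 10.0
mu_y, sigma_y = 65.0, 12.0
rho = 0.6

def joint_pdf(x, y):
    """Bivariate normal density f_{X,Y}(x, y)."""
    q = ((x - mu_x) ** 2 / sigma_x ** 2
         - 2 * rho * (x - mu_x) * (y - mu_y) / (sigma_x * sigma_y)
         + (y - mu_y) ** 2 / sigma_y ** 2)
    c = 2 * np.pi * sigma_x * sigma_y * np.sqrt(1 - rho ** 2)
    return np.exp(-q / (2 * (1 - rho ** 2))) / c

y_obs, x = 77.0, 172.0  # condition on a weight one sd above its mean

# Definition: f_{X|Y}(x|y) = f_{X,Y}(x, y) / f_Y(y)
cond_by_definition = joint_pdf(x, y_obs) / norm.pdf(y_obs, loc=mu_y, scale=sigma_y)

# Closed form: X | Y=y is normal with a shifted mean and reduced variance
cond_mean = mu_x + rho * (sigma_x / sigma_y) * (y_obs - mu_y)  # 176.0 here
cond_sd = sigma_x * np.sqrt(1 - rho ** 2)                      # 8.0 here
cond_closed_form = norm.pdf(x, loc=cond_mean, scale=cond_sd)

print(cond_by_definition, cond_closed_form)
```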

Conditional PMF

For discrete variables, the conditional PMF of $X$ given $Y = y$ is:

$$P(X = x | Y = y) = \frac{P(X = x, Y = y)}{P(Y = y)}$$

Example: Conditional PMF from a Joint PMF

Using the dice example, if we know the second die shows a 5 ($Y = 5$), the conditional probability that the first die shows a 3 ($X = 3$) is:

$$P(X=3 \mid Y=5) = \frac{P(X=3, Y=5)}{P(Y=5)} = \frac{1/36}{1/6} = \frac{1}{6}$$
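The same computation with exact fractions, building the marginal and the conditional from the joint PMF:

```python
from fractions import Fraction
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))
p_joint = Fraction(1, 36)

# Marginal: P(Y = 5) sums the joint PMF over all values of the first die
p_y_is_5 = sum(p_joint for x, y in outcomes if y == 5)

# Conditional: P(X = 3 | Y = 5) = P(X = 3, Y = 5) / P(Y = 5)
p_cond = p_joint / p_y_is_5
print(p_y_is_5, p_cond)  # 1/6 1/6
```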

Covariance and Correlation

Covariance and correlation measure the relationship between two random variables.

Covariance

Covariance is a measure of how much two random variables vary together. If the variables tend to increase together, the covariance is positive; if one tends to increase while the other decreases, the covariance is negative.

The covariance between $X$ and $Y$ is defined as:

$$\text{Cov}(X,Y) = \mathbb{E}[(X - \mathbb{E}[X])(Y - \mathbb{E}[Y])]$$

Where $\mathbb{E}[X]$ and $\mathbb{E}[Y]$ are the expected values (means) of $X$ and $Y$.

Example: Covariance Calculation

Let’s calculate the covariance of $X$ and $Y$, where $X$ represents the number of hours studied and $Y$ represents exam scores. Suppose we have the following data for a sample of students:

Hours Studied ($X$)    Exam Score ($Y$)
2                      50
4                      60
6                      65
8                      80
10                     85

First, compute the means:

$$\mathbb{E}[X] = \frac{2 + 4 + 6 + 8 + 10}{5} = 6$$
$$\mathbb{E}[Y] = \frac{50 + 60 + 65 + 80 + 85}{5} = 68$$

Next, calculate the covariance. Here, we are calculating the population covariance by dividing by $n = 5$. If this were a sample, we would divide by $n - 1 = 4$ instead.

$$\text{Cov}(X,Y) = \frac{1}{5} \sum_{i=1}^{5} (X_i - 6)(Y_i - 68)$$

Substituting the values:

$$\text{Cov}(X,Y) = \frac{1}{5} [(2-6)(50-68) + (4-6)(60-68) + (6-6)(65-68) + (8-6)(80-68) + (10-6)(85-68)]$$
$$\text{Cov}(X,Y) = \frac{1}{5} [(-4)(-18) + (-2)(-8) + (0)(-3) + (2)(12) + (4)(17)]$$
$$\text{Cov}(X,Y) = \frac{1}{5} [72 + 16 + 0 + 24 + 68] = \frac{180}{5} = 36$$

A positive covariance of 36 indicates that hours studied and exam scores tend to increase together.

Calculating Standard Deviations:

To compute the standard deviations:

$$\sigma_X = \sqrt{\frac{1}{5} \sum_{i=1}^{5} (X_i - 6)^2} = \sqrt{\frac{1}{5}(16 + 4 + 0 + 4 + 16)} = \sqrt{8} \approx 2.83$$
$$\sigma_Y = \sqrt{\frac{1}{5} \sum_{i=1}^{5} (Y_i - 68)^2} = \sqrt{\frac{1}{5}(324 + 64 + 9 + 144 + 289)} = \sqrt{\frac{830}{5}} = \sqrt{166} \approx 12.88$$

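These hand calculations are easy to reproduce with NumPy; note the `bias=True` flag, which selects the population convention (dividing by $n$) used above:

```python
import numpy as np

hours = np.array([2, 4, 6, 8, 10], dtype=float)       # X
scores = np.array([50, 60, 65, 80, 85], dtype=float)  # Y

# bias=True divides by n (population covariance); the default divides by n - 1
cov_xy = np.cov(hours, scores, bias=True)[0, 1]

# Population standard deviations (ddof=0 is NumPy's default for .std())
sigma_x = hours.std()
sigma_y = scores.std()

print(cov_xy, sigma_x, sigma_y)  # 36.0, ~2.83, ~12.88
```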

Correlation

Correlation is a normalized version of covariance that provides a measure of the linear relationship between two variables. The correlation coefficient $\rho_{X,Y}$ is defined as:

$$\rho_{X,Y} = \frac{\text{Cov}(X,Y)}{\sigma_X \sigma_Y}$$

Where $\sigma_X$ and $\sigma_Y$ are the standard deviations of $X$ and $Y$ respectively.

Example: Correlation Calculation

Using the covariance calculated above, and the standard deviations $\sigma_X \approx 2.83$ and $\sigma_Y \approx 12.88$, we have:

$$\rho_{X,Y} = \frac{36}{2.83 \times 12.88} \approx \frac{36}{36.44} \approx 0.99$$

A correlation coefficient of approximately 0.99 indicates a very strong positive linear relationship between hours studied and exam scores, meaning that as one increases, the other tends to increase as well.
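The same result follows directly from NumPy, where the choice of $n$ versus $n-1$ cancels in the ratio:

```python
import numpy as np

hours = np.array([2, 4, 6, 8, 10], dtype=float)
scores = np.array([50, 60, 65, 80, 85], dtype=float)

# Pearson correlation; the 1/n vs 1/(n-1) convention cancels in the ratio
rho_xy = np.corrcoef(hours, scores)[0, 1]
print(round(rho_xy, 3))  # 0.988
```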


Multivariate Normal Distribution

The multivariate normal distribution is a generalization of the normal distribution to multiple variables. It is a cornerstone of multivariate statistics and is used in various applications, such as portfolio optimization and principal component analysis.

Definition

A random vector $\mathbf{X} = (X_1, X_2, \dots, X_k)^T$ follows a multivariate normal distribution if every linear combination of its components follows a univariate normal distribution. The multivariate normal distribution is fully described by its mean vector $\mathbf{\mu}$ and covariance matrix $\mathbf{\Sigma}$.

The probability density function of a multivariate normal distribution is:

$$f(\mathbf{X}) = \frac{1}{\sqrt{(2\pi)^k |\mathbf{\Sigma}|}} \exp\left(-\frac{1}{2}(\mathbf{X} - \mathbf{\mu})^T \mathbf{\Sigma}^{-1} (\mathbf{X} - \mathbf{\mu})\right)$$

Where:

  • $\mathbf{X}$ is a $k \times 1$ vector of random variables.
  • $\mathbf{\mu}$ is a $k \times 1$ mean vector.
  • $\mathbf{\Sigma}$ is a $k \times k$ covariance matrix.
  • $|\mathbf{\Sigma}|$ is the determinant of the covariance matrix.
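As a quick sketch, the density formula can be evaluated directly and compared with scipy's implementation; the mean vector and covariance matrix below are arbitrary illustrative values (any positive-definite $\mathbf{\Sigma}$ works):

```python
import numpy as np
from scipy.stats import multivariate_normal

# Arbitrary illustrative parameters
mu = np.array([0.0, 1.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])

def mvn_pdf(x, mu, Sigma):
    """Evaluate the multivariate normal density from the formula above."""
    k = mu.shape[0]
    diff = x - mu
    quad_form = diff @ np.linalg.inv(Sigma) @ diff
    normalizer = np.sqrt((2 * np.pi) ** k * np.linalg.det(Sigma))
    return np.exp(-0.5 * quad_form) / normalizer

x = np.array([1.0, 0.5])
print(mvn_pdf(x, mu, Sigma), multivariate_normal(mean=mu, cov=Sigma).pdf(x))
```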

Example: Bivariate Normal Distribution

Consider a bivariate normal distribution with variables $X$ and $Y$, where:

$$\mathbf{\mu} = \begin{pmatrix} \mu_X \\ \mu_Y \end{pmatrix} = \begin{pmatrix} 5 \\ 10 \end{pmatrix}, \quad \mathbf{\Sigma} = \begin{pmatrix} 4 & 2 \\ 2 & 9 \end{pmatrix}$$

The covariance matrix $\mathbf{\Sigma}$ indicates that the variance of $X$ is 4, the variance of $Y$ is 9, and the covariance between $X$ and $Y$ is 2. The joint distribution of $X$ and $Y$ is fully characterized by these parameters.
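One way to make these parameters concrete is to sample from the distribution and check that the empirical moments recover $\mathbf{\mu}$ and $\mathbf{\Sigma}$; a sketch:

```python
import numpy as np

mu = np.array([5.0, 10.0])
Sigma = np.array([[4.0, 2.0],
                  [2.0, 9.0]])

# Draw a large sample and compare the empirical moments to mu and Sigma
rng = np.random.default_rng(0)
samples = rng.multivariate_normal(mu, Sigma, size=200_000)

print(samples.mean(axis=0))           # close to [5, 10]
print(np.cov(samples, rowvar=False))  # close to [[4, 2], [2, 9]]
```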

Properties

  1. Marginal Distributions: Any subset of variables from a multivariate normal distribution is also normally distributed. For example, $X$ and $Y$ individually follow normal distributions with their respective means and variances.

  2. Linear Combinations: Any linear combination of the variables in a multivariate normal distribution is also normally distributed. For instance, if $Z = aX + bY$, then $Z$ is normally distributed.

  3. Conditional Distributions: The conditional distribution of a subset of variables given the others is also normally distributed. If $Y$ is known, the distribution of $X$ given $Y = y$ is normal.
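The linear-combination property can be checked numerically with the bivariate example above; the coefficients $a$ and $b$ below are arbitrary choices:

```python
import numpy as np

mu = np.array([5.0, 10.0])
Sigma = np.array([[4.0, 2.0],
                  [2.0, 9.0]])
a, b = 2.0, -1.0  # arbitrary coefficients for Z = aX + bY
w = np.array([a, b])

# Theoretical moments of Z: E[Z] = a*mu_X + b*mu_Y, Var(Z) = w^T Sigma w
mean_z = w @ mu        # 2*5 - 10 = 0
var_z = w @ Sigma @ w  # 16 - 8 + 9 = 17

# Empirical check: Z computed from samples should match these moments
rng = np.random.default_rng(1)
z = rng.multivariate_normal(mu, Sigma, size=200_000) @ w
print(mean_z, var_z, z.mean(), z.var())
```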

Applications

  • Principal Component Analysis (PCA): PCA is often more effective when the data exhibits multivariate normality, as this assumption simplifies the data structure and allows for dimensionality reduction while retaining as much variance as possible. However, PCA can be applied to data with other distributions as well.

  • Portfolio Theory: In finance, the returns on a portfolio of assets are often modeled using a multivariate normal distribution, which allows for the computation of portfolio risk and return based on the covariances between asset returns.

  • Gaussian Mixture Models (GMM): GMMs use multiple multivariate normal distributions to model complex data distributions, often applied in clustering and classification tasks.

Conclusion

Multivariate distributions are fundamental in understanding how multiple variables interact and behave together. This in-depth exploration of joint, marginal, and conditional distributions, along with covariance, correlation, and the multivariate normal distribution, equips data scientists with the knowledge needed to model and analyze complex datasets effectively. Through practical examples, we have seen how these concepts are applied, providing a strong foundation for further exploration in multivariate analysis.