Multivariate Distributions

Understanding multivariate distributions is essential in data science, particularly when dealing with datasets involving multiple variables. This article delves deeply into the concepts of joint, marginal, and conditional distributions, covariance, correlation in a multivariate context, and the properties and applications of the multivariate normal distribution, with detailed explanations and practical examples.

Joint Distributions

In probability theory, the joint distribution of two or more random variables describes the probability that each of those random variables falls within a particular range or set of values. This concept is fundamental when we want to understand how variables interact with each other.

Joint Probability Density Function (PDF)

For continuous random variables $X$ and $Y$, the joint distribution is described by the joint probability density function $f_{X,Y}(x,y)$. This function represents the relative likelihood of $X$ taking a specific value $x$ and $Y$ taking a specific value $y$ simultaneously.

Example: Joint PDF of Two Continuous Variables

Consider two random variables, $X$ and $Y$, representing the height and weight of individuals in a population. Suppose their joint PDF is given by:

$$f_{X,Y}(x,y) = \frac{1}{2\pi \sigma_X \sigma_Y \sqrt{1-\rho^2}} \exp\left(-\frac{1}{2(1-\rho^2)}\left[\frac{(x-\mu_X)^2}{\sigma_X^2} - 2\rho\frac{(x-\mu_X)(y-\mu_Y)}{\sigma_X\sigma_Y} + \frac{(y-\mu_Y)^2}{\sigma_Y^2}\right]\right)$$

Where:

  • $\mu_X$ and $\mu_Y$ are the means of $X$ and $Y$.
  • $\sigma_X$ and $\sigma_Y$ are the standard deviations of $X$ and $Y$.
  • $\rho$ is the correlation coefficient between $X$ and $Y$.

Note: This joint PDF assumes that $X$ and $Y$ are jointly normally distributed, describing a bivariate normal distribution.

To find the probability that a person has a height between 160 cm and 170 cm and a weight between 60 kg and 70 kg, we would integrate the joint PDF over these intervals:

$$P(160 \leq X \leq 170,\ 60 \leq Y \leq 70) = \int_{160}^{170} \int_{60}^{70} f_{X,Y}(x,y)\,dy\,dx$$

This integral is often computed numerically, as closed-form solutions can be complex.
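As a sketch of this numerical computation, the rectangle probability can be evaluated in Python with scipy; the means, standard deviations, and correlation below are assumed purely for illustration, not taken from real data:

```python
import numpy as np
from scipy.integrate import dblquad

# Illustrative (assumed) parameters: height X in cm, weight Y in kg
mu_x, sigma_x = 170.0, 10.0
mu_y, sigma_y = 65.0, 12.0
rho = 0.6  # assumed correlation between height and weight

def joint_pdf(x, y):
    """Bivariate normal density f_{X,Y}(x, y)."""
    q = ((x - mu_x) ** 2 / sigma_x ** 2
         - 2 * rho * (x - mu_x) * (y - mu_y) / (sigma_x * sigma_y)
         + (y - mu_y) ** 2 / sigma_y ** 2)
    c = 2 * np.pi * sigma_x * sigma_y * np.sqrt(1 - rho ** 2)
    return np.exp(-q / (2 * (1 - rho ** 2))) / c

# dblquad integrates the inner variable (y) first: func(y, x), x-limits, y-limits
prob, _ = dblquad(lambda y, x: joint_pdf(x, y), 160, 170, 60, 70)
print(f"P(160 <= X <= 170, 60 <= Y <= 70) ~ {prob:.4f}")
```

Direct quadrature keeps the connection to the integral explicit; `scipy.stats.multivariate_normal` offers a `cdf` that could compute the same rectangle probability via inclusion-exclusion.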

Joint Probability Mass Function (PMF)

For discrete random variables, the joint distribution is represented by the joint probability mass function (PMF). It gives the probability that each random variable takes a specific value.

Example: Joint PMF of Two Discrete Variables

Consider a scenario where we roll two fair dice, and let $X$ and $Y$ represent the outcomes of the first and second die, respectively. The joint PMF is given by:

$$P(X = x, Y = y) = \frac{1}{36}, \quad x, y \in \{1, 2, 3, 4, 5, 6\}$$

This uniform distribution reflects that each pair of outcomes (e.g., $(1,1)$, $(1,2)$, ..., $(6,6)$) is equally likely.

To find the probability that the sum of the two dice equals 7, we sum the probabilities of the relevant pairs:

$$P(X + Y = 7) = P(X=1, Y=6) + P(X=2, Y=5) + P(X=3, Y=4) + P(X=4, Y=3) + P(X=5, Y=2) + P(X=6, Y=1)$$

Since each pair has a probability of $\frac{1}{36}$, the total probability is:

$$P(X + Y = 7) = 6 \times \frac{1}{36} = \frac{1}{6}$$
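This enumeration is easy to check directly; a minimal sketch using exact fractions:

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes of two fair dice
outcomes = list(product(range(1, 7), repeat=2))
p_pair = Fraction(1, 36)

# Sum the joint PMF over the six pairs with x + y == 7
p_sum_is_7 = sum(p_pair for x, y in outcomes if x + y == 7)
print(p_sum_is_7)  # 1/6
```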

Marginal Distributions

The marginal distribution of a variable in a multivariate distribution is the distribution of that variable ignoring the others. It is obtained by summing or integrating out the other variables from the joint distribution.

Marginal PDF

For continuous variables, the marginal PDF of $X$ is obtained by integrating the joint PDF over all values of $Y$:

$$f_X(x) = \int_{-\infty}^{\infty} f_{X,Y}(x,y)\,dy$$

Example: Marginal Distribution from a Joint PDF

Continuing with the height and weight example, suppose we are only interested in the distribution of height $X$. To find the marginal PDF of height, we integrate out the weight:

$$f_X(x) = \int_{-\infty}^{\infty} \frac{1}{2\pi \sigma_X \sigma_Y \sqrt{1-\rho^2}} \exp\left(-\frac{1}{2(1-\rho^2)}\left[\frac{(x-\mu_X)^2}{\sigma_X^2} - 2\rho\frac{(x-\mu_X)(y-\mu_Y)}{\sigma_X\sigma_Y} + \frac{(y-\mu_Y)^2}{\sigma_Y^2}\right]\right) dy$$

This integral simplifies to give us the marginal distribution of height $X$, which is normally distributed with mean $\mu_X$ and variance $\sigma_X^2$.
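The simplification can be verified numerically: integrating the joint PDF over $y$ at a fixed $x$ should reproduce the univariate normal density at that $x$. A sketch with assumed illustrative parameters:

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

# Assumed illustrative parameters (hypothetical height/weight setup)
mu_x, sigma_x = 170.0, 10.0
mu_y, sigma_y = 65.0, 12.0
rho = 0.6

def joint_pdf(x, y):
    """Bivariate normal density f_{X,Y}(x, y)."""
    q = ((x - mu_x) ** 2 / sigma_x ** 2
         - 2 * rho * (x - mu_x) * (y - mu_y) / (sigma_x * sigma_y)
         + (y - mu_y) ** 2 / sigma_y ** 2)
    c = 2 * np.pi * sigma_x * sigma_y * np.sqrt(1 - rho ** 2)
    return np.exp(-q / (2 * (1 - rho ** 2))) / c

# Integrate out y at a fixed x; the result should equal the N(mu_x, sigma_x^2) density
x0 = 175.0
marginal, _ = quad(lambda y: joint_pdf(x0, y), -np.inf, np.inf)
print(marginal, norm.pdf(x0, loc=mu_x, scale=sigma_x))
```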

Marginal PMF

For discrete variables, the marginal PMF is obtained by summing the joint PMF over all values of the other variable.

Example: Marginal PMF from a Joint PMF

In the dice example, the marginal PMF of $X$ (the outcome of the first die) is:

$$P(X = x) = \sum_{y=1}^{6} P(X = x, Y = y) = \sum_{y=1}^{6} \frac{1}{36} = \frac{6}{36} = \frac{1}{6}$$

This shows that each outcome of the first die is equally likely, as expected.

Conditional Distributions

A conditional distribution gives the distribution of one variable given that another variable takes on a specific value. This is important when we want to understand the dependency between variables.

Conditional PDF

For continuous variables, the conditional PDF of $X$ given $Y = y$ is defined as:

$$f_{X|Y}(x|y) = \frac{f_{X,Y}(x,y)}{f_Y(y)}$$

Where $f_Y(y)$ is the marginal PDF of $Y$. The conditional distribution tells us how $X$ behaves when $Y$ is fixed.

Example: Conditional Distribution from a Joint PDF

Suppose we know a person’s weight $Y = 65$ kg and want to know the distribution of their height $X$. Using the joint PDF:

$$f_{X|Y}(x|65) = \frac{f_{X,Y}(x,65)}{f_Y(65)}$$

This expression gives us the conditional distribution of height given a specific weight.
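For the bivariate normal case, this conditional is itself normal, with mean $\mu_X + \rho\frac{\sigma_X}{\sigma_Y}(y - \mu_Y)$ and variance $\sigma_X^2(1-\rho^2)$. The ratio definition can be checked against this closed form numerically; the parameters below are assumed for illustration:

```python
import numpy as np
from scipy.stats import norm

# Assumed illustrative parameters
mu_x, sigma_x = 170.0, 10.0
mu_y, sigma_y = 65.0, 12.0
rho = 0.6

def joint_pdf(x, y):
    """Bivariate normal density f_{X,Y}(x, y)."""
    q = ((x - mu_x) ** 2 / sigma_x ** 2
         - 2 * rho * (x - mu_x) * (y - mu_y) / (sigma_x * sigma_y)
         + (y - mu_y) ** 2 / sigma_y ** 2)
    c = 2 * np.pi * sigma_x * sigma_y * np.sqrt(1 - rho ** 2)
    return np.exp(-q / (2 * (1 - rho ** 2))) / c

y_obs, x = 77.0, 172.0  # condition on a weight one sd above its mean

# Definition: f_{X|Y}(x|y) = f_{X,Y}(x, y) / f_Y(y)
cond_by_definition = joint_pdf(x, y_obs) / norm.pdf(y_obs, loc=mu_y, scale=sigma_y)

# Closed form: X | Y=y is normal with a shifted mean and reduced variance
cond_mean = mu_x + rho * (sigma_x / sigma_y) * (y_obs - mu_y)  # 176.0 here
cond_sd = sigma_x * np.sqrt(1 - rho ** 2)                      # 8.0 here
cond_closed_form = norm.pdf(x, loc=cond_mean, scale=cond_sd)

print(cond_by_definition, cond_closed_form)
```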

Conditional PMF

For discrete variables, the conditional PMF of $X$ given $Y = y$ is:

$$P(X = x | Y = y) = \frac{P(X = x, Y = y)}{P(Y = y)}$$

Example: Conditional PMF from a Joint PMF

Using the dice example, if we know the second die shows a 5 ($Y = 5$), the conditional probability that the first die shows a 3 ($X = 3$) is:

$$P(X=3 \mid Y=5) = \frac{P(X=3, Y=5)}{P(Y=5)} = \frac{1/36}{1/6} = \frac{1}{6}$$
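The same computation with exact fractions, building the marginal and the conditional from the joint PMF:

```python
from fractions import Fraction
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))
p_joint = Fraction(1, 36)

# Marginal: P(Y = 5) sums the joint PMF over all values of the first die
p_y_is_5 = sum(p_joint for x, y in outcomes if y == 5)

# Conditional: P(X = 3 | Y = 5) = P(X = 3, Y = 5) / P(Y = 5)
p_cond = p_joint / p_y_is_5
print(p_y_is_5, p_cond)  # 1/6 1/6
```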

Covariance and Correlation

Covariance and correlation measure the relationship between two random variables.

Covariance

Covariance is a measure of how much two random variables vary together. If the variables tend to increase together, the covariance is positive; if one tends to increase while the other decreases, the covariance is negative.

The covariance between $X$ and $Y$ is defined as:

$$\text{Cov}(X,Y) = \mathbb{E}[(X - \mathbb{E}[X])(Y - \mathbb{E}[Y])]$$

Where $\mathbb{E}[X]$ and $\mathbb{E}[Y]$ are the expected values (means) of $X$ and $Y$.

Example: Covariance Calculation

Let’s calculate the covariance of $X$ and $Y$, where $X$ represents the number of hours studied and $Y$ represents exam scores. Suppose we have the following data for a sample of students:

Hours Studied ($X$)    Exam Score ($Y$)
2                      50
4                      60
6                      65
8                      80
10                     85

First, compute the means:

$$\mathbb{E}[X] = \frac{2 + 4 + 6 + 8 + 10}{5} = 6$$
$$\mathbb{E}[Y] = \frac{50 + 60 + 65 + 80 + 85}{5} = 68$$

Next, calculate the covariance. Here, we are calculating the population covariance by dividing by $n = 5$. If this were a sample, we would divide by $n - 1 = 4$ instead.

$$\text{Cov}(X,Y) = \frac{1}{5} \sum_{i=1}^{5} (X_i - 6)(Y_i - 68)$$

Substituting the values:

$$\text{Cov}(X,Y) = \frac{1}{5} [(2-6)(50-68) + (4-6)(60-68) + (6-6)(65-68) + (8-6)(80-68) + (10-6)(85-68)]$$
$$\text{Cov}(X,Y) = \frac{1}{5} [(-4)(-18) + (-2)(-8) + (0)(-3) + (2)(12) + (4)(17)]$$
$$\text{Cov}(X,Y) = \frac{1}{5} [72 + 16 + 0 + 24 + 68] = \frac{180}{5} = 36$$

A positive covariance of 36 indicates that hours studied and exam scores tend to increase together.

Calculating Standard Deviations:

To compute the standard deviations:

$$\sigma_X = \sqrt{\frac{1}{5} \sum_{i=1}^{5} (X_i - 6)^2} = \sqrt{\frac{1}{5}(16 + 4 + 0 + 4 + 16)} = \sqrt{8} \approx 2.83$$
$$\sigma_Y = \sqrt{\frac{1}{5} \sum_{i=1}^{5} (Y_i - 68)^2} = \sqrt{\frac{1}{5}(324 + 64 + 9 + 144 + 289)} = \sqrt{\frac{830}{5}} = \sqrt{166} \approx 12.88$$

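These hand calculations are easy to reproduce with NumPy; note the `bias=True` flag, which selects the population convention (dividing by $n$) used above:

```python
import numpy as np

hours = np.array([2, 4, 6, 8, 10], dtype=float)       # X
scores = np.array([50, 60, 65, 80, 85], dtype=float)  # Y

# bias=True divides by n (population covariance); the default divides by n - 1
cov_xy = np.cov(hours, scores, bias=True)[0, 1]

# Population standard deviations (ddof=0 is NumPy's default for .std())
sigma_x = hours.std()
sigma_y = scores.std()

print(cov_xy, sigma_x, sigma_y)  # 36.0, ~2.83, ~12.88
```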

Correlation

Correlation is a normalized version of covariance that provides a measure of the linear relationship between two variables. The correlation coefficient $\rho_{X,Y}$ is defined as:

$$\rho_{X,Y} = \frac{\text{Cov}(X,Y)}{\sigma_X \sigma_Y}$$

Where $\sigma_X$ and $\sigma_Y$ are the standard deviations of $X$ and $Y$ respectively.

Example: Correlation Calculation

Using the covariance calculated above, and the standard deviations $\sigma_X \approx 2.83$ and $\sigma_Y \approx 12.88$, we have:

$$\rho_{X,Y} = \frac{36}{2.83 \times 12.88} \approx \frac{36}{36.44} \approx 0.99$$

A correlation coefficient of approximately 0.99 indicates a very strong positive linear relationship between hours studied and exam scores, meaning that as one increases, the other tends to increase as well.
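The same result follows directly from NumPy, where the choice of $n$ versus $n-1$ cancels in the ratio:

```python
import numpy as np

hours = np.array([2, 4, 6, 8, 10], dtype=float)
scores = np.array([50, 60, 65, 80, 85], dtype=float)

# Pearson correlation; the 1/n vs 1/(n-1) convention cancels in the ratio
rho_xy = np.corrcoef(hours, scores)[0, 1]
print(round(rho_xy, 3))  # 0.988
```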


Multivariate Normal Distribution

The multivariate normal distribution is a generalization of the normal distribution to multiple variables. It is a cornerstone of multivariate statistics and is used in various applications, such as portfolio optimization and principal component analysis.

Definition

A random vector $\mathbf{X} = (X_1, X_2, \dots, X_k)^T$ follows a multivariate normal distribution if every linear combination of its components follows a univariate normal distribution. The multivariate normal distribution is fully described by its mean vector $\mathbf{\mu}$ and covariance matrix $\mathbf{\Sigma}$.

The probability density function of a multivariate normal distribution is:

$$f(\mathbf{X}) = \frac{1}{\sqrt{(2\pi)^k |\mathbf{\Sigma}|}} \exp\left(-\frac{1}{2}(\mathbf{X} - \mathbf{\mu})^T \mathbf{\Sigma}^{-1} (\mathbf{X} - \mathbf{\mu})\right)$$

Where:

  • $\mathbf{X}$ is a $k \times 1$ vector of random variables.
  • $\mathbf{\mu}$ is a $k \times 1$ mean vector.
  • $\mathbf{\Sigma}$ is a $k \times k$ covariance matrix.
  • $|\mathbf{\Sigma}|$ is the determinant of the covariance matrix.
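As a quick sketch, the density formula can be evaluated directly and compared with scipy's implementation; the mean vector and covariance matrix below are arbitrary illustrative values (any positive-definite $\mathbf{\Sigma}$ works):

```python
import numpy as np
from scipy.stats import multivariate_normal

# Arbitrary illustrative parameters
mu = np.array([0.0, 1.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])

def mvn_pdf(x, mu, Sigma):
    """Evaluate the multivariate normal density from the formula above."""
    k = mu.shape[0]
    diff = x - mu
    quad_form = diff @ np.linalg.inv(Sigma) @ diff
    normalizer = np.sqrt((2 * np.pi) ** k * np.linalg.det(Sigma))
    return np.exp(-0.5 * quad_form) / normalizer

x = np.array([1.0, 0.5])
print(mvn_pdf(x, mu, Sigma), multivariate_normal(mean=mu, cov=Sigma).pdf(x))
```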

Example: Bivariate Normal Distribution

Consider a bivariate normal distribution with variables $X$ and $Y$, where:

$$\mathbf{\mu} = \begin{pmatrix} \mu_X \\ \mu_Y \end{pmatrix} = \begin{pmatrix} 5 \\ 10 \end{pmatrix}, \quad \mathbf{\Sigma} = \begin{pmatrix} 4 & 2 \\ 2 & 9 \end{pmatrix}$$

The covariance matrix $\mathbf{\Sigma}$ indicates that the variance of $X$ is 4, the variance of $Y$ is 9, and the covariance between $X$ and $Y$ is 2. The joint distribution of $X$ and $Y$ is fully characterized by these parameters.
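One way to make these parameters concrete is to sample from the distribution and check that the empirical moments recover $\mathbf{\mu}$ and $\mathbf{\Sigma}$; a sketch:

```python
import numpy as np

mu = np.array([5.0, 10.0])
Sigma = np.array([[4.0, 2.0],
                  [2.0, 9.0]])

# Draw a large sample and compare the empirical moments to mu and Sigma
rng = np.random.default_rng(0)
samples = rng.multivariate_normal(mu, Sigma, size=200_000)

print(samples.mean(axis=0))           # close to [5, 10]
print(np.cov(samples, rowvar=False))  # close to [[4, 2], [2, 9]]
```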

Properties

  1. Marginal Distributions: Any subset of variables from a multivariate normal distribution is also normally distributed. For example, $X$ and $Y$ individually follow normal distributions with their respective means and variances.

  2. Linear Combinations: Any linear combination of the variables in a multivariate normal distribution is also normally distributed. For instance, if $Z = aX + bY$, then $Z$ is normally distributed.

  3. Conditional Distributions: The conditional distribution of a subset of variables given the others is also normally distributed. If $Y$ is known, the distribution of $X$ given $Y = y$ is normal.
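The linear-combination property can be checked numerically with the bivariate example above; the coefficients $a$ and $b$ below are arbitrary choices:

```python
import numpy as np

mu = np.array([5.0, 10.0])
Sigma = np.array([[4.0, 2.0],
                  [2.0, 9.0]])
a, b = 2.0, -1.0  # arbitrary coefficients for Z = aX + bY
w = np.array([a, b])

# Theoretical moments of Z: E[Z] = a*mu_X + b*mu_Y, Var(Z) = w^T Sigma w
mean_z = w @ mu        # 2*5 - 10 = 0
var_z = w @ Sigma @ w  # 16 - 8 + 9 = 17

# Empirical check: Z computed from samples should match these moments
rng = np.random.default_rng(1)
z = rng.multivariate_normal(mu, Sigma, size=200_000) @ w
print(mean_z, var_z, z.mean(), z.var())
```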

Applications

  • Principal Component Analysis (PCA): PCA is often more effective when the data exhibits multivariate normality, as this assumption simplifies the data structure and allows for dimensionality reduction while retaining as much variance as possible. However, PCA can be applied to data with other distributions as well.

  • Portfolio Theory: In finance, the returns on a portfolio of assets are often modeled using a multivariate normal distribution, which allows for the computation of portfolio risk and return based on the covariances between asset returns.

  • Gaussian Mixture Models (GMM): GMMs use multiple multivariate normal distributions to model complex data distributions, often applied in clustering and classification tasks.

Conclusion

Multivariate distributions are fundamental in understanding how multiple variables interact and behave together. This in-depth exploration of joint, marginal, and conditional distributions, along with covariance, correlation, and the multivariate normal distribution, equips data scientists with the knowledge needed to model and analyze complex datasets effectively. Through practical examples, we have seen how these concepts are applied, providing a strong foundation for further exploration in multivariate analysis.