Chapter 4: Multiple random variables#
“Do the difficult things while they are easy and do the great things while they are small. A journey of a thousand miles must begin with a single step.”
—Lao Tzu
Discrete cases#
Consider two discrete random variables \(X\) and \(Y\) with a joint probability mass function (pmf) \(P(X=x, Y=y)\), or simply \(P(x,y)\).
Example 30
The joint pmf can be represented by a table. The following table indicates \(P(X=1,Y=0)=0.1\), \(P(X=1,Y=1)=0.3\), \(P(X=2,Y=0)=0.2\), and \(P(X=2,Y=1)=0.4\). The total probability is \(0.1+0.2+0.3+0.4=1\).
|     | X=1 | X=2 |
|-----|-----|-----|
| Y=0 | 0.1 | 0.2 |
| Y=1 | 0.3 | 0.4 |
Definition 14 (marginal pmf)
The marginal pmf of \(X\) is just the pmf of \(X\), which can be obtained from the joint pmf by summing over \(Y\), i.e.,
\[P(X=x)=\sum_{y} P(X=x, Y=y).\]
Similarly, the marginal pmf of \(Y\) can be obtained from the joint pmf by summing over \(X\), i.e.,
\[P(Y=y)=\sum_{x} P(X=x, Y=y).\]
Moreover, we can find the conditional probability by the following formula
\[P(X=x \mid Y=y)=\frac{P(X=x, Y=y)}{P(Y=y)}.\]
Example 31
|     | X=1 | X=2 | sum |
|-----|-----|-----|-----|
| Y=0 | 0.1 | 0.2 | 0.3 |
| Y=1 | 0.3 | 0.4 | 0.7 |
| sum | 0.4 | 0.6 | 1   |
The marginal probability distribution of \(X\) is given by
| X | 1   | 2   |
|---|-----|-----|
| P | 0.4 | 0.6 |
The marginal probability distribution of \(Y\) is given by
| Y | 0   | 1   |
|---|-----|-----|
| P | 0.3 | 0.7 |
The conditional probability distribution of \(X\) given \(Y=0\) is \(P(X=1|Y=0) = 0.1/0.3 = 1/3\) and \(P(X=2|Y=0) = 0.2/0.3 = 2/3\).
The conditional probability distribution of \(X\) given \(Y=1\) is \(P(X=1|Y=1) = 0.3/0.7 = 3/7\) and \(P(X=2|Y=1) = 0.4/0.7 = 4/7\).
In addition, \(X\) and \(Y\) are not independent because \(P(X=1, Y=0)=0.1 \ne P(X=1)P(Y=0)=0.4 \times 0.3 = 0.12\).
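The calculations in this example are easy to reproduce numerically. Below is a minimal sketch in Python (assuming numpy is available; the joint pmf is typed in directly from the table above):

```python
import numpy as np

# Joint pmf from Example 31: rows are Y = 0, 1; columns are X = 1, 2
joint = np.array([[0.1, 0.2],
                  [0.3, 0.4]])

p_x = joint.sum(axis=0)            # marginal pmf of X: [0.4, 0.6]
p_y = joint.sum(axis=1)            # marginal pmf of Y: [0.3, 0.7]

# Conditional pmf of X given Y = 0: divide the Y = 0 row by P(Y = 0)
p_x_given_y0 = joint[0] / p_y[0]   # [1/3, 2/3]

print(p_x, p_y, p_x_given_y0)
# Independence check: the joint pmf differs from the product of the marginals
print(np.allclose(joint, np.outer(p_y, p_x)))   # False, so X and Y are dependent
```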
Definition
The expectation of a function \(g(x,y)\) of \(X\) and \(Y\) is defined as
\[E[g(X, Y)]=\sum_{x} \sum_{y} g(x, y) P(X=x, Y=y).\]
The conditional expectation of \(X\) given \(Y\) is given by
\[E(X \mid Y=y)=\sum_{x} x\, P(X=x \mid Y=y).\]
Example 32
The joint pmf of two random variables \(X\) and \(Y\) is given in the following table
The marginal pmf of \(X\) is \(P(X=1)=0.5\) and \(P(X=2)=0.5\).
The marginal pmf of \(Y\) is \(P(Y=0)=0.3, P(Y=1)=0.2, P(Y=2)=0.2\), \(P(Y=3)=0.3\).
The expectation of \(X\) is \(E(X)=1 * 0.5+2 * 0.5=1.5\)
The expectation of \(Y\) is \(E(Y)=0 * 0.3+1 * 0.2+2 * 0.2+3 * 0.3=1.5\)
The conditional distribution of \(X\) given \(Y=0\) is \(P(X=1 \mid Y=0)=\frac{1}{3}\) and \(P(X=2 \mid Y=0)=\frac{2}{3}\)
The conditional expectation of \(X\) given \(Y=0\) is \(E(X \mid Y=0)=1 * \frac{1}{3}+2 * \frac{2}{3}=\frac{5}{3}\).
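As a quick numerical check of Example 32, here is a minimal sketch; the marginal and conditional pmfs are copied from the text above:

```python
import numpy as np

x_vals, p_x = np.array([1, 2]), np.array([0.5, 0.5])
y_vals, p_y = np.array([0, 1, 2, 3]), np.array([0.3, 0.2, 0.2, 0.3])

E_x = np.dot(x_vals, p_x)                    # E(X) = 1.5
E_y = np.dot(y_vals, p_y)                    # E(Y) = 1.5

# Conditional expectation of X given Y = 0, with P(X=1|Y=0)=1/3, P(X=2|Y=0)=2/3
E_x_given_y0 = np.dot(x_vals, [1/3, 2/3])    # 5/3
print(E_x, E_y, E_x_given_y0)
```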
Continuous cases#
Joint, marginal, conditional densities#
Let \(f(x, y)\) be the joint density function of two continuous random variables \(X\) and \(Y\). The marginal density of \(X\) can be obtained from the joint density by integrating over \(Y\), i.e.,
\[f_X(x)=\int_{-\infty}^{\infty} f(x, y)\, dy.\]
Similarly, the marginal density of \(Y\) can be obtained from the joint density by integrating over \(X\), i.e.,
\[f_Y(y)=\int_{-\infty}^{\infty} f(x, y)\, dx.\]
The conditional density of \(X\) given \(Y=y\) is given by
\[f_{X \mid Y}(x \mid y)=\frac{f(x, y)}{f_Y(y)}.\]
Joint probabilities and expectations#
We use the double integral to calculate the joint probability \(P(X<a, Y<b)\):
\[P(X<a, Y<b)=\int_{-\infty}^{b} \int_{-\infty}^{a} f(x, y)\, dx\, dy.\]
The expectation of a function \(g(x,y)\) of random variables \(X\) and \(Y\) is given by
\[E[g(X, Y)]=\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} g(x, y) f(x, y)\, dx\, dy.\]
The conditional expectation of \(X\) given \(Y=y\) is
\[E(X \mid Y=y)=\int_{-\infty}^{\infty} x\, f_{X \mid Y}(x \mid y)\, dx.\]
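These double integrals can be evaluated numerically. The sketch below uses scipy's `dblquad` with a hypothetical joint density \(f(x,y)=x+y\) on the unit square, chosen for illustration only because it integrates to 1; it is not a density used elsewhere in this chapter:

```python
from scipy.integrate import dblquad

# Hypothetical joint density f(x, y) = x + y on 0 < x < 1, 0 < y < 1
f = lambda x, y: x + y

# P(X < 0.5, Y < 0.5); dblquad integrates func(y, x) with y as the inner variable
p, _ = dblquad(lambda y, x: f(x, y), 0, 0.5, 0, 0.5)

# E[g(X, Y)] with g(x, y) = x * y
e_xy, _ = dblquad(lambda y, x: x * y * f(x, y), 0, 1, 0, 1)

print(p, e_xy)   # 0.125 and 1/3, matching the hand calculation for this density
```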
Important
Note that \(E(X \mid Y)\) is a function of \(Y\). Thus, \(E(X \mid Y)\) is a random variable. But \(E(X)\) is a constant. In addition,
\[E[E(X \mid Y)]=E(X).\]
This is because
\[E[E(X \mid Y)]=\int E(X \mid Y=y) f_Y(y)\, dy=\int\!\!\int x\, f_{X \mid Y}(x \mid y)\, dx\, f_Y(y)\, dy=\int\!\!\int x\, f(x, y)\, dx\, dy=E(X).\]
Similarly, we can show
\[E[E(Y \mid X)]=E(Y).\]
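A minimal numerical check of \(E[E(X \mid Y)] = E(X)\), using the joint pmf of Example 31:

```python
import numpy as np

# Joint pmf of Example 31: rows Y = 0, 1; columns X = 1, 2
joint = np.array([[0.1, 0.2],
                  [0.3, 0.4]])
x_vals = np.array([1, 2])

p_y = joint.sum(axis=1)                             # marginal pmf of Y
E_x_given_y = (joint * x_vals).sum(axis=1) / p_y    # E(X|Y=y) for y = 0, 1
E_x = np.dot(joint.sum(axis=0), x_vals)             # E(X) computed directly

print(np.dot(E_x_given_y, p_y), E_x)                # both equal 1.6
```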
Covariance and correlation#
The covariance of \(X_{1}\) and \(X_{2}\) is defined as
\[\operatorname{Cov}\left(X_{1}, X_{2}\right)=E\left[\left(X_{1}-E(X_{1})\right)\left(X_{2}-E(X_{2})\right)\right]=E\left(X_{1} X_{2}\right)-E\left(X_{1}\right) E\left(X_{2}\right).\]
When \(X_{1}=X_{2}\), the covariance of \(X_{1}\) and \(X_{2}\) is equal to the variance, i.e.,
\[\operatorname{Cov}\left(X_{1}, X_{1}\right)=E\left[\left(X_{1}-E(X_{1})\right)^{2}\right]=\operatorname{Var}\left(X_{1}\right).\]
Important
Variance is a special case of covariance
Covariance could be any real number, but variance must be non-negative.
The correlation of two random variables \(X_{1}\) and \(X_{2}\) is given by
\[\operatorname{Corr}\left(X_{1}, X_{2}\right)=\frac{\operatorname{Cov}\left(X_{1}, X_{2}\right)}{\sqrt{\operatorname{Var}\left(X_{1}\right) \operatorname{Var}\left(X_{2}\right)}}.\]
Important
Correlation is the standardized covariance
\(-1 \leq\) correlation \(\leq 1\)
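A minimal simulation sketch of covariance and correlation (the parameter values are arbitrary; numpy's `cov` and `corrcoef` compute the sample versions):

```python
import numpy as np

rng = np.random.default_rng(0)

# x2 depends on x1 plus independent noise, so the two variables are correlated
x1 = rng.normal(size=100_000)
x2 = 0.5 * x1 + rng.normal(size=100_000)

print(np.cov(x1, x2)[0, 1])       # sample covariance, close to 0.5
print(np.corrcoef(x1, x2)[0, 1])  # sample correlation, close to 0.5/sqrt(1.25) ≈ 0.447
```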
Independent random variables#
If two random variables \(X_{1}\) and \(X_{2}\) are independent of each other, their joint density function is the product of the two marginal densities, i.e.,
\[f\left(x_{1}, x_{2}\right)=f_{X_{1}}\left(x_{1}\right) f_{X_{2}}\left(x_{2}\right).\]
The joint density of \(n\) independent random variables \(\left(X_{1}, \dots, X_{n}\right)\) is the product of the \(n\) marginal densities,
\[f\left(x_{1}, \dots, x_{n}\right)=\prod_{i=1}^{n} f_{X_{i}}\left(x_{i}\right).\]
Example 33
The notation \(\left(X_{1}, X_{2}, \ldots, X_{n}\right) \sim \operatorname{Normal}\left(\mu, \sigma^{2}\right)\) indicates that \(X_{1}, X_{2}, \ldots, X_{n}\) are independently and identically distributed random variables with the same probability distribution \(\operatorname{Normal}\left(\mu, \sigma^{2}\right)\). The joint density function of \(X_{1}, X_{2}, \ldots, X_{n}\) is given by
\[f\left(x_{1}, \ldots, x_{n}\right)=\prod_{i=1}^{n} \frac{1}{\sqrt{2 \pi \sigma^{2}}} e^{-\frac{\left(x_{i}-\mu\right)^{2}}{2 \sigma^{2}}}=\left(2 \pi \sigma^{2}\right)^{-n / 2} \exp \left(-\frac{\sum_{i=1}^{n}\left(x_{i}-\mu\right)^{2}}{2 \sigma^{2}}\right).\]
Important
The joint density function of the data \(\left(X_1,\dots,X_n\right)\) is also called the likelihood function. The likelihood function measures the likelihood of the observed data \(\left(X_1,\dots,X_n\right)\) given the parameters. When we estimate parameter values from data, we choose the values that maximize the likelihood of the observed data; the result is called the "maximum likelihood estimate."
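The sketch below illustrates the idea of maximum likelihood on a simulated (hypothetical) normal sample; for simplicity \(\sigma\) is held fixed at 1 and only \(\mu\) is estimated by scanning a grid:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
data = rng.normal(loc=2.0, scale=1.0, size=200)   # hypothetical sample

def log_likelihood(mu, sigma=1.0):
    # log of the joint density of the data, i.e., the log-likelihood
    return np.sum(norm.logpdf(data, loc=mu, scale=sigma))

grid = np.linspace(0, 4, 401)
mu_hat = grid[np.argmax([log_likelihood(m) for m in grid])]
print(mu_hat, data.mean())   # the maximizer is (approximately) the sample mean
```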
Example 34
\(X_{1}, X_{2}, \ldots, X_{n} \sim \operatorname{Poisson}(\lambda)\). The joint density function is
\[f\left(x_{1}, \ldots, x_{n}\right)=\prod_{i=1}^{n} \frac{\lambda^{x_{i}} e^{-\lambda}}{x_{i} !}=\frac{\lambda^{\sum_{i=1}^{n} x_{i}} e^{-n \lambda}}{\prod_{i=1}^{n} x_{i} !}.\]
Example 35
\(X_{1}, X_{2}, \ldots, X_{n} \sim \operatorname{Exponential}(\lambda)\). The joint density function is
\[f\left(x_{1}, \ldots, x_{n}\right)=\prod_{i=1}^{n} \lambda e^{-\lambda x_{i}}=\lambda^{n} e^{-\lambda \sum_{i=1}^{n} x_{i}}.\]
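Both joint densities above are simple products, so they can be evaluated directly. A minimal sketch with hypothetical data values (using scipy; note that scipy's exponential distribution is parameterized by scale \(=1/\lambda\)):

```python
import numpy as np
from math import factorial
from scipy.stats import poisson, expon

lam = 2.0
x_pois = np.array([1, 3, 0, 2, 4])        # hypothetical Poisson counts
x_expo = np.array([0.5, 1.2, 0.3, 2.0])   # hypothetical exponential observations

# Joint pmf/density of an iid sample = product of the individual terms
print(np.prod(poisson.pmf(x_pois, mu=lam)))
print(np.prod(expon.pdf(x_expo, scale=1/lam)))

# Equivalent closed forms from Examples 34 and 35
print(lam**x_pois.sum() * np.exp(-lam*len(x_pois))
      / np.prod([factorial(int(k)) for k in x_pois]))
print(lam**len(x_expo) * np.exp(-lam * x_expo.sum()))
```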
The sum of independent random variables#
The sum of independent random variables is an important statistic in statistical inference. To understand its statistical properties, it is of great interest to find its probability distribution. Here, we demonstrate an elegant way of deriving the probability distribution of the sum of independent random variables.
Discrete cases#
Recall that if a discrete random variable \(X\) takes nonnegative integer values, the probability generating function (PGF) of \(X\) is defined as
\[G(t)=E\left(t^{X}\right)=\sum_{k=0}^{\infty} P(X=k)\, t^{k}.\]
Tip
We can find probabilities by differentiating the PGF: \(P(X=k)=\frac{G^{(k)}(0)}{k !}\).
Important
The Probability Generating Function (PGF) of a random variable uniquely determines its probability distribution. Once the PGF is determined, the corresponding probability distribution can be identified.
We can show that
The probability generating function for the binomial distribution is \((1-p+pt)^{n}\)
The probability generating function for the Poisson distribution is \(e^{\lambda(t-1)}\)
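These two PGFs can be checked numerically by summing \(P(X=k)t^{k}\) directly and comparing with the closed forms (a minimal sketch with arbitrary parameter values):

```python
import numpy as np
from scipy.stats import binom, poisson

t = 0.7                    # any value of t in [0, 1]
n, p, lam = 10, 0.3, 2.5   # arbitrary parameter values

k_binom = np.arange(0, n + 1)
print(np.sum(binom.pmf(k_binom, n, p) * t**k_binom), (1 - p + p*t)**n)

k_pois = np.arange(0, 200)   # truncate the infinite sum; the tail is negligible
print(np.sum(poisson.pmf(k_pois, lam) * t**k_pois), np.exp(lam*(t - 1)))
```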
Theorem 11
Let \(X_{1}, X_{2}, \ldots, X_{n}\) be independent discrete random variables with probability generating functions \(G_{1}(t), \ldots, G_{n}(t)\). Then, the probability generating function of \(S_{n}=\sum_{i=1}^{n} X_{i}\) is equal to the product of the individual PGFs, i.e.,
\[G_{S_{n}}(t)=\prod_{i=1}^{n} G_{i}(t).\]
Example 36
Suppose \(X_{1} \sim Poisson(\lambda_1)\) and \(X_2 \sim Poisson(\lambda_2)\). Find the probability distribution of \(X_1+X_2\).
The PGF of \(X_1\) is \(G_{X_1}(t) = e^{\lambda_1(t-1)}\) and the PGF of \(X_2\) is \(G_{X_2}(t) = e^{\lambda_2(t-1)}\). Thus,
\[G_{X_{1}+X_{2}}(t)=G_{X_{1}}(t) G_{X_{2}}(t)=e^{\lambda_{1}(t-1)} e^{\lambda_{2}(t-1)}=e^{\left(\lambda_{1}+\lambda_{2}\right)(t-1)}.\]
This is the PGF of the Poisson distribution with \(\lambda=\lambda_1+\lambda_2\). Thus, the sum \(X_1+X_2\) is a Poisson random variable with mean \(\lambda=\lambda_1+\lambda_2\).
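A minimal simulation check of Example 36 (the values of \(\lambda_1\) and \(\lambda_2\) are arbitrary): the empirical pmf of \(X_1+X_2\) should match the \(\operatorname{Poisson}(\lambda_1+\lambda_2)\) pmf.

```python
import numpy as np
from scipy.stats import poisson

rng = np.random.default_rng(2)
lam1, lam2, n_sim = 1.5, 2.5, 200_000

# Simulate X1 + X2 with X1 ~ Poisson(lam1) and X2 ~ Poisson(lam2), independently
s = rng.poisson(lam1, n_sim) + rng.poisson(lam2, n_sim)

for k in range(6):
    # empirical probability P(X1 + X2 = k) versus the Poisson(lam1 + lam2) pmf
    print(k, np.mean(s == k), poisson.pmf(k, lam1 + lam2))
```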
If \(X_{1},\ldots, X_{n}\) are independently and identically distributed (iid) with the same probability distribution, then the PGF of their sum is given by
\[G_{S_{n}}(t)=[G(t)]^{n}.\]
Example 37
\(X_{1}, X_{2}, \ldots, X_{n} \sim \operatorname{Poisson}(\lambda)\) and the PGF of the Poisson distribution is \(G(t)=e^{\lambda(t-1)}\). Then, the PGF of the sum \(S_n=\sum_{i=1}^nX_i\) is given by
\[G_{S_{n}}(t)=\left[e^{\lambda(t-1)}\right]^{n}=e^{n \lambda(t-1)}.\]
This is the PGF of the Poisson distribution with \(\lambda^*=n \lambda\). Thus, \(S_{n}\) of Poisson random variables is again a Poisson random variable with mean \(\lambda^*=n \lambda\).
Example 38
\(X_{1}, X_{2}, \ldots, X_{n} \sim \operatorname{Bernoulli}(p)\) and the PGF of the Bernoulli distribution is \(G(t)=(1-p)+p t\). Then, the PGF of the sum \(S_n=\sum_{i=1}^nX_i\) is given by
\[G_{S_{n}}(t)=[(1-p)+p t]^{n}.\]
This is the PGF of the \(\operatorname{Binomial}(n, p)\) distribution. Thus, the sum \(S_{n}\) of Bernoulli random variables is a Binomial random variable.
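A minimal simulation check of Example 38 (with arbitrary \(n\) and \(p\)): the sum of \(n\) iid Bernoulli\((p)\) variables should follow the \(\operatorname{Binomial}(n,p)\) pmf.

```python
import numpy as np
from scipy.stats import binom

rng = np.random.default_rng(3)
n, p, n_sim = 8, 0.4, 200_000

# Each row is one realization of n iid Bernoulli(p) variables; sum along the row
s = rng.binomial(1, p, size=(n_sim, n)).sum(axis=1)

for k in range(n + 1):
    print(k, np.mean(s == k), binom.pmf(k, n, p))   # empirical vs Binomial(n, p)
```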
Continuous cases#
The PGF cannot be applied to continuous random variables, because continuous random variables do not have point probabilities. Instead, we use the moment generating function (MGF) to derive the probability distribution of the sum.
Let \(X_{1}, X_{2}, \ldots, X_{n}\) be independent random variables with MGFs \(M_{1}(t), \ldots, M_{n}(t)\). Then, the MGF of the sum \(S_{n}=\sum_{i=1}^{n} X_{i}\) is equal to the product of the individual MGFs, i.e.,
\[M_{S_{n}}(t)=\prod_{i=1}^{n} M_{i}(t).\]
In addition, if \(X_{1}, X_{2}, \ldots, X_{n}\) have the same probability distribution, then the MGF of the sum is given by
\[M_{S_{n}}(t)=[M(t)]^{n}.\]
Tip
We can show that
The moment generating function for the normal distribution is \(e^{\mu t+\sigma^{2} t^{2} / 2}\)
The moment generating function for the exponential distribution is \((1-t / \lambda)^{-1}\)
The moment generating function for the gamma distribution is \((1-t / \alpha)^{-\beta}\)
Example 39
Given a random sample \(X_{1}, \dots, X_{n} \sim \operatorname{Normal}\left(\mu, \sigma^{2}\right)\), find the probability distribution of the sum \(S_{n}=\sum_{i=1}^{n} X_{i}\).
Since the \(X_i\)'s have the same normal distribution with MGF \(M(t)=e^{\mu t+\sigma^{2} t^{2} / 2}\), the MGF of the sum \(S_{n}\) is
\[M_{S_{n}}(t)=\left[e^{\mu t+\sigma^{2} t^{2} / 2}\right]^{n}=e^{n \mu t+n \sigma^{2} t^{2} / 2}.\]
This is the MGF of the normal distribution with mean \(n \mu\) and variance \(n \sigma^{2}\). Thus, the sum \(S_{n}\) of iid (independently and identically distributed) normal random variables is again a normal random variable with mean \(n \mu\) and variance \(n \sigma^{2}\).
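A minimal simulation check of Example 39 (arbitrary \(\mu\), \(\sigma\), and \(n\)), comparing the sample mean and variance of \(S_n\) with \(n\mu\) and \(n\sigma^2\):

```python
import numpy as np

rng = np.random.default_rng(4)
mu, sigma, n, n_sim = 1.0, 2.0, 5, 200_000

# Each row is one sample of size n; S_n is the row sum
s = rng.normal(mu, sigma, size=(n_sim, n)).sum(axis=1)

print(s.mean(), n * mu)        # sample mean of S_n versus n * mu
print(s.var(), n * sigma**2)   # sample variance of S_n versus n * sigma^2
```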
Example 40
Given a random sample \(X_{1}, \dots, X_{n} \sim \exp (\lambda)\), find the probability distribution of the sum \(S_{n}=\sum_{i=1}^{n} X_{i}\).
Since the \(X_i\)'s have the same exponential distribution with MGF \(M(t)=(1-t / \lambda)^{-1}\), the MGF of the sum \(S_{n}\) is
\[M_{S_{n}}(t)=\left[(1-t / \lambda)^{-1}\right]^{n}=(1-t / \lambda)^{-n}.\]
This is the MGF of the Gamma distribution. Thus, the sum \(S_{n}\) of iid exponential random variables is a Gamma random variable with \(\alpha=\lambda\) and \(\beta=n\).
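A minimal simulation check of Example 40 (arbitrary \(\lambda\) and \(n\)): the quantiles of the simulated sum should match those of the Gamma distribution with \(\beta=n\) and \(\alpha=\lambda\) in the text's notation (scipy parameterizes the Gamma by shape `a` and `scale` \(=1/\lambda\)):

```python
import numpy as np
from scipy.stats import gamma

rng = np.random.default_rng(5)
lam, n, n_sim = 2.0, 4, 200_000

# Each row is a sample of n iid Exponential(lam) variables; S_n is the row sum
s = rng.exponential(scale=1/lam, size=(n_sim, n)).sum(axis=1)

qs = np.linspace(0.1, 0.9, 9)
print(np.quantile(s, qs))               # empirical quantiles of S_n
print(gamma.ppf(qs, a=n, scale=1/lam))  # Gamma(shape = n, rate = lam) quantiles
```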
Alternatively, the next example illustrates how the cumulative distribution function (CDF) technique can be used to derive the probability distribution of a statistic.
Example 41
Suppose \(X_{1}, \ldots, X_{n}\) are iid random variables with the same density \(f(x)\) and CDF \(F(x)\). Find the distribution of \(\max \left\{X_{1}, X_{2}, \ldots, X_{n}\right\}\).
\(P\left(X_{\max } \leq a\right)=P\left(X_{1} \leq a, X_{2} \leq a, \ldots, X_{n} \leq a\right)=\prod_{i=1}^{n} P\left(X_{i} \leq a\right)=(F(a))^{n}\). Differentiating the CDF with respect to \(a\) gives the probability density function \(f_{X_{\max }}(a)=n(F(a))^{n-1} f(a)\).
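A minimal simulation check of Example 41, taking the \(X_i\) to be standard normal for illustration: the empirical \(P(X_{\max}\le a)\) should match \((F(a))^n\).

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(6)
n, n_sim, a = 5, 200_000, 1.0

# Maximum of n iid standard normal variables, simulated n_sim times
x_max = rng.normal(size=(n_sim, n)).max(axis=1)

print(np.mean(x_max <= a))   # empirical P(X_max <= a)
print(norm.cdf(a)**n)        # (F(a))^n from the CDF technique
```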
Multivariate probability distributions#
Multinomial distribution#
The multinomial distribution is an extension of the binomial distribution. In the binomial distribution there are two possible outcomes; the multinomial distribution deals with more than two outcomes.
Example 42
Suppose the proportions of \(A, C, G, T\) in the genome are \(p_{A}=0.2, p_{C}=0.3, p_{G}=0.2, p_{T}=0.3\). We select \(n=100\) nucleotides at random from the genome. Let \(X_{A}, X_{C}, X_{G}, X_{T}\) be the number of \(A, C, G, T\), respectively. \(X_{A}, X_{C}, X_{G}, X_{T}\) are random variables and the sum of \(X_{A}, X_{C}, X_{G}, X_{T}\) is \(n\). \(\left\{X_{A}, X_{C}, X_{G}, X_{T}\right\}\) follow the multinomial distribution with joint probability mass function
\[P\left(X_{A}=x_{A}, X_{C}=x_{C}, X_{G}=x_{G}, X_{T}=x_{T}\right)=\frac{n !}{x_{A} !\, x_{C} !\, x_{G} !\, x_{T} !}\, p_{A}^{x_{A}} p_{C}^{x_{C}} p_{G}^{x_{G}} p_{T}^{x_{T}},\]
where \(x_{A}+x_{C}+x_{G}+x_{T}=n\) and \(p_{A}+p_{C}+p_{G}+p_{T}=1\).
It can be shown that the marginal distribution of \(X_{A}\) is \(\operatorname{Binomial}\left(n, p_{A}\right)\). Thus, \(E\left(X_{A}\right)=n p_{A}=100 \times 0.2=20\).
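A minimal sketch of Example 42 with numpy/scipy: simulate multinomial counts with \(n=100\) and the proportions above, and check that the average of \(X_A\) is close to \(np_A=20\).

```python
import numpy as np
from scipy.stats import multinomial

n = 100
p = [0.2, 0.3, 0.2, 0.3]   # proportions of A, C, G, T from Example 42

rng = np.random.default_rng(7)
counts = rng.multinomial(n, p, size=200_000)   # rows are (X_A, X_C, X_G, X_T)

print(counts[:, 0].mean())   # close to E(X_A) = n * p_A = 20
print(multinomial.pmf([20, 30, 20, 30], n=n, p=p))   # pmf at one particular outcome
```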
Multivariate normal distribution#
The multivariate normal density function of \(X=\left(X_{1}, \ldots, X_{n}\right)\) is given by
\[f(x)=(2 \pi)^{-n / 2}|\Sigma|^{-1 / 2} \exp \left(-\frac{1}{2}(x-\mu)^{T} \Sigma^{-1}(x-\mu)\right),\]
where \(\Sigma\) is the covariance matrix and \(\mu\) is the mean vector.
The marginal distribution of \(X_{i}\) is \(\operatorname{Normal}\left(\mu_{i}, \sigma_{i}^{2}\right)\).
The conditional distribution of \(X_{i}\) given \(X_{j}\) for \(j \neq i\) is normal.
Any linear combination of \(X_1,\dots, X_n\), i.e., \(\sum_{i=1}^na_iX_i\) follows the normal distribution with mean \(\sum_{i=1}^na_i\mu_i\) and variance \(a^{t}\Sigma a\) where \(a=(a_1,\dots,a_n)\) and \(a^t\) is the transpose of the vector \(a\).
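The linear-combination property can be checked by simulation. The sketch below uses a hypothetical mean vector, a positive-definite covariance matrix, and coefficients \(a\) chosen only for illustration:

```python
import numpy as np

rng = np.random.default_rng(8)

# Hypothetical mean vector and (positive-definite) covariance matrix
mu = np.array([1.0, 2.0, 3.0])
Sigma = np.array([[1.0, 0.3, 0.2],
                  [0.3, 2.0, 0.5],
                  [0.2, 0.5, 1.5]])
a = np.array([2.0, -1.0, 1.0])   # coefficients of the linear combination

x = rng.multivariate_normal(mu, Sigma, size=200_000)
y = x @ a                        # sum_i a_i * X_i for each simulated vector

print(y.mean(), a @ mu)          # empirical mean versus sum_i a_i * mu_i
print(y.var(), a @ Sigma @ a)    # empirical variance versus a^T Sigma a
```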
Example 43
Suppose three random variables \((X_1, X_2, X_3)\) follow a Multivariate normal distribution with the mean vector \(\mu=(1.2, 3.4, 0.4)\) and the covariance matrix \(\begin{pmatrix} 0.1& 0.2& -0.4\\ 0.2 & 1.3 & 1.8\\ -0.4 & 1.8 & 1.1\end{pmatrix}\).
a) What is the probability density function of \(X_1\)?
\(E(X_1) = 1.2\) and \(var(X_1)=0.1\). Thus, \(X_1\) follows \(Normal(1.2, 0.1)\).
b) Find \(E(X_2)\) and \(var(X_3)\)
\(E(X_2) = 3.4\) and \(var(X_3) = 1.1\)
c) Find \(E(2X_1-X_2-4X_3)\) and \(var(X_1+X_2+X_3)\)
\(E(2X_1-X_2-4X_3) = 2(1.2)-3.4-4(0.4) = -2.6\). With \(a=(1,1,1)\), \(var(X_1+X_2+X_3)=a^{t}\Sigma a\), which is the sum of all entries of \(\Sigma\): \(0.1+1.3+1.1+2(0.2-0.4+1.8)=5.7\).
d) Let \(Y = X_1+X_2+X_3\). Find the probability density function of \(Y\)
\(E(X_1+X_2+X_3)=1.2+3.4+0.4=5\) and \(var(X_1+X_2+X_3) = 5.7\) (from part c). Thus, \(Y\) follows \(Normal(5, 5.7)\).