Probability Chapter 4: Variance and Standard Deviation
4.1 Introduction
We have previously described expectation as the center of a distribution, but distributions with very different shapes can have the same expectation. This is where the “spread,” or variability, of the distribution comes into play. Once we can measure this variability, we can say something about the tails of the distribution, that is, about the probabilities of values far from the expectation.
4.2 Defining Variance and Standard Deviation
The variance of a random variable \(X\), typically denoted \(\text{Var}(X)\), measures the average squared distance from the mean: \(\text{Var}(X) = E\big[(X - E(X))^2\big]\). A convenient computational form is:
\[ \text{Var}(X) = E(X^2) - (E(X))^2 \]
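To see where the computational form comes from, expand the defining expression with \(\mu = E(X)\):
\[ \text{Var}(X) = E\big[(X - \mu)^2\big] = E\big(X^2 - 2\mu X + \mu^2\big) = E(X^2) - 2\mu E(X) + \mu^2 = E(X^2) - (E(X))^2 \]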
The variance tells us how much the values of \(X\) deviate from their mean.
The standard deviation of \(X\), typically denoted as \(\text{SD}(X)\), is the square root of the variance:
\[ \text{SD}(X) = \sqrt{\text{Var}(X)} \]
While variance is mathematically convenient, it’s in squared units, making it harder to interpret. The standard deviation brings it back to the same units as the original data, making it easier to understand the typical amount of variation.
A small standard deviation means that values of \(X\) are tightly clustered around the mean. A large standard deviation means that the values are more spread out, with more probability on values far from the mean.
4.3 Example Calculation
Let \(X\) have the following distribution:
- \(P(X = 3) = 0.35\)
- \(P(X = 4) = 0.5\)
- \(P(X = 5) = 0.15\)
Then:
\[ E(X) = 3(0.35) + 4(0.5) + 5(0.15) = 3.8 \]
\[ E(X^2) = 3^2(0.35) + 4^2(0.5) + 5^2(0.15) = 14.9 \]
\[ \text{Var}(X) = 14.9 - 3.8^2 = 14.9 - 14.44 = 0.46 \]
\[ \text{SD}(X) = \sqrt{0.46} \approx 0.6782 \]
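As a quick numerical check, the sketch below (a minimal example, assuming NumPy is available) recomputes the expectation, variance, and standard deviation directly from the probability table above.

```python
import numpy as np

# Distribution of X from the example above
values = np.array([3, 4, 5])
probs = np.array([0.35, 0.5, 0.15])

e_x = np.sum(values * probs)        # E(X) = 3.8
e_x2 = np.sum(values**2 * probs)    # E(X^2) = 14.9
var_x = e_x2 - e_x**2               # Var(X) = 0.46
sd_x = np.sqrt(var_x)               # SD(X) ≈ 0.678

print(e_x, e_x2, var_x, sd_x)
```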
4.4 Dependence and Variance
In the previous chapter we covered the additivity of expectation:
\[ E(X + Y) = E(X) + E(Y) \]
This is true regardless of dependence. Variance does not behave the same way.
Suppose we roll a die twice, and let \(D_1\) and \(D_2\) be the results of the first and second rolls. Define:
- \(V = D_1 + D_1\)
- \(W = D_1 + D_2\)
Then \(E(V) = 7 = E(W)\), but the two distributions are very different: \(V\) takes only the even values 2, 4, 6, ..., 12, while \(W\) takes all values 2 through 12. \(V\) has a larger spread because its two summands are identical, that is, perfectly dependent.
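A simulation makes the difference in spread visible. The sketch below (assuming NumPy) simulates many pairs of rolls and compares the empirical standard deviations of \(V\) and \(W\).

```python
import numpy as np

rng = np.random.default_rng(0)
n_trials = 100_000

# Two independent die rolls per trial
d1 = rng.integers(1, 7, size=n_trials)
d2 = rng.integers(1, 7, size=n_trials)

v = d1 + d1   # the same roll added to itself (dependent summands)
w = d1 + d2   # two independent rolls

# Both averages are near 7, but V is noticeably more spread out
print(v.mean(), w.mean())   # ≈ 7, ≈ 7
print(v.std(), w.std())     # ≈ 3.42 vs ≈ 2.42
```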
If \(X\) and \(Y\) are independent:
\[ \text{Var}(X + Y) = \text{Var}(X) + \text{Var}(Y) \]
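As a concrete check (a calculation not shown above), a fair die has \(E(D_1) = 3.5\), \(E(D_1^2) = 91/6\), and hence \(\text{Var}(D_1) = 35/12\). Then:
\[ \text{Var}(W) = \text{Var}(D_1) + \text{Var}(D_2) = \frac{35}{6} \approx 5.83, \qquad \text{Var}(V) = E(V^2) - (E(V))^2 = 4 \cdot \frac{91}{6} - 49 = \frac{35}{3} \approx 11.67 \]
This agrees with the simulation above, and the fact that \(\text{Var}(V) = 4\,\text{Var}(D_1)\) previews the linear transformation rule of Section 4.7.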
4.5 Binomial Variance Using Indicators
We know that if \(X \sim \text{Binomial}(n, p)\), then \(E(X) = np\).
We represent \(X\) as the sum of \(n\) independent indicator variables:
\[ X = I_1 + I_2 + \dots + I_n \]
Each indicator \(I_j\) satisfies:
- \(E(I_j) = p\)
- \(E(I_j^2) = p\)
- \(\text{Var}(I_j) = p - p^2 = p(1 - p)\)
Because the indicators are independent:
\[ \text{Var}(X) = np(1 - p) \]
\[ \text{SD}(X) = \sqrt{np(1 - p)} \]
Example: let \(X\) be the number of heads in 100 tosses of a fair coin, so \(X \sim \text{Binomial}(100, 0.5)\). Then:
\[ E(X) = 50, \quad \text{SD}(X) = \sqrt{100 \cdot 0.5 \cdot 0.5} = 5 \]
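These numbers are easy to verify by simulation. The sketch below (assuming NumPy) draws many Binomial(100, 0.5) counts and compares the empirical mean and standard deviation to \(np = 50\) and \(\sqrt{np(1-p)} = 5\).

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 100, 0.5

# 100,000 simulated values of "number of heads in 100 tosses"
heads = rng.binomial(n, p, size=100_000)

print(heads.mean())              # ≈ 50, matching np
print(heads.std())               # ≈ 5, matching sqrt(np(1-p))
print(np.sqrt(n * p * (1 - p)))  # exact formula: 5.0
```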
4.6 Hypergeometric Variance Using Indicators
Let \(X \sim \text{Hypergeometric}(N, G, n)\). We write \(X = I_1 + I_2 + \dots + I_n\), where each \(I_j\) indicates whether the \(j^{th}\) draw is a good element.
By symmetry, each draw is equally likely to be any element of the population, so:
- \(E(I_j) = G/N\)
- \(\text{Var}(I_j) = \frac{G}{N}\left(1 - \frac{G}{N}\right)\)
Unlike the binomial case, these indicators are not independent: the draws are made without replacement, so the variances do not simply add. Accounting for the covariances between the indicators yields:
\[ \text{Var}(X) = n \cdot \frac{G}{N} \left(1 - \frac{G}{N}\right) \cdot \frac{N - n}{N - 1} \]
The extra factor \(\frac{N - n}{N - 1}\), often called the finite population correction, shrinks the variance relative to sampling with replacement.
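The finite population correction can also be checked numerically. The sketch below (assuming NumPy; the values of \(N\), \(G\), and \(n\) are illustrative) simulates draws without replacement and compares the empirical variance to the formula.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative population: N elements, G of which are "good"; n draws
N, G, n = 50, 20, 10

# NumPy parameterizes the hypergeometric by (ngood, nbad, nsample)
x = rng.hypergeometric(G, N - G, n, size=100_000)

formula = n * (G / N) * (1 - G / N) * (N - n) / (N - 1)
print(x.var())    # empirical variance of the simulated counts
print(formula)    # n * G/N * (1 - G/N) * (N - n)/(N - 1) ≈ 1.96
```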
4.7 Variance of Linear Transformations
Another important rule concerns linear transformations. Let \(Y = aX + b\) for constants \(a\) and \(b\). Then:
- \(\text{Var}(Y) = a^2 \cdot \text{Var}(X)\)
- \(\text{SD}(Y) = |a| \cdot \text{SD}(X)\)
We won’t go through the proofs in this section, but this is an important rule to know when calculating variance and standard deviation of a linear function.
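Although the proof is omitted, the rule is easy to check numerically. The sketch below (assuming NumPy, with an arbitrary choice of \(a\) and \(b\) and the distribution from Section 4.3) compares \(\text{Var}(aX + b)\) to \(a^2 \,\text{Var}(X)\).

```python
import numpy as np

rng = np.random.default_rng(0)

# Any distribution works; here X is the example from Section 4.3
x = rng.choice([3, 4, 5], p=[0.35, 0.5, 0.15], size=100_000)

a, b = -2, 7          # arbitrary constants for illustration
y = a * x + b

print(x.var(), y.var())   # Var(Y) ≈ a^2 * Var(X) = 4 * 0.46
print(x.std(), y.std())   # SD(Y) ≈ |a| * SD(X)
```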