Expectation is one of the central pillars of statistics and data science. Previously we discussed the distribution of a random variable; often, though, we want a single number that tells us where \(X\) tends to “land” on that distribution. This is what statisticians call the expectation. It is a very useful tool that answers questions such as: what is the expected number of heads when you flip a fair coin 5 times? The question sounds simple, but by the end of this chapter we will understand what expectation means fundamentally and the many powerful uses it has.
We typically denote the expectation of a random variable \(X\) by \(E(X)\). It can be thought of as the long-run average value of \(X\): the average of the possible values, each weighted by its probability. We will go through several ways of calculating the expectation, depending on the distribution we are working with. The most basic is the definition itself:
\[ E(X) = \sum x \cdot P(X = x) \]
For example, suppose \(X\) has the following distribution:

| \(x\) | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|
| \(P(X = x)\) | 0.1 | 0.15 | 0.25 | 0.3 | 0.2 |

Then we calculate the expectation as:
\[ E(X) = 1(0.1) + 2(0.15) + 3(0.25) + 4(0.3) + 5(0.2) = 3.35 \]
A probability histogram of this distribution, with a vertical line drawn at \(E(X) = 3.35\), shows the expectation as the balance point of the distribution; note that the expectation does not have to be a value that \(X\) can actually take.
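As a quick check, here is a minimal sketch (assuming numpy and matplotlib are available) that hard-codes the table above, computes \(E(X)\), and draws the probability histogram with a vertical line at the expectation.

```python
import numpy as np
import matplotlib.pyplot as plt

# Possible values of X and their probabilities, from the table above
values = np.array([1, 2, 3, 4, 5])
probs = np.array([0.1, 0.15, 0.25, 0.3, 0.2])

# E(X) = sum over x of x * P(X = x)
expectation = np.sum(values * probs)
print(round(expectation, 2))  # 3.35

# Probability histogram with a vertical line at E(X)
plt.bar(values, probs, width=1, edgecolor="black")
plt.axvline(expectation, color="red", linestyle="--", label=f"E(X) = {round(expectation, 2)}")
plt.xlabel("x")
plt.ylabel("P(X = x)")
plt.legend()
plt.show()
```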
Now say we have a new random variable \(Y\) defined as a function of \(X\), for example \(Y = X^2\). To find its expectation we do not need to work out the distribution of \(Y\); we simply apply the function to each possible value of \(X\) and keep the same probabilities:
\[ E(g(X)) = \sum g(x) \cdot P(X = x) \]
So in our example above, if \(Y = X^2\), then:
\[ E(X^2) = 1^2(0.1) + 2^2(0.15) + 3^2(0.25) + 4^2(0.3) + 5^2(0.2) = 12.75 \]
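The same sketch extends directly; a minimal version for \(E(X^2)\), again hard-coding the table above:

```python
import numpy as np

# The distribution of X from the table above
values = np.array([1, 2, 3, 4, 5])
probs = np.array([0.1, 0.15, 0.25, 0.3, 0.2])

# E(g(X)) with g(x) = x^2: apply the function to the values, keep the probabilities
e_x_squared = np.sum(values**2 * probs)
print(round(e_x_squared, 2))  # 12.75
```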
Expectation is additive: for any two random variables \(X\) and \(Y\) defined on the same outcome space,
\[ E(X + Y) = E(X) + E(Y) \]
This holds whether or not \(X\) and \(Y\) are independent. Together with the fact that \(E(cX) = cE(X)\) for any constant \(c\), this property is known as the linearity of expectation.
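A minimal simulation sketch, with an arbitrarily chosen example: here \(Y = X^2\) is completely determined by \(X\), yet the averages still add, because averages are additive sample by sample regardless of dependence.

```python
import numpy as np

rng = np.random.default_rng(0)

# X is a fair die roll; Y = X^2 is completely determined by X (no independence here)
x = rng.integers(1, 7, size=100_000)
y = x**2

# The average of X + Y equals the sum of the averages, dependence or not
print(np.mean(x + y))            # ≈ 18.67
print(np.mean(x) + np.mean(y))   # the same, up to floating-point rounding
```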
An indicator of an event \(A\) is a random variable, written \(I_A\), that is 1 if \(A\) occurs and 0 if it does not. The expectation of an indicator is the probability of its event:
\[ E(I_A) = 1 \cdot P(A) + 0 \cdot (1 - P(A)) = P(A) \]
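A quick simulation sketch, with an arbitrarily chosen event \(A\) = “a fair die shows a six”: the long-run average of the indicator settles near \(P(A) = 1/6\).

```python
import numpy as np

rng = np.random.default_rng(0)

# Event A: a fair die shows a six; I_A is 1 when A occurs and 0 otherwise
rolls = rng.integers(1, 7, size=100_000)
indicator = (rolls == 6).astype(int)

# The average of the indicator approximates E(I_A) = P(A) = 1/6 ≈ 0.167
print(indicator.mean())
```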
Suppose \(X \sim \text{Binomial}(n, p)\). Then \(X\) counts the successes in \(n\) independent trials, each with success probability \(p\), so we can write \(X\) as the sum of the \(n\) success indicators:
\[ X = I_1 + I_2 + \dots + I_n \]
Using linearity, and the fact that each indicator has expectation \(E(I_j) = p\):
\[ E(X) = E(I_1) + E(I_2) + \dots + E(I_n) = np \]
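A minimal simulation sketch with arbitrarily chosen \(n\) and \(p\): the average of many simulated binomial counts should land near \(np\).

```python
import numpy as np

rng = np.random.default_rng(0)

# X ~ Binomial(n, p); the long-run average of X should be close to n * p
n, p = 12, 0.3
samples = rng.binomial(n, p, size=100_000)
print(samples.mean())   # ≈ 3.6 = n * p
```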
The same method works even when the indicators are dependent.
Suppose \(X \sim \text{Hypergeometric}(N, G, n)\): the number of good elements in a sample of size \(n\) drawn without replacement from a population of \(N\) elements, \(G\) of which are good. Write \(X\) as a sum of \(n\) indicators, one per draw. Each draw is equally likely to be any of the \(N\) elements, so each indicator has expectation \(G/N\). The indicators are dependent, but linearity does not require independence, so:
\[ E(X) = n \cdot \frac{G}{N} \]
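A simulation sketch with arbitrarily chosen \(N\), \(G\), and \(n\) (numpy's hypergeometric sampler is parameterized by the numbers of good and bad elements):

```python
import numpy as np

rng = np.random.default_rng(0)

# X ~ Hypergeometric(N, G, n): good elements in a sample of size n drawn
# without replacement from N elements, G of which are good
N, G, n = 50, 20, 10
samples = rng.hypergeometric(G, N - G, n, size=100_000)
print(samples.mean())   # ≈ 4.0
print(n * G / N)        # exact mean: 4.0
```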
The method of indicators is suited to finding expectations of random counts that have a clear upper limit; in both examples above, that limit is \(n\).
Let \(X\) and \(Y\) be two random variables on the same outcome space. For a fixed \(x\), the conditional expectation is:
\[ E(Y \mid X = x) = \sum y \cdot P(Y = y \mid X = x) \]
Conditional expectation updates our best guess of \(Y\) once we know the value of \(X\).
For example, let \(Y\) be the number of study hours and \(X\) the number of coffee cups. From the data:
When \(X = 0\):
\(E(Y \mid X = 0) = 0(0.2) + 1(0.5) + 2(0.3) = 1.1\)
When \(X = 1\):
\(E(Y \mid X = 1) = 1(0.3) + 2(0.5) + 3(0.2) = 1.9\)
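A minimal sketch that hard-codes the two conditional distributions above and recomputes both conditional expectations:

```python
import numpy as np

# Conditional distributions of Y (study hours) given X (coffee cups), from the example above
cond_dists = {
    0: (np.array([0, 1, 2]), np.array([0.2, 0.5, 0.3])),
    1: (np.array([1, 2, 3]), np.array([0.3, 0.5, 0.2])),
}

for x, (y_values, y_probs) in cond_dists.items():
    cond_exp = np.sum(y_values * y_probs)
    print(f"E(Y | X = {x}) = {cond_exp:.1f}")
# E(Y | X = 0) = 1.1
# E(Y | X = 1) = 1.9
```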
We close the chapter with several worked examples that put expectation, linearity, indicators, and conditional reasoning to work.
You roll two fair six-sided dice. Let \( M = \max(X, Y) \), where \( X \) and \( Y \) are the values on each die. What is \( \mathbb{E}[M] \)?
Since \(M\) takes values in \(\{1, \dots, 6\}\), use the tail-sum formula for nonnegative integer-valued random variables:
\[ \mathbb{E}[M] = \sum_{k=1}^{6} P(M \geq k) \]
The maximum is below \(k\) only if both dice are below \(k\), so \( P(M \geq k) = 1 - P(M < k) = 1 - P(X < k)\,P(Y < k) = 1 - \left(\frac{k-1}{6}\right)^2 \).
Now sum from \( k = 1 \) to \( 6 \):
\[ \mathbb{E}[M] = \sum_{k=1}^{6} \left(1 - \left(\frac{k-1}{6}\right)^2\right) = 6 - \frac{0 + 1 + 4 + 9 + 16 + 25}{36} = \frac{161}{36} \approx 4.472 \]
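A simulation sketch as a sanity check on the arithmetic:

```python
import numpy as np

rng = np.random.default_rng(0)

# Roll two fair dice many times and take the larger value of each pair
rolls = rng.integers(1, 7, size=(100_000, 2))
maxes = rolls.max(axis=1)

print(maxes.mean())   # ≈ 4.47
print(161 / 36)       # exact: 4.4722...
```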
You flip a fair coin 10 times. Let \( X \) be the number of heads. What is \( \mathbb{E}[X] \)?
Each flip has a 0.5 chance of being heads.
Let \( I_i \) be 1 if flip \( i \) is heads and 0 otherwise. Then \( \mathbb{E}[I_i] = 0.5 \).
\[ \mathbb{E}[X] = \sum_{i=1}^{10} \mathbb{E}[I_i] = 10 \cdot 0.5 = 5 \]
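A minimal simulation sketch that builds \(X\) exactly as the sum of the ten indicators:

```python
import numpy as np

rng = np.random.default_rng(0)

# Each row is one run of 10 fair coin flips; entries are the indicators (1 = heads)
flips = rng.integers(0, 2, size=(100_000, 10))

# Sum the indicators in each row to get X, the number of heads in that run
num_heads = flips.sum(axis=1)
print(num_heads.mean())   # ≈ 5.0
```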
A bag contains 3 red balls and 2 blue balls. You draw one ball at random. If it’s red, you draw another. Let \( X \) be the total number of draws. What is \( \mathbb{E}[X] \)?
With probability \( \frac{3}{5} \), the first ball is red and you draw again (total 2 draws).
With probability \( \frac{2}{5} \), the first ball is blue and you stop (1 draw).
\[ \mathbb{E}[X] = \frac{3}{5} \cdot 2 + \frac{2}{5} \cdot 1 = \frac{6}{5} + \frac{2}{5} = \frac{8}{5} = 1.6 \]
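A small simulation sketch of the two-stage procedure (only the color of the first ball matters for the number of draws):

```python
import numpy as np

rng = np.random.default_rng(0)

trials = 100_000
# The first ball is red with probability 3/5; if red we draw again (2 draws), else we stop (1 draw)
first_is_red = rng.random(trials) < 3 / 5
num_draws = np.where(first_is_red, 2, 1)

print(num_draws.mean())   # ≈ 1.6
```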
You throw 3 balls into 3 bins uniformly at random. What is the expected number of empty bins?
Let \( I_i \) be 1 if bin \( i \) is empty. Each of the 3 balls independently misses bin \( i \) with probability \( \frac{2}{3} \), so the probability that bin \( i \) is empty is \( \left(\frac{2}{3}\right)^3 = \frac{8}{27} \).
There are 3 bins, so:
\[ \mathbb{E}[\text{empty bins}] = 3 \cdot \frac{8}{27} = \frac{24}{27} = \frac{8}{9} \]
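A simulation sketch that mirrors the indicator argument: for each trial, mark each bin as empty or not and count.

```python
import numpy as np

rng = np.random.default_rng(0)

trials = 100_000
# Each row assigns 3 balls to bins 0, 1, 2 uniformly at random
assignments = rng.integers(0, 3, size=(trials, 3))

# Indicator for each bin: True if no ball landed in it on that trial
empty = np.stack([(assignments != b).all(axis=1) for b in range(3)], axis=1)

# Average number of empty bins per trial
print(empty.sum(axis=1).mean())   # ≈ 0.889, i.e. 8/9
```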