Chapter 3: Expectation

3.1 Introduction

A central pillar of statistics and data science is expectation: what it means, how to compute it, and the problems it helps you solve as a data scientist. Previously we discussed the distribution of a random variable; often we would also like a single number that summarizes where \(X\) tends to "land" on that distribution. This number is what statisticians call the expectation. It answers questions such as: what is the expected number of heads when you flip a fair coin 5 times? The question sounds simple, but by the end of this chapter you will understand what expectation means fundamentally and the powerful uses it has.

3.2 Definition and Calculation

We typically denote the expectation of a random variable \(X\) as \(E(X)\). It can be thought of as the long-run average value of \(X\), where each possible value is weighted by its probability. There are several ways to calculate the expectation depending on the distribution we are working with. For a discrete random variable, the basic formula is a sum over all possible values \(x\):

\[ E(X) = \sum x \cdot P(X = x) \]

Consider a random variable \(X\) with possible values \(x = 1, 2, 3, 4, 5\) and corresponding probabilities \(0.1, 0.15, 0.25, 0.3, 0.2\). The expectation is:

\[ E(X) = 1(0.1) + 2(0.15) + 3(0.25) + 4(0.3) + 5(0.2) = 3.35 \]

On a probability histogram of this distribution, the expectation is the balance point: a vertical line at 3.35 marks the center of mass of the probabilities.
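As a quick check of this calculation, here is a minimal sketch (not part of the original text) that computes the expectation and draws the probability histogram with a vertical line at \(E(X)\), assuming NumPy and Matplotlib are available:

```python
import numpy as np
import matplotlib.pyplot as plt

# Possible values and their probabilities from the example above
values = np.array([1, 2, 3, 4, 5])
probs = np.array([0.1, 0.15, 0.25, 0.3, 0.2])

# Expectation: each value weighted by its probability
expectation = np.sum(values * probs)
print(expectation)  # 3.35

# Probability histogram with a vertical line at the expectation
plt.bar(values, probs, width=1.0, edgecolor="white")
plt.axvline(expectation, color="red", linestyle="--",
            label=f"E(X) = {expectation:.2f}")
plt.xlabel("x")
plt.ylabel("P(X = x)")
plt.legend()
plt.show()
```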

3.3 Expectation for Common Distributions

  • Constant: the expectation of a constant is the constant itself, \(E(c) = c\).
  • Bernoulli\((p)\): \(E(X) = 0(1 - p) + 1(p) = p\).
  • Uniform on \(\{1, 2, \dots, n\}\) (such as a fair die): \(E(X) = \frac{n + 1}{2}\), which gives 3.5 for \(n = 6\).
  • Poisson\((\mu)\): \(E(X) = \mu\); a short algebraic derivation is given below.
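Here is the algebra for the Poisson case. Writing \(P(X = k) = e^{-\mu} \frac{\mu^k}{k!}\) for \(k = 0, 1, 2, \dots\):

\[ E(X) = \sum_{k=0}^{\infty} k \cdot e^{-\mu} \frac{\mu^k}{k!} = \sum_{k=1}^{\infty} e^{-\mu} \frac{\mu^k}{(k-1)!} = \mu e^{-\mu} \sum_{j=0}^{\infty} \frac{\mu^j}{j!} = \mu e^{-\mu} e^{\mu} = \mu \]

The second-to-last step substitutes \(j = k - 1\) and uses the exponential series \(\sum_{j=0}^{\infty} \frac{\mu^j}{j!} = e^{\mu}\).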

3.4 Expectation of a Function

Say we have a new random variable \(Y\) defined as a function of \(X\), for example \(Y = X^2\). We can find its expectation directly from the distribution of \(X\), without first working out the distribution of \(Y\), by computing \(E(g(X))\):

\[ E(g(X)) = \sum g(x) \cdot P(X = x) \]

So in the example from Section 3.2, if \(Y = X^2\), then:

\[ E(X^2) = 1^2(0.1) + 2^2(0.15) + 3^2(0.25) + 4^2(0.3) + 5^2(0.2) = 12.75 \]

Note that this is not the same as \(\left(E(X)\right)^2 = 3.35^2 \approx 11.22\): in general, \(E(g(X)) \neq g(E(X))\).
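The same table makes this a one-line computation; the sketch below (assuming NumPy) also shows that \(E(X^2)\) differs from \((E(X))^2\):

```python
import numpy as np

values = np.array([1, 2, 3, 4, 5])
probs = np.array([0.1, 0.15, 0.25, 0.3, 0.2])

# E(g(X)): apply g to each possible value, then weight by the same probabilities
e_x_squared = np.sum(values**2 * probs)
print(e_x_squared)                 # 12.75
print(np.sum(values * probs)**2)   # 11.2225 -- not the same as E(X^2)
```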

3.5 Linearity of Expectation

The additivity of expectation says that for any two random variables \(X\) and \(Y\) on the same outcome space, whether or not they are independent,

\[ E(X + Y) = E(X) + E(Y) \]

Combined with the scaling rule \(E(aX + b) = aE(X) + b\) for constants \(a\) and \(b\), this property is known as the linearity of expectation.
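A short simulation can make this concrete. The sketch below (assuming NumPy) uses a deliberately dependent pair, \(Y = X^2\) where \(X\) is a die roll, and the empirical averages still add:

```python
import numpy as np

rng = np.random.default_rng(0)

# X: a fair die roll; Y: the same roll squared, so X and Y are strongly dependent
x = rng.integers(1, 7, size=1_000_000)
y = x**2

# Linearity holds regardless of dependence: E(X + Y) = E(X) + E(Y)
print((x + y).mean())         # close to 3.5 + 91/6 = 18.67
print(x.mean() + y.mean())    # essentially the same value
```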

3.6 Method of Indicators

An indicator of an event is a random variable that is 1 if the event occurs and 0 if it does not. The expectation of an indicator is:

\[ E(I_A) = 1 \cdot P(A) + 0 \cdot (1 - P(A)) = P(A) \]

3.6.1 Binomial Case

Suppose \(X \sim \text{Binomial}(n, p)\). Then we can write \(X\) as a sum of \(n\) indicators, one for each independent Bernoulli\((p)\) trial:

\[ X = I_1 + I_2 + \dots + I_n \]

Using linearity:

\[ E(X) = E(I_1) + E(I_2) + \dots + E(I_n) = np \]

Examples:

  • Number of heads in 100 tosses of a fair coin: \(E(X) = 100 \cdot \frac{1}{2} = 50\)
  • Number of sixes in 30 rolls of a fair die: \(E(X) = 30 \cdot \frac{1}{6} = 5\)
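Both examples can be checked by simulating the indicator sums directly; the sketch below assumes NumPy:

```python
import numpy as np

rng = np.random.default_rng(0)

# 100 coin tosses, repeated many times; X is the sum of 100 indicators (1 = heads)
tosses = rng.integers(0, 2, size=(100_000, 100))
heads = tosses.sum(axis=1)
print(heads.mean())   # close to 100 * 1/2 = 50

# 30 die rolls, counting sixes
rolls = rng.integers(1, 7, size=(100_000, 30))
sixes = (rolls == 6).sum(axis=1)
print(sixes.mean())   # close to 30 * 1/6 = 5
```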

3.6.2 Hypergeometric Case

Suppose \(X \sim \text{Hypergeometric}(N, G, n)\): a simple random sample of size \(n\) is drawn without replacement from a population of \(N\) elements, \(G\) of which are "good", and \(X\) counts the good elements in the sample. Then:

\[ E(X) = n \cdot \frac{G}{N} \]

Examples:

  • Number of queens in a 5-card poker hand dealt from a standard 52-card deck: \(E(X) = 5 \cdot \frac{4}{52} \approx 0.3846\)
  • Number of Californians in a 65-person simple random sample from a population of 100 people, 30 of whom are Californians: \(E(X) = 65 \cdot \frac{30}{100} = 19.5\)

The method of indicators is suited to finding expectations of random counts that have a clear upper limit, which in these cases is \(n\).
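As a rough check, NumPy's hypergeometric sampler can simulate both examples; its arguments are the number of good elements, the number of bad elements, and the sample size:

```python
import numpy as np

rng = np.random.default_rng(0)

# Queens in a 5-card hand: 4 good cards, 48 bad cards, sample of 5
queens = rng.hypergeometric(4, 48, 5, size=100_000)
print(queens.mean())   # close to 5 * 4/52 = 0.3846

# Californians in a 65-person sample: 30 good, 70 bad, sample of 65
cal = rng.hypergeometric(30, 70, 65, size=100_000)
print(cal.mean())      # close to 65 * 30/100 = 19.5
```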

3.7 Conditional Expectation

Let \(X\) and \(Y\) be two random variables on the same outcome space. For a fixed \(x\), the conditional expectation is:

\[ E(Y \mid X = x) = \sum y \cdot P(Y = y \mid X = x) \]

Conditional expectation adjusts our expectation for \(Y\) based on knowledge of \(X\).

Example

Let \(Y\) be the number of hours a student studies and \(X\) the number of cups of coffee they drink. Suppose the data give the following conditional distributions:

When \(X = 0\):

  • \(P(Y = 0 \mid X = 0) = 0.2\)
  • \(P(Y = 1 \mid X = 0) = 0.5\)
  • \(P(Y = 2 \mid X = 0) = 0.3\)

\(E(Y \mid X = 0) = 0(0.2) + 1(0.5) + 2(0.3) = 1.1\)

When \(X = 1\):

  • \(P(Y = 1 \mid X = 1) = 0.3\)
  • \(P(Y = 2 \mid X = 1) = 0.5\)
  • \(P(Y = 3 \mid X = 1) = 0.2\)

\(E(Y \mid X = 1) = 1(0.3) + 2(0.5) + 3(0.2) = 1.9\)
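The same arithmetic is easy to express in code; the sketch below stores the conditional distributions from this example in a dictionary and weights each value of \(Y\) by its conditional probability:

```python
# Conditional distributions of Y (study hours) given X (coffee cups), from the example
cond_dist = {
    0: {0: 0.2, 1: 0.5, 2: 0.3},
    1: {1: 0.3, 2: 0.5, 3: 0.2},
}

def conditional_expectation(x):
    """E(Y | X = x): weight each value y by its conditional probability."""
    return sum(y * p for y, p in cond_dist[x].items())

print(conditional_expectation(0))  # 1.1
print(conditional_expectation(1))  # 1.9
```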

Applications of conditional expectation include:

  • Regression models
  • Bayesian inference
  • Missing data imputation
  • Decision theory

Problem 1: Expected Maximum of Two Dice

You roll two fair six-sided dice. Let \( M = \max(X, Y) \), where \( X \) and \( Y \) are the values on each die. What is \( \mathbb{E}[M] \)?

Solution

Use the tail-sum identity for a nonnegative integer-valued random variable:

\[ \mathbb{E}[M] = \sum_{k=1}^{6} P(M \geq k) \]

Since \(M < k\) requires both dice to show values less than \(k\), and the dice are independent,

\( P(M \geq k) = 1 - P(X < k)\,P(Y < k) = 1 - \left(\frac{k-1}{6}\right)^2 \)

Now sum from \( k = 1 \) to \( 6 \):

\[ \mathbb{E}[M] = \sum_{k=1}^{6} \left(1 - \left(\frac{k-1}{6}\right)^2\right) = 6 - \frac{0 + 1 + 4 + 9 + 16 + 25}{36} = \frac{161}{36} \approx 4.472 \]
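A brute-force check over all 36 equally likely outcomes gives the same answer:

```python
from itertools import product

# Enumerate every pair of die faces and average the maximum
expected_max = sum(max(x, y) for x, y in product(range(1, 7), repeat=2)) / 36
print(expected_max)  # 4.4722... = 161/36
```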

Problem 2: Expected Number of Heads in Coin Tosses

You flip a fair coin 10 times. Let \( X \) be the number of heads. What is \( \mathbb{E}[X] \)?

Solution

Each flip has probability 0.5 of being heads.

Let \( I_i \) be 1 if flip \( i \) is heads and 0 otherwise. Then \( \mathbb{E}[I_i] = 0.5 \), so by linearity:

\[ \mathbb{E}[X] = \sum_{i=1}^{10} \mathbb{E}[I_i] = 10 \cdot 0.5 = 5 \]

Problem 3: Expected Value with Simple Conditioning

A bag contains 3 red balls and 2 blue balls. You draw one ball at random. If it’s red, you draw another. Let \( X \) be the total number of draws. What is \( \mathbb{E}[X] \)?

Solution

With probability \( \frac{3}{5} \), the first ball is red and you draw again (total 2 draws).

With probability \( \frac{2}{5} \), the first ball is blue and you stop (1 draw).

\[ \mathbb{E}[X] = \frac{3}{5} \cdot 2 + \frac{2}{5} \cdot 1 = \frac{6}{5} + \frac{2}{5} = \frac{8}{5} = 1.6 \]
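A short simulation (assuming NumPy) agrees with this value:

```python
import numpy as np

rng = np.random.default_rng(0)

# First draw is red with probability 3/5; only then does a second draw happen
first_is_red = rng.random(100_000) < 3 / 5
draws = np.where(first_is_red, 2, 1)
print(draws.mean())   # close to 8/5 = 1.6
```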

Problem 4: Expected Number of Empty Bins (Moderate)

You throw 3 balls into 3 bins uniformly at random. What is the expected number of empty bins?

Solution

Let \( I_i \) be 1 if bin \( i \) is empty. Each of the 3 balls independently lands in one of the other two bins with probability \( \frac{2}{3} \), so the probability that bin \( i \) is empty is \( \left(\frac{2}{3}\right)^3 = \frac{8}{27} \).

There are 3 bins, so:

\[ \mathbb{E}[\text{empty bins}] = 3 \cdot \frac{8}{27} = \frac{24}{27} = \frac{8}{9} \]
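A simulation sketch (assuming NumPy) that counts the bins left empty agrees with this answer:

```python
import numpy as np

rng = np.random.default_rng(0)

# Throw 3 balls into 3 bins, many times; record the bin index of each ball
throws = rng.integers(0, 3, size=(100_000, 3))

# A bin is empty if its index never appears in that row
empty_counts = [3 - len(np.unique(row)) for row in throws]
print(np.mean(empty_counts))   # close to 8/9 = 0.889
```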