Time to get more formal.
Definition [Random Variable]
Suppose we have a probability space $(\Omega, \mathbb{P})$, where $\Omega = \{ \omega_1, \omega_2, ... \}$ is the set of all simple events and $\mathbb{P}$ is a function somehow defined on $\Omega$, say $\mathbb{P}(\omega_i) = p_i \geq 0$ where $p_1 + p_2 + p_3 + ... = 1$. This is just a more formal way of saying that there is a bunch of simple events (finitely or infinitely many of them) with some probabilities assigned to them.
And here comes a very important definition: a random variable is a function from $\Omega$ to $\mathbb{R}$. We will usually use capital letters $X, Y, ...$ or lowercase Greek letters $\delta, \xi, ...$ to denote random variables.
Note that technically there is nothing random about random variables, even though the name suggests it! Any random variable is merely a function $\Omega \to \mathbb{R}$; it just has a real-world meaning that is related to randomness. For example, the usual words of
"rolling a fair die and seeing number 1, 2, ..., or 6"
can be interpreted (using our fancy notation and definitions) as
"Given probability space $\Omega = \{ \text{roll 1}, \text {roll 2}, ..., \text{roll 6} \}$ with $\mathbb{P}(\text{roll }k) = \frac{1}{6}$, we can have a random variable $X: \Omega \to \{1, 2, ..., 6 \} \subset \mathbb{R}$ such that $X(\text{roll }k) = k$".
This might look scary at first. But it's just a change to a more mathematical way of thinking. We need this change to use the math tools and ideas that will help us understand things better. Without this mathematical approach, we would only have "our feelings about what might happen" instead of informed predictions. We would also miss out on the foundations of statistics, leaving us with naive data analysis and wrong conclusions.
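By the way, if you like to think in code, here is the die-roll interpretation above as a minimal Python sketch (the names `omega`, `P` and `X` are ours, chosen purely for illustration). It shows that a random variable really is just an ordinary function on the set of simple events:

```python
from fractions import Fraction

# The probability space: six simple events, each with probability 1/6.
omega = [f"roll {k}" for k in range(1, 7)]
P = {w: Fraction(1, 6) for w in omega}
assert sum(P.values()) == 1          # the p_i must add up to 1

# The random variable X: Omega -> R, with X(roll k) = k.
def X(w: str) -> int:
    return int(w.split()[1])

print(X("roll 3"))                   # prints 3
```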
Exercise 1.
How would you interpret (in a similar manner as above) a fair coin toss? Please pay attention to the fact that random variables take numerical values, i.e. not something like "heads" or "tails".
Explanation and comments
We take the probability space to be the following: $\Omega = \{ \text{Heads}, \text{Tails} \}$ with $\mathbb{P}(\text{Heads}) = \mathbb{P}(\text{Tails}) = \frac{1}{2}$. Then we can have a random variable $X: \Omega \to \{0, 1 \} \subset \mathbb{R}$ such that $X(\text{Tails}) = 0$ and $X(\text{Heads}) = 1$.
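As a quick illustrative check (our own sketch, not part of the exercise), one can simulate this random variable and confirm that the value 1 comes up about half the time:

```python
import random

def sample_X() -> int:
    # X(Tails) = 0 and X(Heads) = 1, each simple event having probability 1/2
    return random.choice([0, 1])

n = 100_000
print(sum(sample_X() for _ in range(n)) / n)   # close to 0.5
```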
Continuing with the formal definitions:
Definition [Probability Distribution & Common Examples]
Let's go back to the setup with $(\Omega, \mathbb{P})$, where $\Omega = \{ \omega_1, \omega_2, ... \}$ is the set of all simple events and $\mathbb{P}$ is our function somehow defined on $\Omega$. As before, let's say $\mathbb{P}(\omega_i) = p_i \geq 0$ where $p_1 + p_2 + p_3 + ... = 1$. This set $\{ p_1, p_2, ... \}$ is what is called a probability distribution. There are a few popular probability distributions. There is no need to remember their names for now, you will get used to them over time:

- Bernoulli distribution with parameter $p$: two simple events, with probabilities $p_1 = p$ and $p_2 = 1-p$;
- Uniform distribution on $n$ simple events: $p_k = \frac{1}{n}$ for every $k = 1, 2, ..., n$;
- Binomial distribution with parameters $n$ and $p$: $n+1$ simple events, with $p_k = \binom{n}{k}p^k(1-p)^{n-k}$ for $k = 0, 1, ..., n$;
- Geometric distribution with parameter $p$: infinitely many simple events, with $p_k = (1-p)^{k-1}p$ for $k = 1, 2, 3, ...$
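For readers who prefer code, here is a hypothetical sketch of these four distributions as plain Python functions (the function names are ours); each returns the probability $p_k$ assigned to a simple event:

```python
from math import comb

def bernoulli_pmf(k: int, p: float) -> float:
    return p if k == 1 else 1 - p                  # two events: k = 0 or 1

def uniform_pmf(k: int, n: int) -> float:
    return 1 / n                                   # the same for each of the n events

def binomial_pmf(k: int, n: int, p: float) -> float:
    return comb(n, k) * p**k * (1 - p)**(n - k)    # k = 0, 1, ..., n

def geometric_pmf(k: int, p: float) -> float:
    return (1 - p)**(k - 1) * p                    # k = 1, 2, 3, ...
```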
Exercise 2.
Explain what the Binomial distribution models and where that weird formula comes from. (This is one of the exercises where being comfortable with combinatorics is important.)
Explanation and comments
This distribution models the number of heads when tossing a coin $n$ times such that the probability of getting heads is $p$. Or, if you like it more, it models the number of heads when $n$ coins are tossed (each of the coins again such that the probability of getting heads is $p$). Indeed, the number of ways to get exactly $k$ heads when tossing a coin $n$ times is $\binom{n}{k}$, since we simply need to pick which $k$ of the $n$ tosses should be heads (the others will be tails). Once we have picked the $k$ tosses for heads, we have $p^k$ as the probability that these $k$ tosses are indeed all heads and $(1-p)^{n-k}$ as the probability that the remaining $n-k$ tosses are indeed all tails. Thus we obtain $p_k = \binom{n}{k}p^k(1-p)^{n-k}$ as the probability of getting exactly $k$ heads.
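To make the counting argument tangible, here is a small brute-force check (our own sketch, with illustrative values of $n$, $p$, $k$): enumerate all $2^n$ toss sequences and add up the probabilities of the ones with exactly $k$ heads.

```python
from itertools import product
from math import comb

n, p, k = 5, 0.3, 2

# Sum the probability of every toss sequence (1 = heads, 0 = tails)
# that contains exactly k heads.
brute = sum(
    p**k * (1 - p)**(n - k)
    for seq in product([0, 1], repeat=n)
    if sum(seq) == k
)

print(brute)                                  # 0.3087
print(comb(n, k) * p**k * (1 - p)**(n - k))   # the same number (up to float rounding)
```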
Exercise 3.
There is a formula many people get to see in high school, and it is often treated as a definition or, at the very least, its nature is not questioned. Here is the formula: \[ \mathbb{P}(\text{something}) = \frac{\text{number of favourable events}}{\text{total number of events}} \] Which distribution does it correspond to?
Explanation and comments
Let's reiterate this one more time: this formula is not a definition of probability or anything like that. If anything, it is a lemma coming from the definition of one of the most popular probability spaces, namely the uniform one. This formula says that if an event $A$ consists of $m$ simple events, each having probability $\frac{1}{n}$ (where $n$ is the total number of simple events), then $\mathbb{P}(A) = \frac{1}{n} + ... + \frac{1}{n} = \frac{m}{n}$. The first "=" in this chain of equalities is the definition of $\mathbb{P}(A)$ and the second is, well... obvious :)
Exercise 4.
For each of the distributions above, explain why they are indeed probability distributions (recall that the assigned probabilities must add up to 1).
Explanation and comments
Just work from the definitions, it is not a complicated exercise. Let's do the distributions one by one:

- Bernoulli: $p + (1-p) = 1$;
- Uniform: $\underbrace{\frac{1}{n} + ... + \frac{1}{n}}_{n \text{ times}} = n \cdot \frac{1}{n} = 1$;
- Binomial: by the binomial theorem, $\sum_{k=0}^{n} \binom{n}{k}p^k(1-p)^{n-k} = \big(p + (1-p)\big)^n = 1^n = 1$;
- Geometric: by the formula for the sum of a geometric series, $\sum_{k=1}^{\infty} (1-p)^{k-1}p = p \cdot \frac{1}{1-(1-p)} = \frac{p}{p} = 1$.
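Here is a quick numerical sanity check of these four computations (an illustrative sketch of ours, with arbitrary parameter values):

```python
from math import comb

n, p = 10, 0.4

print(p + (1 - p))                                                       # Bernoulli: 1.0
print(sum(1 / n for _ in range(n)))                                      # Uniform: 1.0
print(sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)))   # Binomial: 1.0
print(sum((1 - p)**(k - 1) * p for k in range(1, 201)))                  # Geometric: ~1.0 (series truncated)
```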
Next, we will give names to a few very common random variables. They come up extremely often, which is why they have separate names. So even if you do not immediately remember all of them (which is fine), you will get used to them over time.
Definition [Common Random Variables]
In fact, for any random variable we can also talk about its probability distribution. The definition is analogous to the one above and is quite intuitive: if there is a random variable $X$ and its possible values are $x_1, x_2, ..., x_n$, then \[ \{ \, \mathbb{P}(X=x_1), \mathbb{P}(X=x_2), ..., \mathbb{P}(X=x_n) \, \} \] is its probability distribution. Thus, we can talk about Bernoulli, Uniform, Binomial and Geometric random variables. However, we must be more specific here, since it also matters which values, for example, a Bernoulli random variable takes, and its probability distribution alone does not specify this! So, by default (meaning "unless otherwise specified") we have:

- a Bernoulli random variable with parameter $p$ takes the value $1$ with probability $p$ and the value $0$ with probability $1-p$;
- a Uniform random variable takes each of its $n$ possible values with probability $\frac{1}{n}$;
- a Binomial random variable with parameters $n$ and $p$ takes the values $0, 1, ..., n$ with probabilities $\mathbb{P}(X=k) = \binom{n}{k}p^k(1-p)^{n-k}$;
- a Geometric random variable with parameter $p$ takes the values $1, 2, 3, ...$ with probabilities $\mathbb{P}(X=k) = (1-p)^{k-1}p$.
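To connect these defaults to something concrete, here is an illustrative sampling sketch (the helper names are ours): a Geometric value can be generated as the number of Bernoulli trials up to and including the first 1, which matches the $(1-p)^{k-1}p$ formula above.

```python
import random

def bernoulli(p: float) -> int:
    return 1 if random.random() < p else 0   # value 1 with probability p, else 0

def geometric(p: float) -> int:
    k = 1
    while bernoulli(p) == 0:                 # count trials until the first 1
        k += 1
    return k

random.seed(0)
print(bernoulli(0.5), geometric(0.5))
```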
Exercise 5.
You roll two fair dice. Let $T$ be a random variable representing the maximum of the numbers that you roll. Find the probability distribution of $T$. (It may take some effort to do this exercise, but it is worth doing something like this at least once)
Explanation and comments
First of all, note that we are going to work in the probability space with 36 simple events, namely all possible pairs of rolls. Each of these simple events has probability $\frac{1}{36}$, since this is how we interpret the words "fair dice". Now, once we are clear on the probability space we are working in, let's look at all the simple events and see which value of $T$ each one gives:
\begin{array}{c c c c c c}
(1,1), \, T=1 & (1,2), \, T=2 & (1,3), \, T=3 & (1,4), \, T=4 & (1,5), \, T=5 & (1,6), \, T=6 \\
(2,1), \, T=2 & (2,2), \, T=2 & (2,3), \, T=3 & (2,4), \, T=4 & (2,5), \, T=5 & (2,6), \, T=6 \\
(3,1), \, T=3 & (3,2), \, T=3 & (3,3), \, T=3 & (3,4), \, T=4 & (3,5), \, T=5 & (3,6), \, T=6 \\
(4,1), \, T=4 & (4,2), \, T=4 & (4,3), \, T=4 & (4,4), \, T=4 & (4,5), \, T=5 & (4,6), \, T=6 \\
(5,1), \, T=5 & (5,2), \, T=5 & (5,3), \, T=5 & (5,4), \, T=5 & (5,5), \, T=5 & (5,6), \, T=6 \\
(6,1), \, T=6 & (6,2), \, T=6 & (6,3), \, T=6 & (6,4), \, T=6 & (6,5), \, T=6 & (6,6), \, T=6 \\
\end{array} For example, you could write $T((5,3)) = 5$ since $(5,3)$ is a member of $\Omega$, i.e. the set of all simple events, and $T$ is a function from $\Omega$ to $\mathbb{R}$. Then, the probability distribution of $T$ is
\begin{array}{c c c c c c}
& T=1 & T=2 & T=3 & T=4 & T=5 & T=6 \\
\text{prob: } & 1/36 & 3/36 & 5/36 & 7/36 & 9/36 & 11/36
\end{array}
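If you'd like to double-check the table without listing the 36 cases by hand, here is a brute-force sketch of ours:

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Tally T = max(a, b) over all 36 equally likely pairs of rolls.
counts = Counter(max(a, b) for a, b in product(range(1, 7), repeat=2))
for t in sorted(counts):
    print(t, Fraction(counts[t], 36))   # 1/36, 3/36, 5/36, 7/36, 9/36, 11/36 (in lowest terms)
```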
So, a random variable is a function by definition. Just a function. Knowing this formality is important, but it is more vital that you have good intuition about it. For example, does it make sense to you that if you toss two identical coins separately, then you should use two different random variables $T_1$ and $T_2$ when talking about them, even though $T_1$ and $T_2$ are precisely the same function? To stress the difference, people usually say that $T_1$ and $T_2$ are i.i.d. = independent identically distributed random variables. This abbreviation means exactly what it stands for: the variables have the same distribution, but they are independent of each other. Now, the word "independent" is used here informally; the formal definition is yet to come. However, it makes intuitive sense, so for now let's keep it at this level.
Exercise 6.
Let $T_1, ..., T_n$ be i.i.d. Bernoulli random variables, each with the same fixed parameter $p$. Let \[ B = T_1 + T_2 + ... + T_n. \] What type of random variable is $B$, i.e. what distribution does it have? (It is fine to go with a mathematically reasonable intuitive explanation rather than a perfectly formal one.)
Explanation and comments
Suppose that there are $n$ different coins, and let's say $T_i$ models whether we get heads or tails when we toss coin number $i$ ($T_i = 1$ for heads, which happens with probability $p$). Then the sum of the $T_i$'s models the total number of heads when we toss all of the coins $1, 2, 3, ..., n$ together. And we know that a random variable modeling the total number of heads when we toss $n$ identical coins (each having probability $p$ of landing heads) is called a Binomial random variable. So $B$ is a Binomial random variable with parameters $n$ and $p$. The distribution of such a random variable has been given above.
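A simulation makes this claim easy to believe (an illustrative sketch of ours; the parameter values are arbitrary): sample $B = T_1 + ... + T_n$ many times and compare the empirical frequencies with the Binomial formula.

```python
import random
from collections import Counter
from math import comb

n, p, trials = 5, 0.3, 100_000
random.seed(1)

# Each sample of B is a sum of n independent Bernoulli(p) values.
counts = Counter(
    sum(1 if random.random() < p else 0 for _ in range(n))
    for _ in range(trials)
)

for k in range(n + 1):
    print(k, counts[k] / trials, comb(n, k) * p**k * (1 - p)**(n - k))   # should be close
```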
Finally, we will define just one more extremely important concept. That will be it for today, don't worry ;)
Definition [Mathematical Expectation for a finite space]
As before, assume we have a probability space $(\Omega, \mathbb{P})$ with probability distribution $\{ p_1, p_2, ... \}$. Let's further assume (for simplicity) that there are finitely many simple events, say $n$ of them. So we have $\{ p_1, p_2, ..., p_n \}$ as the probability distribution (these are arbitrary $p_k$, not the ones from the binomial distribution, of course). Then, for any random variable $X: \Omega \to \mathbb{R}$ we define \[ \mathbb{E}[X] = p_1 X(\omega_1) + p_2 X(\omega_2) + ... + p_n X(\omega_n) \] and we call it the mathematical expectation of the random variable $X$.
Note: Sometimes instead of writing $\mathbb{E}[X]$ people write just $\mathbb{E} X$, i.e. they skip the brackets. Oh, and the $[..]$ brackets do not mean "integer part" in this context; they are merely a different kind of brackets that people use. Design, you know.
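The definition translates into code almost verbatim; here is a minimal sketch (the names are ours), using a Bernoulli random variable with $p = \frac{3}{10}$ as an example:

```python
from fractions import Fraction

def expectation(P, X):
    """E[X] = sum of p_i * X(omega_i) over all simple events omega_i."""
    return sum(P[w] * X(w) for w in P)

# Simple events 0 and 1 with probabilities 7/10 and 3/10 (Bernoulli, p = 3/10).
P = {0: Fraction(7, 10), 1: Fraction(3, 10)}
print(expectation(P, lambda w: w))   # 3/10
```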
Exercise 7.
Find the mathematical expectation of a random variable representing a fair die roll. Make sure you clearly define the random variable first.
Explanation and comments
First of all, let's stress that when talking about one die roll we are working in a probability space with $6$ simple events (forming $\Omega$, it was actually explicitly defined above), each having probability $\frac{1}{6}$. Now, a random variable representing a die roll is $X: \Omega \to \{ 1, 2, ..., 6 \} \subset \mathbb{R}$ such that $X(\text{roll }k) = k$. Then by the definition of mathematical expectation we get \[ \mathbb{E}[X] = \frac{1}{6} \cdot 1 + \frac{1}{6} \cdot 2 + \frac{1}{6} \cdot 3 + \frac{1}{6} \cdot 4 + \frac{1}{6} \cdot 5 + \frac{1}{6} \cdot 6 = \frac{21}{6} = 3.5 \]
Exercise 8.
Find the mathematical expectation of the random variable $T$ from Exercise 5.
(You may find the solution to Exercise 5 helpful.)
Explanation and comments
Using one of the tables from the solution to Exercise 5, and using only the definition of the mathematical expectation, we get right away that: \[ \mathbb{E}[T] = \frac{1}{36} \cdot 1 + \frac{3}{36} \cdot 2 + \frac{5}{36} \cdot 3 + \frac{7}{36} \cdot 4 + \frac{9}{36} \cdot 5 + \frac{11}{36} \cdot 6 = \frac{161}{36} \approx 4.47 \]
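As a cross-check (our sketch), one can also compute $\mathbb{E}[T]$ directly over the 36 simple events, without building the distribution table first:

```python
from fractions import Fraction
from itertools import product

# E[T] = sum over all 36 pairs of (1/36) * T(pair), with T(pair) = max of the rolls.
ET = sum(Fraction(1, 36) * max(a, b) for a, b in product(range(1, 7), repeat=2))
print(ET, float(ET))   # 161/36 ≈ 4.4722
```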
We will continue working with the mathematical expectation next time and, in particular, will try solving a number of non-obvious problems. For now, please note again that despite the name, the mathematical expectation is a number you can calculate for any given random variable. It has a lot of nice properties, but at the end of the day it is just a number coming from some formula. One intuitive property you could have already guessed from the definition is that the mathematical expectation is what you expect to get on average if you perform an experiment many, many times. E.g. if you roll a fair die many, many times, then you should expect the average of the values you obtain to be close to 3.5. It feels like it should be right, but the formal proof of this fact is not that simple. Even formally stating this fact is not trivial; it is known as the "Law of Large Numbers". For now, we'll stick to understanding it at an intuitive level. Our intuition comes from the definition: the mathematical expectation is the weighted average of the possible outcomes.
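Here is one last illustrative simulation of that intuition (our own sketch): the running average of fair-die rolls settles down near 3.5 as the number of rolls grows.

```python
import random

random.seed(42)
total = 0
for i in range(1, 1_000_001):
    total += random.randint(1, 6)            # one fair-die roll
    if i in (10, 1_000, 100_000, 1_000_000):
        print(i, total / i)                  # the running average approaches 3.5
```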