The probability of rolling a $1$ on a fair die is said to be $\frac{1}{6}$. Why is that, and who picked that $\frac{1}{6}$? Believe it or not, this question is not easy, and the answer "it is a school formula" is not quite correct (plus it hides the fundamental truth about probability). In short, a much better answer to that question is "because we said so". A longer and proper discussion is what we do in our introductory course. As a quick summary, let's just cite Wikipedia: "the interpretation of probability is philosophically complicated, and even in specific cases is not always straightforward".
So, before talking about probabilities of some events we need to make sure we agree on a model. What this means is:
- we list all of the simple events (the possible outcomes);
- we assign a probability to each simple event, so that the probabilities sum to $1$.
Once we clearly specify all these, we say we have a probability model. As an example: when talking about one fair die, people usually talk about a model in which there are six simple events: {rolling a 1, ..., rolling a 6}, all having equal probability (because of the word "fair"), namely $\frac{1}{6}$. Therefore the question "what is the probability of rolling a 1?" is quite silly, since the answer is "it is $\frac{1}{6}$ by definition".
Exercise 1.
How would you answer the following question "why rolling two threes with two fair dice is equal to $\frac{1}{36}$?" Think about the model, simple events, etc...
Explanation and comments
click on blur to reveal
Answers like "It is just the product rule" do not count. What do you mean exactly? Note that we are starting from scratch, so what rules are you talking about, other than the ones mentioned above? :)
There are 36 simple events, namely "roll 1, roll 1", "roll 1, roll 2", ..., "roll 6, roll 6". You can count that there are 36 of them or, if you want to be fancy, say that there are 36 of them because of the product rule from counting combinatorics. Finally, each of the 36 simple events gets the same probability assigned, i.e probability $\frac{1}{36}$. That is why in particular rolling two threes has probability $\frac{1}{36}$. By definition of our model.
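If you like, you can verify this counting with a quick brute-force check. A minimal sketch in Python (the variable names here are just illustrative):

```python
from fractions import Fraction
from itertools import product

# All 36 simple events of the model: ordered pairs of rolls.
events = list(product(range(1, 7), repeat=2))
assert len(events) == 36

# The model assigns each simple event the same probability, 1/36.
p = {e: Fraction(1, 36) for e in events}
assert sum(p.values()) == 1  # probabilities sum to 1, as required

# "Rolling two threes" corresponds to exactly one simple event, (3, 3).
print(p[(3, 3)])  # 1/36
```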
One of the simplest models is the one where we have $N$ simple events, each of them having probability $\frac{1}{N}$. It is called the uniform probability model. This is basically the one that most schools cover, without always mentioning it explicitly. E.g each time the word "fair" comes up, it literally means that we are working with this model. But despite being simple, and even though it is basically just counting combinatorics (our next topic, by the way!), it is useful and there are a lot of nice problems on it.
Exercise 2.
If you toss a fair coin $10$ times, what is the probability that you see heads at least once?
Explanation and comments
click on blur to reveal
You can write down the answer right away actually: for $n$ tosses it is $1-\frac{1}{2^n}$, so here, with $n=10$, it is $1-\frac{1}{2^{10}}$. This is because the probability of all tails is $\frac{1}{2^n}$ and we are talking about the event "opposite" (in more mathsy terms: "complement") to the "all tails" event.
More formally, we should write something along the lines of: \[ P(\text{at least one heads}) = 1 - P(\text{all tails}) = 1 - \frac{1}{2^{10}} = \frac{1023}{1024}. \]
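And if you want to double-check the counting by brute force, here is a small Python sketch (names are illustrative):

```python
from fractions import Fraction
from itertools import product

# Simple events: all 2^10 sequences of H/T; each gets probability 1/2^10.
n = 10
outcomes = list(product("HT", repeat=n))
assert len(outcomes) == 2**n

# Count sequences with at least one H directly, and compare with 1 - 1/2^n.
good = sum(1 for seq in outcomes if "H" in seq)
direct = Fraction(good, 2**n)
assert direct == 1 - Fraction(1, 2**n)
print(direct)  # 1023/1024
```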
Exercise 3.
You roll a fair die a) $2$ b) $5$ times and sum up the numbers that you see. What is the probability that the total is divisible by $6$?
Explanation and comments
click on blur to reveal
We will do both parts at the same time, the answer is the same anyway. Let's just assume we roll a fair die $k$ times.
Now, let's reformulate the question – once we define the uniform probability model here, the problem boils down to calculating \[ \frac{\text{number of good combinations}}{\text{number of all combinations}} \] where a combination is a sequence of $k$ rolls and a good combination is one where the total is divisible by $6$. Now, how do we find that ratio? Let's group all of the combinations into groups of six, where each group has the same combination of the first $k-1$ rolls and one of the six possible values for the $k$-th roll. E.g. if $k=3$, then one of the groups looks like this: { (rolled a 4, rolled a 2, rolled a 1), (rolled a 4, rolled a 2, rolled a 2), ..., (rolled a 4, rolled a 2, rolled a 6) }. The key observation is that in each such group there is precisely one good combination: as the $k$-th roll runs over $1, \dots, 6$, the total runs over six consecutive integers, and exactly one of any six consecutive integers is divisible by $6$. Since we divided all of the possible combinations into such groups of six, it means that \[ \frac{\text{number of good combinations}}{\text{number of all combinations}} = \frac{1}{6} \] and so the final answer is $\frac{1}{6}.$
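The grouping argument can also be verified by brute force, since even $6^5 = 7776$ combinations is nothing for a computer. A quick Python sketch (illustrative only):

```python
from itertools import product

# Brute-force check of the 1/6 answer for both parts of the exercise.
for k in (2, 5):
    good = total = 0
    for combo in product(range(1, 7), repeat=k):
        total += 1
        if sum(combo) % 6 == 0:
            good += 1
    assert good * 6 == total  # exactly 1 in 6 combinations is "good"
    print(k, good, total)  # k=2: 6 out of 36; k=5: 1296 out of 7776
```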
The very simple uniform model is cool, but it is not the only one. You can come up with literally infinitely many models, it is just that some of them might not be useful at all...
Also, you do not even need to have a finite number of simple events in your model by the way:
Exercise 4.
The probability of tossing a fair coin precisely $n$ times until the first heads is said to be $\frac{1}{2^n}$. How would you explain this? I.e which model do you have, which simple events are we talking about?
(For this question you need to know that $\frac{1}{2}+\frac{1}{4}+\frac{1}{8}+...=1$)
Explanation and comments
click on blur to reveal
We will consider the model where the simple events are {H, TH, TTH, TTTH, TTTTH, ...}, and where the $k$-th simple event in this list has probability $\frac{1}{2^k}$. It all works just fine, in particular because $\frac{1}{2}+\frac{1}{4}+\frac{1}{8}+...=1$, i.e the sum of the probabilities of all the events is indeed $1$, as it should be.
It is fine if you do not know exactly what an infinite sum is. But almost surely you know the fact that this sum equals $1$, and this is enough for now. Also, you should not be considering a probability space whose simple events are all possible infinite strings of H/T – more about that after the exercise below.
Exercise 5.
a) It is said that if you pick a point uniformly at random in a unit interval, then the probability it ends up in a left half is $\frac{1}{2}$. How would you interpret this? What are the simple events, what is the model?
b) Let's go even further: for any interval $I$ inside the unit interval that has length $s$, it makes sense to say that the probability of a point ending up in $I$ is $s$. But how would you go about explaining this?
Explanation and comments
click on blur to reveal
a) When it comes to the first part of the exercise, we could say that $\Omega = \{\text{left half}, \text{right half}\}$ and that the probabilities are $\frac{1}{2}$ for both of the simple events. It is fine, but it kind of ignores the structure of the problem, in particular that we have an interval in the first place. So it basically makes a tossing-a-coin problem out of this, even though it works.
b) This is where things get interesting: sadly, there is no way to do something similar to what we have been doing up until now for this part of the exercise. Indeed, what is the probability of hitting a specific point $P$ on the interval? If it is larger than zero, say $0.000001$, then if we consider $1000001$ points, then since the word "uniformly" is in the problem statement, the probability of hitting one of them should be $1000001 \times 0.000001 > 1$, which is bad. You can replace $0.000001$ with any positive number, a similar argument will apply. So it must be that the probability of hitting a specific point $P$ on the interval is precisely $0$. But then the probability of hitting anywhere on the interval is $0+0+0+\ldots = 0$... Hm. And while for part a) we can do the trick described above, how do we deal with part b)?
We will not be providing a final answer here as it might be confusing. But it is okay for now! There is a way around all this, which is conceptually different: instead of assigning a number (=probability) to each point on the interval, we will do something else. It is not that difficult to be fair, but it is unimportant for now and it is a part of another course.
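Even though we cannot build the model by summing probabilities of individual points, you can still see the claim of part b) empirically. A small Python simulation sketch (the chosen subinterval is just an example):

```python
import random

# Empirically, a uniform random point lands in a subinterval of length s
# with frequency close to s, even though each individual point has
# probability 0 of being hit.
random.seed(0)
trials = 100_000
a, b = 0.2, 0.5  # an illustrative subinterval I of length s = 0.3
hits = sum(1 for _ in range(trials) if a <= random.random() < b)
print(hits / trials)  # close to 0.3
```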
Side-note: In Exercise 4 it might be tempting to say that the simple events are all of the infinite sequences of H-s and T-s. Let's think about this though. If we replace H with 0, and T with 1, then the set of all simple events becomes "all possible infinite strings of 0-s and 1-s", which is basically the set of all real numbers on the unit interval $[0,1]$ (written in binary). This is exactly the construction we considered in Exercise 5, and that is where we ran into some issues. Thus, considering "all of the infinite sequences of H-s and T-s" is not the best idea.
A conceptual difference between "{H, TH, TTH, TTTH, TTTTH, ... }" and "all of the infinite sequences of H-s and T-s" is that the first set is countable, while the second one is uncountable. We can formally define and play around with summations over the first kind of set, but not over the second. This is why we have issues with defining probabilities for the second kind of set, but not for the first. This side note should be even clearer to anyone who has dealt with real analysis at university or school: at the beginning of analysis, when you talk about convergence of sequences, etc., you only deal with sequences, which are countable because their terms are literally numbered $1,2,3,...$ by their indices. That is when you, among other things, define $x_1 + x_2 + x_3 + ...$ as the limit (if it exists) of the sequence $y_k := x_1 + ... + x_k$ as $k \to \infty$. This definition of infinite summation makes sense and also happens to behave nicely. However, the same logic breaks down when applied to a collection of numbers that cannot be arranged into such a sequence.
The key takeaway from today is that we ourselves decide upon the model. E.g we can pick a model where one simple event has probability $1$ and all others have probability $0$ – it is just that this is a useless model. But how do you know which models are useful? Well, that is a bit of a philosophical question (we talk about it in the introductory course), as well as a serious mathematical question in certain cases. For example, you can take a coin, toss it $100$ times and define the probability of heads as $n/100$, where $n$ is the number of times you saw heads. In this way you do an experiment before you pick your model, and then your probabilities are more like "relative frequencies".
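As a small illustration of this "relative frequency" idea, here is a Python sketch of the experiment (with a simulated fair coin, of course):

```python
import random

# Toss a (simulated) fair coin 100 times and use n/100 as the
# "probability of heads": a relative frequency, rarely exactly 1/2.
random.seed(1)
n = sum(1 for _ in range(100) if random.random() < 0.5)
print(n / 100)

# With many more tosses, the relative frequency settles near 1/2.
n_big = sum(1 for _ in range(100_000) if random.random() < 0.5)
print(n_big / 100_000)  # close to 0.5
```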
In any case, you can trust that the models that have a name (e.g the uniform one) are useful. That is why they have been given a special name in the first place :) Thus practising problem solving on them is useful as well. This is what a big part of this course is about.