Unfortunately, securing an offer isn't as easy as 'if you can solve most of the problems from the previous lesson, you'll get an offer.' I don't mean to say solving most of those problems is easy, but it's nonetheless surprising how much closer you can get to an offer by being good only at basic probability and critical thinking! Therefore, please make sure you're comfortable with the standard topics such as combinatorics, conditional probability, Bayes' theorem, mathematical expectation and its applications, and the basics of the continuous probability setup. This is a must.
However, the requirements often extend beyond these basics. For instance, a decent number of companies will ask questions to test deeper understanding of probability theory, such as 'find the $N$th moment of a standard normal distribution' or 'bound the correlation between two random variables given some condition'. It is understandable that you might not be prepared for such specific inquiries right now (that's fine!). The good news is that this primarily involves theoretical knowledge, which means that if you dedicate time to "read and practice", you will become well-prepared! This is in fact less challenging than developing great mathematical intuition and analytical thinking, which can take a very long time. Thus, once more, you're closer to being ready for a high-paying job than you might think.
For the sake of completeness, here is the list of topics for the problems & questions that could come up at an interview or in an online assessment (both can be part of the application process). Once again, this is for a junior position or an internship (since this course is focused on inexperienced people trying to "get in") as a quant researcher or as a trader. Just to have it all in one place, together with a few useful references:
Time for maths. This subsection's name may sound like a narrow and unimportant topic, however it is a building block for the theory that most quants and traders use, especially during an internship or at a junior position. This theory includes the topic of Linear Regression, whose importance has been stressed above. In particular, it means that this subsection's problems are a part of the preparation for the interviews. In fact, we will get to doing a few questions from real & recent interviews again!
Unlike most educational institutions, we will approach those two unattractive terms in the subsection's name from a different angle: fewer formal definitions with lots of $\mathbb{E}[X]$ floating around, and more "real life and data" kinds of discussion. We will even play a little trading game soon!
So, the word "correlation" comes up in many conversations, and it always means some measure of the extent to which two or more variables behave in relation to each other. We don't even have to perfectly formalise it to understand its potential use: if I know that today's stock price of Microsoft is strongly correlated with yesterday's stock price of Nvidia, I can try to make use of this information to predict the future moves of Microsoft's price. Even if it is not "always true" that "if Nvidia dropped in price, then Microsoft will also drop tomorrow", but merely "true more often than not", it is already a non-trivial amount of information that could be used to make money, provided you trade a lot (so you can apply the Law of Large Numbers).
Spotting those helpful correlations is not easy. Let's do a quick exercise and try to spot a correlation (potentially useful for predicting price moves) on the plot below:
The key observation here is that almost every time the price goes up, it goes up again in the next hour, and if the price drops, it drops in the next hour as well. This is not totally "random" behaviour if you think about it. If it were totally random, then right after a rise we would expect a drop around 50% of the time. I.e. the next price change is positively correlated with the last price change. This phenomenon even has a name: it is called positive autocorrelation (guess what kind of behaviour is called negative autocorrelation). It is a fancy, but self-explanatory, term. You could also call it a "trend".
Exercise 1.
By making use of the positive autocorrelation phenomenon, find a way (= an algorithm) to almost never have less than $\$1050$ in "total worth" by the end of the trading game below. Ideally, also try to sometimes get more than $\$1150$ by the end. It should be clear what is going on, but here are the details just in case:
Explanation and comments
Once again, intuitively this phenomenon means that if the price went up last time, it is more likely to go up again. On the other hand, if the price just dropped, it is more likely to drop again. With this in mind, let's stick to the following strategy:
This is not a $100\%$ winning strategy, but we are doing probability theory, i.e. the theory of chance :) By the way, we can further modify this strategy: we can do "Sell All" and then always "Do Nothing" once our total worth is above $\$1050$. This will be an almost-always (~90% of the time) working strategy to finish with more than $\$1050$ at the end. It will not, however, ensure that we sometimes get to more than $\$1200$. For this, we do need to stick to the strategy above until the end, at least every once in a while, but then we increase the risk of finishing with less than $\$1050$. Thus there is some trade-off between risk and expected final total worth.
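To make the intuition a bit more tangible, here is a minimal Python sketch of such a trend-following strategy. It is an assumption-heavy toy model, not the actual game above: the starting cash of $\$1000$, the starting price of $\$100$, the $\pm 1$ hourly moves and the 70% chance of repeating the previous move are all made up for illustration.

```python
import random

def simulate_trend_following(n_steps=50, start_cash=1000.0, start_price=100.0,
                             p_continue=0.7, seed=0):
    """Toy simulation of a trend-following strategy on a positively
    autocorrelated price series (all parameters are illustrative assumptions)."""
    rng = random.Random(seed)
    price, cash, shares = start_price, start_cash, 0
    last_move = rng.choice([-1, 1])

    for _ in range(n_steps):
        if last_move > 0:
            # the price just went up, so we expect another up-move: buy what we can
            n = int(cash // price)
            shares += n
            cash -= n * price
        else:
            # the price just dropped, so we expect another drop: sell everything
            cash += shares * price
            shares = 0

        # positive autocorrelation: the next move repeats the last one
        # with probability p_continue, otherwise it flips
        move = last_move if rng.random() < p_continue else -last_move
        price += move
        last_move = move

    return cash + shares * price  # final total worth

final_worths = [simulate_trend_following(seed=s) for s in range(1000)]
print("fraction of runs finishing above $1050:",
      sum(w > 1050 for w in final_worths) / len(final_worths))
```

Playing with `p_continue` shows the point of the whole discussion: the closer it is to $0.5$ (no autocorrelation), the less this strategy helps.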
The exercise above is cute and insightful, but we still have not quantified our intuition at all! We have not formalised anything in this subsection really. For example, what exactly is a "weak correlation"? How to quantify any of this?
Let's answer those questions. Just like before, we will be engaging with a real-world problem involving two entities that generate somewhat random values. These could be, for example, the minute-by-minute prices of certain stocks or the monthly totals of specific virus cases globally. The precise nature of these values is not important to us, mathematicians; rather, we will treat them as two finite sequences of numbers: $x_1, ..., x_n$ and $y_1, ..., y_n$ (the equal length is important, of course). The question is: given these two sequences of numbers, how do we nicely assign a number that measures the extent to which these two strings of values are related to each other?
Definition [(Pearson) Correlation Coefficient]
The correlation between two non-constant sequences of numbers $x=[x_1, ..., x_n]$ and $y=[y_1, ..., y_n]$, more officially known as the Pearson Correlation Coefficient (or sample correlation coefficient as well), is defined as \[ \text{corr}(x,y) = \frac{\sum_{i=1}^n (x_i - \overline{x})(y_i - \overline{y})}{ \sqrt{ \left( \sum_{i=1}^n (x_i - \overline{x})^2 \right) \cdot \left( \sum_{i=1}^n (y_i - \overline{y})^2 \right)}} \] where $\overline{x}$ is the average of the $x_i$-s and the $\overline{y}$ is the average of the $y_i$-s.
(Those $x$ and $y$ are merely the names for those sequences, not random variables or constants or something else).
The formula right above is almost objectively damn-ugly. But I nonetheless recommend you stare at it for a while to think if it makes any sense to you. Moreover, please prove the following convenient property of it:
Exercise 2.
Show that the Pearson Correlation Coefficient always belongs to the interval $[-1, 1]$. When does it equal $1$ or $-1$?
Explanation and comments
It is a direct implication of the Cauchy-Schwarz inequality applied to the numbers $x_i - \overline{x}$ and $y_i - \overline{y}$. The correlation being equal to $-1$ or $1$ is equivalent to the equality case in that inequality, which happens only when the sequences of deviations are proportional, i.e. when $y_i = ax_i + b$ for some constants $a$ and $b$, or $x_i = ay_i + b$ for some constants $a$ and $b$.
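For completeness, here is the inequality spelled out in the notation of the definition (just a sketch of the argument, the details are left to you): \[ \left| \sum_{i=1}^n (x_i - \overline{x})(y_i - \overline{y}) \right| \le \sqrt{\sum_{i=1}^n (x_i - \overline{x})^2} \cdot \sqrt{\sum_{i=1}^n (y_i - \overline{y})^2}, \] which is exactly the Cauchy-Schwarz inequality applied to the vectors $(x_1 - \overline{x}, ..., x_n - \overline{x})$ and $(y_1 - \overline{y}, ..., y_n - \overline{y})$; dividing both sides by the right-hand side (non-zero since the sequences are non-constant) gives $|\text{corr}(x,y)| \le 1$.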
We finally have a measure of how related two things are! It is even "nice" in the sense that it is independent of the nature of those sequences of numbers: we can always assign a number $\in [-1, 1]$ telling us how "related" those sequences (and thus the entities producing those random numbers) are. We are yet to understand better why it makes sense to say that this weird Pearson Correlation can be thought of as "a measure of relation", but we are already getting somewhere.
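If you prefer code to formulas, here is a short Python sketch that computes the coefficient directly from the definition and checks it against `numpy.corrcoef` (the two example sequences below are made up).

```python
import numpy as np

def pearson_corr(x, y):
    """Pearson correlation coefficient, computed straight from the definition."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    dx, dy = x - x.mean(), y - y.mean()
    return (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())

# made-up example sequences
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.1, 9.8]

print(pearson_corr(x, y))        # close to 1: y is roughly 2*x
print(np.corrcoef(x, y)[0, 1])   # numpy's built-in gives the same number
```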
To be honest, you will rarely see the formal definition of a correlation before the two definitions that are about to appear, which are variance and covariance. However, it is the word "correlation" that is often used in speech, and so it feels more intuitive to start with it. Anyway:
Definition [Variance of a sequence (set) of numbers]
The variance of a sequence of numbers $x = [ x_1, x_2, ..., x_n ]$ is the average of the squared distances from each term to the mean, i.e it is equal to \[ \text{Var}(x) = \frac{1}{n} \cdot \sum_{i=1}^n \left( x_i - \overline{x} \right)^2 \] where $\overline{x} = \left( 1/n \cdot \sum_{i=1}^nx_i \right)$, i.e again the average of the $x_i$-s.
(The $x$ is merely the name for that sequence, not a random variable or a constant or something else).
Remark: Technically, we do not need to care that it is a sequence (an ordered list) for the definition of the variance to make perfect sense. It could be a set of numbers, and it would be exactly the same thing. I am merely sticking to the words "sequence" and "list" because we are discussing all of this in the context of a correlation between two things, where it is important that $x_i$ corresponds to $y_i$. This is a pretty obvious remark, but still, please do not get confused.
As the name suggests, variance gives us a numerical measure of how scattered a data set is. In simpler words, it measures how "crazy" the data set is. Just stare at the formula and you will see why this is true; it luckily makes perfect intuitive sense. Next, let's define:
Definition [Covariance between two sequences]
The covariance between two sequences $x=[x_1, ..., x_n]$ and $y=[y_1, ..., y_n]$, once again the order here matters, is defined as follows: \[ \text{cov}(x,y) = \frac{1}{n} \cdot \sum_{i=1}^n (x_i - \overline{x})(y_i - \overline{y}) \] where $\overline{x}$ is the average of the $x_i$-s, and the $\overline{y}$ is the average of the $y_i$-s as before.
(Those $x$ and $y$ are merely the names for those sequences, not random variables or constants or something else).
It is more difficult to find the right words for the meaning of this one. One thing one could say is that covariance indicates the direction of the linear relationship between two things. However, you cannot judge from its magnitude how strong that relationship is. To be able to talk about the "strength" of it, we need a baseline of some form, something to compare it to. This is where correlation comes in: \[ \text{corr}(x, y) = \frac{\text{cov}(x,y)}{\sqrt{\text{Var}(x) \cdot \text{Var}(y)}} \in [-1, 1] \]
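As a quick sanity check (on made-up sequences), here is how the three definitions fit together in code: the correlation computed as $\text{cov}(x,y)/\sqrt{\text{Var}(x)\cdot\text{Var}(y)}$ coincides with the Pearson formula from before.

```python
import numpy as np

def var(x):
    x = np.asarray(x, dtype=float)
    return ((x - x.mean()) ** 2).mean()           # 1/n * sum of squared deviations

def cov(x, y):
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    return ((x - x.mean()) * (y - y.mean())).mean()

# made-up example sequences
x = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0]
y = [2.0, 7.0, 1.0, 8.0, 2.0, 8.0]

corr = cov(x, y) / np.sqrt(var(x) * var(y))
print(corr)                        # the 1/n factors cancel, so this equals...
print(np.corrcoef(x, y)[0, 1])     # ...the Pearson correlation coefficient
```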
Let's finish off this subsection with a few standard properties of the recently defined terms.
Exercise 3.
Let $x=[x_1, ..., x_n]$ and $y=[y_1, ..., y_n]$ be two sequences with averages $\overline{x}$ and $\overline{y}$. Prove the following
Explanation and comments
click on blur to reveal
All of these properties are easy to show; they are merely about rearranging a bunch of stuff:
We will now move on to working with the mathematical tool that seems to be used by most trading companies to make money. It is genuinely surprising how effective and popular it is despite its simplicity: only around one in five people I talked to about their trading or quant internships told me that they went significantly beyond (mathematically speaking) just using Linear Regression in different contexts! This mathematical tool also happens to be a popular interview question.
Unlike what was done with Variance, Correlation & Covariance, we will not be talking about the motivation behind Linear Regression, nor will we tiptoe around it too much. We will jump straight into using it, building some theory around it and solving problems on it. It would be fair to talk about prediction models in general before jumping "straight into action", and in particular to show you that even in a rather simple scenario, the Linear Regression Model might not be "the best" or even "nearly the best". For this discussion, please check out the second half of this classwork from the introductory course.
So, the Linear Regression Model is merely one of many possible models that help generate predictions. And even though the quote above, "...went significantly beyond just using Linear Regression...", might suggest it is not complicated, that is not the right conclusion. It is not a simple thing in general! There are a lot of interesting facts about it, as well as a few caveats, especially once you go multi-dimensional.
Luckily for you, we will not be going multi-dimensional today, meaning we will be working with the Simple Linear Regression Model: there is one variable, call it $Y$, that is our target variable, and there is just one feature, call it $X$, that is our predictor, and our ultimate goal is to find the "best" (or "as good as possible") constants $a$ and $b$ such that $aX + b$ predicts $Y$. Here $X$ and $Y$ are both random variables.
We need to be more specific and formal here, in particular about the word "best". One way to go about all this is the following: let's collect some data (or it might be given already) with realisations of $Y$ and $X$, say $(x_1, y_1), ..., (x_n, y_n)$. A standard example: the price of a house corresponds to $Y$, while its distance to the city centre corresponds to $X$; then those $(x_i, y_i)$ are the pairs of numbers carrying this information about some $n$ houses. Define
Definition [Quadratic Loss Function]
If $y_1, y_2, ..., y_n$ are the true values (that we observed), and $f(x_1), f(x_2), ..., f(x_n)$ are our predictions (where $f$ is our model), then the quadratic loss function $L$ is defined as \[ L(f) = \frac{1}{n} \sum_{i=1}^{n} (y_i - f(x_i))^2 \]
and then say that we will pick $a$ and $b$ in such a way that the loss function for the model $f(x) = ax + b$ is the minimal possible (over all possible choices of $a$ and $b$). This is called Least Squares Estimation (LSE) because it gives rise to the least value of the sum of squared errors. So it makes sense, and you have probably seen all this before.
The picture above shows a solution to the LSE in the problem where there are $100$ students, each having a pair of grades (high school GPA, university GPA), and where our ultimate goal is to find the best linear prediction for the "university GPA", our target variable, given just one feature, which is "high school GPA". The orange line in the picture is that solution, i.e. the "line that fits the data best". For this particular case it turned out that $a=0.68$ and $b=1.07$, so our model is $f(x) = 0.68 x + 1.07$, i.e. \[\text{prediction} = 0.68 \cdot \text{(high school GPA)} + 1.07 \]
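To illustrate, here is a sketch of how such a fit could be computed in Python. The GPA data below is synthetic (the real 100 pairs behind the picture are not available here), so the fitted numbers will only come out roughly close to the $a=0.68$ and $b=1.07$ quoted above.

```python
import numpy as np

rng = np.random.default_rng(0)

# synthetic stand-in for the (high school GPA, university GPA) pairs
high_school_gpa = rng.uniform(2.5, 4.0, size=100)
university_gpa = 0.68 * high_school_gpa + 1.07 + rng.normal(0, 0.2, size=100)

# least squares fit of: university_gpa ~ a * high_school_gpa + b
a, b = np.polyfit(high_school_gpa, university_gpa, deg=1)
print(f"prediction = {a:.2f} * (high school GPA) + {b:.2f}")
```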
Even if this is somewhat new to you, it should make sense. Of course, one can ask questions like "why a linear model?" or "why squared errors, why not just absolute values?" – but we simply picked such a model with such a way of measuring the "bestness of the model" through that particular loss function. Neither of these choices has to be "the best"; they are merely choices that make intuitive sense.
Finally, we are about to get to something that you might not have seen or realised before. Since this classwork is getting too long already, we will actually solve just one more exercise here, and move on to the next step, the Problem Set. You are left with digesting all of the information (the vast majority of which is likely to be not new), and with solving a few interview-level problems.
Exercise 4.
To sum up: there is data $(x_1, y_1), ..., (x_n, y_n)$, and we would like to pick constants $a$ and $b$, so that the loss function \[ L(f) = L(a, b) = \frac{1}{n} \sum_{i=1}^{n} (y_i - ax_i - b)^2 \] is minimised. While this "makes sense" for our purposes of predicting something, a maths-minded person can view it as an algebra problem – and so this exercise is about solving it. Well, at least find the value of $a$ in terms of $x_i, y_i$.
Explanation and comments
It is not a difficult exercise if you know calculus: simply set the partial derivatives to 0 and find the values of $a$ and $b$ from the equations you get. You don't have to use calculus, by the way; there are other ways to solve it, including one that is merely about algebraic manipulations – the $L(a,b)$ is quadratic in $a$ and $b$ after all, it is not a sophisticated function. So, one way or another (but please do it!), you can find that
\[ a = \frac{\sum_{i=1}^n y_i (x_i - \overline{x})}{\sum_{i=1}^n (x_i - \overline{x})^2} \qquad b = ...\]
where $\overline{x}$ is the mean of the $x_i$-s. The "..." for $b$ means that it is left for you to actually calculate :)
What is more interesting here is that we can rewrite the formula for $a$ as \[ a = \frac{1/n \cdot \sum_{i=1}^n (y_i - \overline{y}) (x_i - \overline{x})}{1/n \cdot \sum_{i=1}^n (x_i - \overline{x})^2}\] by adding $0 = -\overline{y} \cdot 0 = -\overline{y} \cdot \sum_{i=1}^n(x_i - \overline{x})$ to the numerator, and then multiplying both the numerator and the denominator by $\frac{1}{n}$. This new formula for $a$ should resemble something... Indeed, using the definitions from above, we can further rewrite \[ a = \frac{\text{cov}(x,y)}{\text{Var}(x)} \] which is nicely interpretable! It suggests that the slope of the linear regression measures how related $y$ and $x$ are, i.e. how much of $y$ can be explained using $x$. It also quantifies the degree to which $y$ can be predicted from $x$ based on the historical data.
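You can also verify this identity numerically; the sketch below (on made-up noisy linear data) compares the explicit least-squares formula for $a$ with $\text{cov}(x,y)/\text{Var}(x)$.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=200)
y = 2.0 * x + 1.0 + rng.normal(size=200)   # made-up noisy linear relationship

# slope from the explicit least-squares formula
a_lse = (y * (x - x.mean())).sum() / ((x - x.mean()) ** 2).sum()

# slope as cov(x, y) / Var(x), with the 1/n convention used in this lesson
a_cov = ((x - x.mean()) * (y - y.mean())).mean() / ((x - x.mean()) ** 2).mean()

print(a_lse, a_cov)   # the two numbers coincide (up to floating point error)
```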
This course adopts a different approach for introducing terms like "correlation", "variance" and "covariance": instead of introducing them as operators on random variables (like universities do), we defined them in the context of having data. Now, the main remark here is that if you try to google these things, you might see something like \[ \text{sample variance} = \frac{1}{n-1} \cdot \sum_{i=1}^n \left( x_i - \overline{x} \right)^2 \] which is slightly different from what we did! If $n$ is large it will give almost exactly the same result, but it is technically different. So a reasonable question can pop up in your head: who is lying? Quanta or those internet resources?
The answer is "no one". The truth is that the word "variance", just like the words "covariance" and "correlation", means different things depending on the context. Unfortunately, many online resources, and even people, are too lazy to clarify that context. Without going in too deep, we simply ask you to bear with us until the end of the next lesson, where we will define those terms in a probabilistic setting (and those definitions are the same everywhere), and then comment a bit more on the $\frac{1}{n}$ vs $\frac{1}{n-1}$ situation.
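For what it's worth, this ambiguity is visible directly in numpy: `np.var` uses the $\frac{1}{n}$ convention from this lesson by default, and passing `ddof=1` switches it to the $\frac{1}{n-1}$ "sample variance" you may see elsewhere.

```python
import numpy as np

x = np.array([1.0, 2.0, 4.0, 7.0])

print(np.var(x))           # 1/n convention (ddof=0, numpy's default)
print(np.var(x, ddof=1))   # 1/(n-1) convention, often called "sample variance"
```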
The Linear Regression Model is a superstar in a way: both the interviews and the trading/quant jobs themselves like it. We have just started talking about this model, the simple 2D version of it to be precise. But as you can already see from the last exercise, the coefficients in the simple linear regression are not meaningless. Well, you definitely know that at least one of them makes sense given the basic intuition and knowledge about correlation and variance! Hint: the other coefficient is also insightful... Being able to provide motivation for, or talk about the meaning of, your result is also a skill that can be tested at an interview, by the way.
Moreover, we will use a simple linear regression for creating a trading algorithm that has actually helped save millions of dollars for a trading firm on a particular kind of trades. For real! However, we will get to it later, in lesson 4. For now, please master the concepts by solving the problems from today's Problem Set. As for lesson 3, i.e. the next one, we will go deeper into the maths theory and redefine the terms we just talked about. Once again, strong intuition around them, as well as mathematical formalism and the ability to apply it, are vitally important for passing the interviews! So don't try to avoid it.