Problem Set №2

This set is shorter than the previous one: last time you received a lot of problems, and it is understandable if they take time.

Problem 1.

Consider the set of numbers $\{-5, -4, -3, -2, -1, 0, 1, 2, 3, 4, 5\}$. Replace one of its numbers with two other integers in such a way that the variance does not change.
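To check a candidate answer numerically, a small helper like the one below may be useful. It is only a sketch: it assumes the population variance (dividing by the number of elements), which the problem does not specify, and the names `population_variance` and `replace_one` are made up for illustration. The trial replacement shown is arbitrary and is not a solution.

```python
from fractions import Fraction


def population_variance(values):
    """Population variance (divide by n), computed exactly with fractions."""
    n = len(values)
    mean = Fraction(sum(values), n)
    return sum((Fraction(v) - mean) ** 2 for v in values) / n


def replace_one(values, old, new_pair):
    """Copy of values with one occurrence of `old` replaced by the two numbers in `new_pair`."""
    result = list(values)
    result.remove(old)        # drops the first occurrence of `old`
    result.extend(new_pair)   # adds the two replacement numbers
    return result


original = list(range(-5, 6))
# Arbitrary trial (an assumption for illustration): replace 0 with 1 and -1.
candidate = replace_one(original, 0, (1, -1))

print("original variance: ", population_variance(original))
print("candidate variance:", population_variance(candidate))
```

Changing the arguments of `replace_one` lets you test your own guess; the replacement works when the two printed values coincide.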

Problem 2.

There are $n$ numbers; one of them is equal to 0 and another is equal to 1. What is the minimal possible variance of this set of numbers?

Problem 3.

Let $x, y, z$ be three sequences of numbers such that the correlation between $x$ and $y$ is 0.9 (they are 90% correlated) and the correlation between $y$ and $z$ is 0.8. What is the minimum correlation that $x$ and $z$ could have? What about the maximum?
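For experimenting with concrete sequences, the following sketch computes a correlation coefficient. It assumes that "correlated" here means the Pearson correlation coefficient, which is an interpretation and is not stated in the problem; the function name `pearson` and the sample sequences are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (norm_x * norm_y)


# Arbitrary sample sequences (assumptions for illustration only).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.0, 3.0, 2.0, 5.0, 4.0]
z = [2.0, 1.0, 4.0, 3.0, 5.0]

print(pearson(x, y), pearson(y, z), pearson(x, z))
```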

Problem 4.

There are $n$ real numbers $x_1, x_2, \ldots, x_n$. For what number(s) $a$ does the function $\sum_{i=1}^n d(a, x_i)$ attain its minimum, where
a) $d(x, y) = (x-y)^2$
b) $d(x,y) = |x-y|$ ?
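Before proving anything, it can help to evaluate the sum on a grid of values of $a$ and see where it is smallest. The sketch below does this with exact fractions; the names `objective` and `grid_minimizers`, the grid step, and the sample data are all assumptions made for illustration, and a grid search only suggests an answer, it does not prove one.

```python
from fractions import Fraction


def squared(x, y):
    """d(x, y) = (x - y)^2, as in part a)."""
    return (x - y) ** 2


def absolute(x, y):
    """d(x, y) = |x - y|, as in part b)."""
    return abs(x - y)


def objective(a, xs, d):
    """Value of the sum of d(a, x_i) over all x_i."""
    return sum(d(a, x) for x in xs)


def grid_minimizers(xs, d, step=Fraction(1, 4)):
    """Grid points between min(xs) and max(xs) where the objective is smallest."""
    lo, hi = min(xs), max(xs)
    count = int((hi - lo) / step)
    grid = [lo + k * step for k in range(count + 1)]
    values = [objective(a, xs, d) for a in grid]
    best = min(values)
    return [a for a, v in zip(grid, values) if v == best]


xs = [1, 2, 4, 7]  # arbitrary integer sample, chosen only for illustration
print("a):", grid_minimizers(xs, squared))
print("b):", grid_minimizers(xs, absolute))
```

Trying samples with an odd and an even number of points is a reasonable way to see why the question asks for "number(s)".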

Problem 5.

For the last exercise from the classwork, derive the formula for $b$ and comment on your result.

Problem 6.

There are $n$ data points $(x_i, y_i)$ and two prediction models $f_1$ and $f_2$ (not necessarily linear; both are functions $\mathbb{R} \to \mathbb{R}$). Let $L_1$ and $L_2$ be two loss functions defined as follows: \[ L_1(f) = \frac{1}{n} \sum_{i=1}^n |f(x_i) - y_i| \; \text{ and } \; L_2(f) = \frac{1}{n} \sum_{i=1}^n (f(x_i) - y_i)^2 \] Could it happen that \[ L_1(f_1) > L_1(f_2) \; \text{ and } \; L_2(f_1) < L_2(f_2) ? \] If it can, then the answer to the question of which model, $f_1$ or $f_2$, is "worse" depends on which of the two loss functions we pick.
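The two losses can be computed directly from their definitions, which makes it easy to test candidate data sets and models. In the sketch below the function names `l1_loss` and `l2_loss`, the data points, and the two models are hypothetical; the example was chosen arbitrarily and is not meant to settle the question.

```python
def l1_loss(f, points):
    """Mean absolute error of model f on the (x, y) points."""
    return sum(abs(f(x) - y) for x, y in points) / len(points)


def l2_loss(f, points):
    """Mean squared error of model f on the (x, y) points."""
    return sum((f(x) - y) ** 2 for x, y in points) / len(points)


# Hypothetical data and models, for illustration only.
points = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]


def f1(x):
    return x + 0.5


def f2(x):
    return x + 1.0


for name, f in (("f1", f1), ("f2", f2)):
    print(name, "L1 =", l1_loss(f, points), "L2 =", l2_loss(f, points))
```

To explore the problem, replace `points`, `f1`, and `f2` with your own choices and compare the printed values.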

Problem 7.

You have $n$ pairs $(x_1, y_1), \ldots, (x_n, y_n)$ of numbers. Let "the best" linear function be $f(x)$, i.e., the function that predicts $y_i$ when $x_i$ is given (the usual linear regression model). How will this function change if we have $(x_1, y_1), \ldots, (x_n, y_n), (x_1, y_1), \ldots, (x_n, y_n)$, i.e., two copies of the above data set?
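A quick experiment can guide your intuition before you write the argument. The sketch below assumes that "the best" linear function means the ordinary least-squares line and uses the standard closed-form slope and intercept; the name `fit_line` and the sample points are hypothetical, and the arithmetic is exact only because the sample coordinates are integers.

```python
from fractions import Fraction


def fit_line(points):
    """Least-squares slope and intercept for (x, y) points with integer coordinates."""
    n = len(points)
    mean_x = Fraction(sum(x for x, _ in points), n)
    mean_y = Fraction(sum(y for _, y in points), n)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)  # assumed nonzero
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


points = [(0, 1), (1, 3), (2, 2), (3, 5)]  # arbitrary sample data

print("original data:  ", fit_line(points))
print("duplicated data:", fit_line(points + points))
```

Comparing the two printed pairs on a few different samples should suggest what to prove.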

SP-1 / Lesson 2
