PA-1 / Lesson 4


Problem Set №4

Problem 1.

For each of the following pairs of variables, guess whether the two variables are positively or negatively correlated, and try to explain your guess in a few words:

  • Time on social media and time spent on studying;
  • Demand for a product and the price of that same product;
  • Concentration level and task completion time;
  • Time spent on education and future income;
  • Amount of sugar consumed and number of visits to a hospital.

Problem 2.

A gardener is researching a crop of sunflowers. He selects 10 sunflowers at random and measures their height and the number of leaves. The table below shows the results.

Make a scatter plot for this information. What kind of correlation do you see?

Problem 3.

Once, there was a massive email survey in India asking how much time people spend on YouTube. By the end of it, the organizers had 100,000 responses. They took the average of the submitted answers, got "around 130 minutes", and published the claim "An average Indian person spends 130 minutes per day on YouTube". What do you think about this publication?

Problem 4.

If we collect information about the total number of Master’s degrees issued by universities each year and the total box office revenue for the same year (i.e., how much money cinemas make), we find that the two variables are highly correlated (see the pictures below). Does this mean that watching movies makes people study harder? What do you think about this?

Problem 5.

A factory produces items using three machines, A, B, and C, which account for 20$\%$, 30$\%$, and 50$\%$ of its output respectively. Of the items produced by machine A, 5$\%$ are defective; similarly, 3$\%$ of machine B's items and 1$\%$ of machine C's are defective. If a randomly selected item is defective, what is the probability it was produced by machine C?
Hint: This problem is similar in spirit to the one about the Kwacha virus from the first classwork. You are welcome to use informal (but clear!) arguments to solve it; the question tests your intuition about probability. There is no need for Bayes' formula either (this is a remark for those who know what that is).
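
Once you have your own answer, one way to check it is to imagine a concrete batch of items and simply count the defective ones coming from each machine. The Python sketch below does exactly that. The function name `defect_shares` and the batch size of 10,000 are assumptions made for illustration; they are not part of the problem.

```python
from fractions import Fraction


def defect_shares(output_pct, defect_pct, batch=10_000):
    """Count defective items per machine in an imagined batch.

    output_pct: percent of the factory's output made by each machine.
    defect_pct: percent of each machine's items that are defective.
    batch: imagined number of items (assumption: chosen so that all
           the counts below come out as whole numbers).
    """
    defective = {}
    for machine, pct in output_pct.items():
        items = batch * pct // 100  # items this machine made
        defective[machine] = items * defect_pct[machine] // 100
    total = sum(defective.values())  # all defective items in the batch
    # Share of the defective items that came from each machine
    return {m: Fraction(d, total) for m, d in defective.items()}


output_pct = {"A": 20, "B": 30, "C": 50}
defect_pct = {"A": 5, "B": 3, "C": 1}
shares = defect_shares(output_pct, defect_pct)
print(shares["C"])  # compare this with your own answer
```

Exact fractions are used instead of decimals so that the result can be compared with a hand computation without rounding.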

o_O

Spurious Correlations and the Placebo Effect

Drawing wrong conclusions from what looks like "a strong correlation" is quite common. There are tons of examples, and they are referred to as spurious correlations. There is even a Wikipedia page dedicated to this phenomenon alone; it is called "spurious relationship".
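
To see how easily such a correlation can appear, here is a minimal Python sketch. It generates two made-up yearly quantities that have nothing to do with each other except that both happen to grow over time, and then computes their Pearson correlation coefficient. All numbers, the names `series_a` and `series_b`, and the helper `pearson` are assumptions for illustration only; this is not real data.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


rng = random.Random(0)  # fixed seed so the run is repeatable
years = range(20)

# Two synthetic quantities: neither influences the other,
# but both grow steadily from year to year (plus some noise).
series_a = [100 + 5 * t + rng.gauss(0, 3) for t in years]
series_b = [40 + 2 * t + rng.gauss(0, 2) for t in years]

r = pearson(series_a, series_b)
print(round(r, 2))  # close to 1, although there is no causal link
```

The shared upward trend over time is enough to produce a coefficient close to 1, which is one common way a spurious correlation arises.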

Ordinary people are not the only ones who make this mistake; the media loves to make it as well. Below, you can find a few examples, out of the millions available on the internet, of pairs of things that were "proven" to be strongly correlated with each other.

On a note related to "correlation does not imply causation", the placebo effect is worth mentioning. For those who do not know what this is: it is when a person's physical or mental health appears to improve after taking a fake medicine that contains no active substance, while the person believes it is real medicine. Since this effect can be quite strong, a medicine is not tested by simply giving it to, let's say, 1000 volunteers and seeing if it helps them. Think about it: if that kind of test tells you that "300 people got better, the others reported no change", is that good or bad? How much of the improvement was due to the placebo effect? You cannot tell. Therefore, instead of giving the new medicine to everyone, it is given to only about half of the volunteers, while the other half gets an inactive substitute that looks just like the medicine. Then we compare how many people in each group come back saying that they got better. Real tests are often a bit more complex than this, but the key idea is the same.
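
The comparison described above can be sketched as a small Python simulation. Everything in it is a simplifying assumption rather than a description of how real trials work: the function name `run_trial`, the made-up rates `placebo_rate` and `drug_extra`, and the idea that the medicine simply adds a fixed extra chance of improvement on top of the placebo effect.

```python
import random


def run_trial(volunteers, placebo_rate, drug_extra, rng):
    """Simulate a two-group trial; return (improved_treated, improved_control).

    placebo_rate: chance of reporting improvement from the placebo effect alone.
    drug_extra:   extra chance of improvement added by the real medicine
                  (hypothetical; a simple additive model is assumed).
    """
    half = volunteers // 2
    # Treated group: placebo effect plus the medicine's own effect
    improved_treated = sum(
        rng.random() < placebo_rate + drug_extra for _ in range(half)
    )
    # Control group: fake medicine, so only the placebo effect acts
    improved_control = sum(rng.random() < placebo_rate for _ in range(half))
    return improved_treated, improved_control


rng = random.Random(1)  # fixed seed so the run is repeatable
treated, control = run_trial(1000, placebo_rate=0.25, drug_extra=0.15, rng=rng)
print("improved with medicine:", treated, "out of 500")
print("improved with placebo: ", control, "out of 500")
```

The number to look at is the difference between the two groups, not the count in the treated group alone: setting `drug_extra=0` still leaves many "improved" volunteers in both groups.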