PA-1 / Lesson 5


Problem Set №5

The loss function from the classwork is a measure of how "smart" or "good" a model is. It is sometimes called a score function, and, depending on the convention, the smaller (or larger) the score of a model is, the "better" it is.

Problem 1.

This is an open, essay-style question: think of a loss function for the language model from Exercise 2 of the classwork that sounds nice and humanity-friendly. Then describe how, despite sounding good, things can still go wrong if the model is powerful enough and single-mindedly focused on minimising/maximising your loss function.

Problem 2.

A fair coin is tossed 4 times. What is the probability that there will be two consecutive throws that are the same? As always, clearly specify your probability model and, in particular, the set of simple events.
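Once you have solved this by hand (with a proper probability model!), you can sanity-check your answer by brute force, since there are only $2^4$ equally likely simple events. A minimal sketch:

```python
from itertools import product

# Enumerate all 2^4 equally likely outcomes of 4 fair coin tosses
# and count those containing two consecutive equal throws.
outcomes = list(product("HT", repeat=4))  # the 16 simple events


def has_consecutive_repeat(seq):
    """True if some adjacent pair of throws is the same."""
    return any(a == b for a, b in zip(seq, seq[1:]))


favourable = [s for s in outcomes if has_consecutive_repeat(s)]
probability = len(favourable) / len(outcomes)
print(len(favourable), "/", len(outcomes), "=", probability)
```

Of course, the enumeration only confirms the number; the point of the problem is to specify the model cleanly, which the code does not do for you.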

Problem 3.

At the end of the class, we talked about one kind of Machine Learning. In fact, there are many, even conceptually different, kinds of ML. For example, suppose there is data about a bunch of shopping mall customers: their age, their income (in thousands of dollars per year), and their score, a number assigned by the mall based on customer behaviour and spending patterns. That's all; there is nothing specific we want to predict, we just want to "make sense of" the data we got. Here is what it looks like:

Where can you spot "not totally random behaviour", and how would you describe your observation?

Problem 4.

Time magazine, in an article in the late 1950s, stated that “the average Yaleman, class of 1924, makes \$25,111 a year,” which, in today’s dollars, would be over \$150,000. Time’s estimate was based on replies to a sample survey questionnaire mailed to those members of the Yale class of 1924 whose addresses were on file with the Yale administration in the late 1950s. What do you think of this publication and the figure \$25,111? As you might have guessed, you need to provide reasonable criticism of it.

Problem 5.

Create or find a simple maths problem (middle-school level at most) that you can personally solve correctly but ChatGPT can't. In particular, it must give a wrong final answer, not merely make some mistakes in the explanation. (We should be able to reproduce your result.)

o_O

Machine Learning VS Human Learning

Today's what-up-story (yeah, that is the name of these pieces of text at the end of problem sets :)) briefly touches upon two topics.

The first one is "What does it mean that an AI model is learning?". Usually, it basically means "a computer program is trying to adjust some parameters to minimise a pre-defined loss function". That is it; it is simply trying to minimise some function. You may be wondering: "What is so difficult about finding the minimum of a pre-defined function for a computer? It should just draw it, and there it is!". The reality is that it is almost never possible to draw it; it is rarely something simple like $y=x^2$. Often, this function is a thousand-dimensional scary monster of a super complex form, so complex that even algebra does not help much. The best we can do then is to give the computer an algorithm for slightly adjusting its parameters so as to slightly improve the value of the loss function. And so the computer does just that: it keeps on slightly modifying its parameters, slightly decreasing the loss function each time. This is what we often refer to as "Look, the AI model is learning!". It is quite different from human learning, so don't get scared that there are conscious robots out there learning. There are none. For now, at least.
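This "slightly adjust to slightly improve" loop is exactly gradient descent. Here is a minimal sketch on a one-parameter toy loss whose minimum we already know (so we can check the computer finds it); real models do the same thing, just with millions of parameters:

```python
# Toy loss L(w) = (w - 3)^2, with its minimum at w = 3.
def loss(w):
    return (w - 3) ** 2


def grad(w):
    # The derivative of the loss tells us which way to nudge the parameter.
    return 2 * (w - 3)


w = 0.0    # start from an arbitrary parameter value
lr = 0.1   # learning rate: how "slightly" we adjust
for step in range(100):
    w -= lr * grad(w)  # slightly adjust w to slightly decrease the loss

print(round(w, 4))  # ends up very close to the true minimum at 3
```

With a thousand-dimensional loss you cannot "just draw it", but this same nudging rule still works, one small step at a time.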

The second topic concerns the kinds of ML there are. The model we considered in the classroom belongs to so-called supervised learning. It is pretty much about finding the best function (=model) from a bunch of inputs (like high school GPA) to a bunch of outputs (like university GPA). Maybe a good one is a linear function $y=0.68x+1.07$, but maybe something more complex is gonna be "better", whatever that means. In any case, while trying to find a good function, we often have a lot of correct examples of what this function should approximately do (like the data of 100 students' GPAs). In smarter words, we have labelled training examples. You may be surprised to learn that this is not quite how much of our own learning goes: more often than not, no one tells us what the right answers are. Think about it (credits to Geoffrey Hinton for the following observation): a human only lives for around $2 \cdot 10^9$ seconds (yep, that is all you get! This sounds terrifying, to be honest), while your brain's visual system alone has $10^{14}$ neural connections. Thus it is no use learning one bit of information per second; to fill those connections you need more like $10^{14}/(2 \cdot 10^9) = 5 \cdot 10^4$ bits every second. This is where the second kind of learning comes into play: unsupervised learning, where we try to make sense of the data we receive without any labels. A simple example is Problem 3 above.
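To make the supervised case concrete, here is a minimal sketch of fitting a line $y = ax + b$ to labelled examples by least squares. The GPA numbers below are made up purely for illustration, so don't expect them to reproduce the $y=0.68x+1.07$ line from the lesson:

```python
# Hypothetical labelled training examples: (high-school GPA, university GPA).
xs = [2.8, 3.0, 3.2, 3.5, 3.7, 3.9]
ys = [3.0, 3.1, 3.2, 3.5, 3.6, 3.8]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Closed-form least-squares solution for a single input variable:
# the slope is the covariance of x and y divided by the variance of x.
a = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
    sum((x - mean_x) ** 2 for x in xs)
b = mean_y - a * mean_x

print(f"y = {a:.2f}x + {b:.2f}")
```

The "supervision" here is the list `ys`: for every input we are told the right answer. In the unsupervised setting of Problem 3, there is no such list; all we get are the inputs themselves.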