It could be argued that a significant part of your higher-level education comes from the times you ran into an interesting problem that you barely knew how to approach, or perhaps did not even fully understand. It is in these moments that, despite a lack of guidance, you end up learning a lot, provided you are interested enough to do well.
With this in mind, today's Extra involves just one project (credit to Emily Zitek for it). To excel at it, you'll need to write Python code, apply your understanding of Linear Regression Models (and possibly dig a bit deeper), and do a bit of Googling about data analysis programming tasks. It's possible that you haven't done any Python programming before, or perhaps you've used it, but not specifically for data analysis. Again, that is exactly the point this time.
To be fair, if you have done data analysis in Python before (or in a similar language; I keep writing "Python" because it is what I recommend you use here), it won't take long to get to something reasonable. For everyone else, this is your chance to research, try and fail, talk to ChatGPT, browse Kaggle and, finally, improve your coding skills. In particular, get to know the Pandas, Matplotlib and SciPy libraries, whatever those things are... o_O
Project Description
As you know, when deciding whether to admit an applicant, colleges take many factors into consideration, such as grades, sports, activities, leadership positions, awards, teacher recommendations, and test scores. Using SAT scores as a basis for deciding whether to admit a student has created some controversy. Among other things, people question whether the SATs are fair and whether they predict college performance. This is what you are going to research.
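One way to make "do the SATs predict college performance?" concrete is a simple linear regression of overall university GPA on SAT score. The formulas below are only a sketch: treating the combined score $x_i = \text{math\_SAT}_i + \text{verb\_SAT}_i$ as the single predictor is an illustrative assumption (you could just as well use the two sections separately), the response is $y_i = \text{univ\_GPA}_i$, and $i = 1, \dots, n$ indexes students (columns as described in the table below):

$$
y_i = \beta_0 + \beta_1 x_i + \varepsilon_i
$$

$$
\hat\beta_1 = \frac{\sum_{i=1}^{n} (x_i - \bar x)(y_i - \bar y)}{\sum_{i=1}^{n} (x_i - \bar x)^2}, \qquad
\hat\beta_0 = \bar y - \hat\beta_1 \bar x, \qquad
R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat\beta_0 - \hat\beta_1 x_i)^2}{\sum_{i=1}^{n} (y_i - \bar y)^2}
$$

A slope estimate clearly different from zero, together with a non-trivial $R^2$, would suggest that SAT scores carry some predictive information; how large is "non-trivial", and whether a linear model is appropriate at all, is part of what you are asked to investigate.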
There is data on 100 students' grades and scores in .csv format; you can download it here. A brief description of the columns in the data file is given below:
Variable    Description
high_GPA    High school grade point average
math_SAT    Math SAT score
verb_SAT    Verbal SAT score
comp_GPA    Computer science grade point average
univ_GPA    Overall university grade point average
Below are the questions you need to answer. Some are very specific (to get you started) and some are more general (to leave some room for research). You do not have to focus solely on these questions: you are welcome to branch out!
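If you do branch out, one natural follow-up is whether SAT scores add anything once high school GPA is known. The sketch below fits ordinary least squares in pure Python (normal equations solved by Gaussian elimination with partial pivoting), so it runs without extra installs. The three predictor sets in `compare_models` are hypothetical choices for illustration, and `data` is assumed to be a dict mapping each column name from the table above to a list of numbers.

```python
def ols_fit(predictors, y):
    """Hypothetical helper: OLS fit of y on the given predictor columns.

    predictors: list of equal-length lists, one per explanatory variable.
    Returns (coefficients, r_squared), where coefficients[0] is the intercept.
    Solves the normal equations (X^T X) b = X^T y; with (nearly) collinear
    predictors it may raise ZeroDivisionError or return unreliable values.
    """
    n = len(y)
    # Design matrix: a leading 1.0 for the intercept, then one entry per predictor.
    X = [[1.0] + [col[i] for col in predictors] for i in range(n)]
    k = len(X[0])
    # Augmented system [X^T X | X^T y].
    A = [
        [sum(X[r][i] * X[r][j] for r in range(n)) for j in range(k)]
        + [sum(X[r][i] * y[r] for r in range(n))]
        for i in range(k)
    ]
    # Forward elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda row: abs(A[row][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for row in range(col + 1, k):
            factor = A[row][col] / A[col][col]
            for c in range(col, k + 1):
                A[row][c] -= factor * A[col][c]
    # Back substitution.
    b = [0.0] * k
    for i in range(k - 1, -1, -1):
        b[i] = (A[i][k] - sum(A[i][j] * b[j] for j in range(i + 1, k))) / A[i][i]
    # Coefficient of determination: R^2 = 1 - SS_res / SS_tot.
    y_bar = sum(y) / n
    fitted = [sum(bj * xj for bj, xj in zip(b, xrow)) for xrow in X]
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return b, 1.0 - ss_res / ss_tot


def compare_models(data):
    """Hypothetical helper: R^2 of univ_GPA regressed on a few predictor sets."""
    candidate_sets = {
        "SAT only": ["math_SAT", "verb_SAT"],
        "high_GPA only": ["high_GPA"],
        "high_GPA + SAT": ["high_GPA", "math_SAT", "verb_SAT"],
    }
    return {
        label: ols_fit([data[c] for c in cols], data["univ_GPA"])[1]
        for label, cols in candidate_sets.items()
    }
```

Keep in mind that R^2 never decreases when predictors are added, so a slightly higher value for "high_GPA + SAT" is not by itself evidence that the SAT helps. For a real analysis, `statsmodels` (whose OLS summary reports standard errors, p-values and adjusted R^2) or `numpy.linalg.lstsq` is a more robust choice than hand-rolled normal equations.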
If you want us to check your work, please do it in Google Colab: it is an easily shareable Jupyter Notebook. You can do data analysis, write Python code and add comments (even using LaTeX), all in one place. In particular, you can produce a polished report with working code right alongside it. No special setup is needed.