Problem Set №4
Continuing with the idea of giving you a challenge that you might not initially know how to approach (as in Extra 2), let us share the data that you need to analyse (using Python) to find decent answers to a list of open-ended questions. Despite the lack of guidance and clear instructions, you will end up learning a lot, provided you are interested enough in the problem itself. That is the idea, and it is backed by real experience at least. The problem this time is "analyzing Ethereum prices to potentially be able to predict its price movements well enough to make money".
Before we share the data and the questions, let us share two remarks:
- Even though the data is real and the questions are genuine ones that give insight into how things work and how to make potentially useful observations, we are NOT encouraging you to get into crypto trading! Actually, we advise you NOT to do it, even if you think you have a strategy, and even though the data is publicly available and not that hard to find;
- Even if you barely have any programming experience, you can still do well with this task. For real! Python is very friendly, and you do not need much code at all to download, process and plot the data. What you might need to do is lots of googling and talking to ChatGPT;
About the Data
You can download the data in .csv format below. This data contains the ask_price, ask_size, bid_price, bid_size (i.e. the top of the order book) and the timestamp of the tick (which is the number of seconds that have elapsed since 00:00:00 Coordinated Universal Time (UTC), Thursday, 1 January 1970, minus leap seconds; for real, this is the definition).
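To make the timestamp definition concrete, here is a minimal sketch of loading such a file with Pandas and turning the timestamps into readable UTC times. The three rows are invented for illustration and are not part of the real data. The column name "timestamp" and the assumption that the values are whole seconds are guesses on our side, so check the header and the first few rows of the actual file.

```python
import io
import pandas as pd

# Toy stand-in for the real file: three invented ticks, NOT actual market data.
# The "timestamp" column name is an assumption; check the header of the real file.
sample_csv = io.StringIO(
    "timestamp,ask_price,ask_size,bid_price,bid_size\n"
    "1700000000,2000.5,1.2,2000.0,0.8\n"
    "1700000001,2000.6,0.5,2000.1,2.0\n"
    "1700000003,2000.4,3.0,2000.0,1.5\n"
)

# For the real data, pass the path of the downloaded .csv file instead of sample_csv.
df = pd.read_csv(sample_csv)

# Interpret the timestamps as seconds since 1970-01-01 00:00:00 UTC.
df["time"] = pd.to_datetime(df["timestamp"], unit="s", utc=True)

print(len(df))    # number of rows
print(df.head())  # the first rows of the table
```

If the converted times look absurd (for example, thousands of years in the future), the file probably stores milliseconds rather than seconds, and `unit="ms"` would be the thing to try.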
Questions for you
Here are the open-ended questions you may attempt to answer. "Open-ended" means that there is no perfect answer, and you are welcome to dive much deeper into any of them. Actually, you can even do something merely related to the questions below, without exactly answering them:
- How many rows are there in the data file? Produce a nice plot of the midprice evolution throughout the whole day. Why do we even introduce such a thing as a "midprice"?
- What are the open price and the close price for this day of data?
- Google what the spread is, and comment on the value of the spread for this data;
- What can you say about the behaviour of ask_size / bid_size?
- Find any interesting linear regression in the data;
- Suppose you are interested in the next change of the midprice. Try to find a correlation between this change and some other feature whose value is known before the change in midprice happens (thus you might be able to use this feature to predict the future midprice change);
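As a starting point for the questions above, here is a rough sketch of the mechanics on a handful of invented quotes (not the real data). It computes the midprice and the spread, lines up each row with the midprice change that follows it, and correlates that change with one candidate feature. The feature used here, a size "imbalance", is only our hypothetical example; whether it (or anything else) carries a signal in the real data is for you to find out. Note also that this sketch uses the row-to-row change, which may be zero; you may prefer to define "next change" differently.

```python
import pandas as pd

# Invented top-of-book quotes, for illustration only (not the real data set).
df = pd.DataFrame({
    "ask_price": [2000.5, 2000.6, 2000.4, 2000.7, 2000.9],
    "ask_size":  [1.2, 0.5, 3.0, 0.4, 1.0],
    "bid_price": [2000.0, 2000.1, 2000.0, 2000.3, 2000.5],
    "bid_size":  [0.8, 2.0, 1.5, 2.5, 1.0],
})

# Midprice: the average of the best ask and the best bid.
df["mid"] = (df["ask_price"] + df["bid_price"]) / 2

# Spread: the gap between the best ask and the best bid.
df["spread"] = df["ask_price"] - df["bid_price"]

# Change of the midprice from this row to the next one.
# shift(-1) pulls the next row's value up, so the last row has no value (NaN).
df["next_change"] = df["mid"].shift(-1) - df["mid"]

# Hypothetical candidate feature, known at the time of the current row.
df["imbalance"] = (df["bid_size"] - df["ask_size"]) / (df["bid_size"] + df["ask_size"])

# Pearson correlation; rows with a missing value are ignored.
corr = df["imbalance"].corr(df["next_change"])

print(df[["mid", "spread", "imbalance", "next_change"]])
print("correlation:", corr)
```

The important detail is the alignment: the feature in a given row must only use information available before the change it is compared with, otherwise the correlation is useless for prediction.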
A couple of hints
To help you out just a bit, in case you have never coded a thing in Python, here are a few points that you may find helpful:
- Python is a high-level programming language. For your purposes you need to see it as something where you can type a bunch of commands like "download the data", "show the top 5 rows of the data", "plot this part of the data", ... and get an output immediately;
- Python has a bunch of libraries. A library is a collection of pre-written code and functions that extend the capabilities of the Python programming language. Thus, once you import the library in your program (which is easy, just a line of code), you will be able to use some convenient functions that will simplify your life a lot;
- Among the libraries that you need for data analysis are "Pandas", "NumPy", "Matplotlib" and "SciPy". You might already be scared, so we invite you to browse this short and beginner-friendly introduction to all of those;
- The tools of Linear Regression as well as everything that we covered about correlations, covariances and variances could play a part here. That theory is the only mathematical theory you need to dig out something interesting about this data;
- We recommend you use Jupyter Notebook (e.g. Google Colab) for your work. It is all you will need, besides the data and your brain;
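To show how the theory of covariances and variances connects to Linear Regression in code, here is a small sketch using NumPy only. The data is synthetic: we generate points around the line y = 2x + 1 with some random noise, so nothing here comes from the Ethereum file. The slope is computed as cov(x, y) / var(x) and then compared with NumPy's own least-squares fit.

```python
import numpy as np

# Synthetic data around the line y = 2x + 1 (made up for illustration).
rng = np.random.default_rng(0)
x = np.arange(50, dtype=float)
y = 2.0 * x + 1.0 + rng.normal(0.0, 0.5, size=x.size)

# Slope and intercept of the least-squares line from covariance and variance.
cov_xy = np.cov(x, y)[0, 1]   # sample covariance of x and y
var_x = np.var(x, ddof=1)     # sample variance of x (same normalisation as np.cov)
slope = cov_xy / var_x
intercept = y.mean() - slope * x.mean()

# Pearson correlation coefficient between x and y.
corr = np.corrcoef(x, y)[0, 1]

# Cross-check against NumPy's degree-1 polynomial fit.
fit_slope, fit_intercept = np.polyfit(x, y, 1)

print("slope:", slope, "intercept:", intercept)
print("polyfit slope:", fit_slope, "polyfit intercept:", fit_intercept)
print("correlation:", corr)
```

With real market data the correlation will typically be far from 1, so a scatter plot next to the fitted line is a good sanity check before trusting any regression.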
If you want to have some sort of feedback on your work, please do everything in Jupyter Notebook. You can write text there as well as the code (in Python, which is what you should be using). For example, you can use Google Colab, which offers easily shareable and beginner-friendly Jupyter Notebooks. Also, and I cannot stress this enough, whatever you decide to share should look nice, have clear plots and have readable code. Everything should be labeled and nicely written: feel free to use ChatGPT to help you with that.