HW 5: Neural ODE

due 2/13/2025 before midnight via Learning Suite. 25 possible points.


We’re going to use a basic neural ODE to try to predict weather data by reproducing results from this blog post. We’re using Python instead of Julia, so don’t worry about the blog’s code, but the figures and some of the descriptions might be helpful. The dataset has historical data for temperature, humidity, wind speed, and pressure from Delhi, India over a period of a little over four years. Although the data comes pre-split into training and testing sets, we’ll do our own split (we can do well with less training data since we’re using a neural ODE rather than a vanilla MLP).

First, we need to do some data prep. Combine all the training and testing data into one dataset. The data provides daily information, but that is pretty noisy, so we’re going to make predictions based on months. That means you’ll need to average the data within each month. You can do that any way you like, but the pandas package is one handy way to do these kinds of data operations. As usual, you’ll also want to normalize the data. Then split so that the training data corresponds to the first 20 months and the testing data to the rest (as done in the blog post).
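The data prep steps above can be sketched with pandas as follows. This is only one possible approach, not a required one. The column names in `FEATURES`, the file paths in the comment, and the synthetic daily data (used so the snippet runs without the CSV files) are all assumptions; adjust them to match the actual files. Normalization here uses the mean and standard deviation of the whole monthly series, which is also a choice rather than a requirement.

```python
import numpy as np
import pandas as pd

# Assumed column names; change these to match the CSV headers.
FEATURES = ["meantemp", "humidity", "wind_speed", "meanpressure"]

def monthly_normalized(daily: pd.DataFrame) -> pd.DataFrame:
    """Average daily rows within each calendar month, then standardize each column."""
    daily = daily.copy()
    daily["date"] = pd.to_datetime(daily["date"])
    monthly = daily.groupby(daily["date"].dt.to_period("M"))[FEATURES].mean()
    return (monthly - monthly.mean()) / monthly.std()

# With the real files this would be something like (hypothetical paths):
#   daily = pd.concat([pd.read_csv("train.csv"), pd.read_csv("test.csv")])
# Synthetic stand-in so the sketch is runnable on its own:
rng = np.random.default_rng(0)
dates = pd.date_range("2013-01-01", "2017-04-24", freq="D")
day = dates.dayofyear.to_numpy()
daily = pd.DataFrame({"date": dates})
for i, name in enumerate(FEATURES):
    daily[name] = np.sin(2 * np.pi * day / 365 + i) + 0.3 * rng.standard_normal(len(dates))

monthly = monthly_normalized(daily)

# First 20 months for training, the rest for testing.
n_train = 20
train, test = monthly.iloc[:n_train], monthly.iloc[n_train:]

# Time axis in months, handy as the ODE integration times.
t = np.arange(len(monthly), dtype=float)
```

Grouping by `dt.to_period("M")` keeps months from different years separate, which a plain group-by on the month number would not.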

You’ll want to train incrementally, as shown in the blog, rather than trying to train across the whole time series at once. This type of technique is used in many time- or sequence-based problems. If given the whole time series at once, the optimizer will often settle on a solution that just goes through the mean of the data to minimize error, and it will struggle to capture more complex dynamics.
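A minimal sketch of incremental training is shown below, assuming a stage size of 4 time points. To stay self-contained it uses a hand-written fixed-step RK4 integrator and backpropagates through the solver steps; this is a stand-in for whatever ODE solver and adjoint you actually use. The names `ODEFunc` and `rk4_solve`, the layer sizes, the learning rate, the epoch count, and the sinusoidal placeholder data are all illustrative assumptions, not tuned values.

```python
import torch
from torch import nn

torch.manual_seed(0)

class ODEFunc(nn.Module):
    """Right-hand side f(u) of du/dt = f(u); layer sizes are arbitrary choices."""
    def __init__(self, dim=4, hidden=32):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, hidden), nn.Tanh(), nn.Linear(hidden, dim))

    def forward(self, t, u):
        return self.net(u)

def rk4_solve(func, u0, t, substeps=2):
    """Integrate from t[0] and return the state at every time in t (fixed-step RK4)."""
    out = [u0]
    u = u0
    for t0, t1 in zip(t[:-1], t[1:]):
        h = (t1 - t0) / substeps
        for k in range(substeps):
            s = t0 + k * h
            k1 = func(s, u)
            k2 = func(s + h / 2, u + h / 2 * k1)
            k3 = func(s + h / 2, u + h / 2 * k2)
            k4 = func(s + h, u + h * k3)
            u = u + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
        out.append(u)
    return torch.stack(out)

# Placeholder for 20 months of normalized training data with 4 features.
t_train = torch.arange(20, dtype=torch.float32)
u_train = torch.stack([torch.sin(2 * torch.pi * t_train / 12 + p) for p in range(4)], dim=1)

func = ODEFunc()
optimizer = torch.optim.Adam(func.parameters(), lr=1e-2)
stage_losses = []

# Stage 1 fits the first 4 points, stage 2 the first 8, and so on up to all 20.
for n_points in range(4, len(t_train) + 1, 4):
    for epoch in range(50):
        optimizer.zero_grad()
        pred = rk4_solve(func, u_train[0], t_train[:n_points])
        loss = torch.mean((pred - u_train[:n_points]) ** 2)
        loss.backward()
        optimizer.step()
    stage_losses.append((n_points, loss.item()))
```

The network weights carry over from one stage to the next, so each longer window starts from a model that already fits the earlier points.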

As usual, you’ll need to experiment with the network parameters. I find that the activation function can make a significant difference on this problem, so you may need to venture into new activation functions (a list of the ones provided in PyTorch is here). While not necessary, you might find that you do better with different learning rates or numbers of epochs for the different stages of training (by stages I mean stage 1 with the first 4 time points, stage 2 with the first 8 time points, etc.). Either the continuous or discrete adjoint should work (if using the latter, you might want to switch to double precision).
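One way to make these experiments easy is to build the network from a swappable activation class and to keep the per-stage settings in a small table. The sketch below is illustrative only: `make_rhs`, the layer sizes, the candidate activations, and every number in `stages` are assumptions to be replaced by your own choices, and the default `dtype` shows the double-precision option.

```python
import torch
from torch import nn

def make_rhs(activation=nn.Tanh, dim=4, hidden=32, dtype=torch.float64):
    """Small MLP for the ODE right-hand side with a swappable activation class."""
    net = nn.Sequential(
        nn.Linear(dim, hidden), activation(),
        nn.Linear(hidden, hidden), activation(),
        nn.Linear(hidden, dim),
    )
    return net.to(dtype)  # cast weights to double precision by default

# A few activations that ship with PyTorch, as candidates to compare.
candidates = {"tanh": nn.Tanh, "silu": nn.SiLU, "elu": nn.ELU, "softplus": nn.Softplus}

# Sanity check: each candidate network maps a 4-vector to a 4-vector in float64.
u0 = torch.zeros(4, dtype=torch.float64)
outputs = {name: make_rhs(act)(u0) for name, act in candidates.items()}

# Hypothetical per-stage settings; the values are placeholders, not recommendations.
stages = [
    {"n_points": 4, "lr": 1e-2, "epochs": 300},
    {"n_points": 8, "lr": 1e-2, "epochs": 300},
    {"n_points": 12, "lr": 5e-3, "epochs": 500},
    {"n_points": 16, "lr": 5e-3, "epochs": 500},
    {"n_points": 20, "lr": 1e-3, "epochs": 800},
]
```

If the model is in double precision, the data tensors and initial condition need to be `torch.float64` as well, otherwise PyTorch raises a dtype mismatch error.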

While you certainly don’t need to make animations like the blog post, you will want to plot at the end of each stage. If things aren’t looking good after the first stage, they’re unlikely to improve with more stages, and you’ll want to see that without waiting for all the stages to complete so you can go back and change hyperparameters.
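A per-stage plot could look something like the sketch below, which saves one figure per stage with matplotlib. The helper `plot_stage`, the panel names, the output location, and the placeholder arrays are assumptions for illustration; in practice `data` would be the normalized monthly data and `pred` the model output for the points trained on so far.

```python
import tempfile
from pathlib import Path

import numpy as np
import matplotlib
matplotlib.use("Agg")  # write figures to files; drop this line in a notebook
import matplotlib.pyplot as plt

NAMES = ["temperature", "humidity", "wind speed", "pressure"]

def plot_stage(t, data, pred, n_points, out_dir):
    """Plot data and prediction per feature, highlighting the points trained on so far."""
    fig, axes = plt.subplots(2, 2, figsize=(9, 6), sharex=True)
    for i, ax in enumerate(axes.flat):
        ax.plot(t, data[:, i], "o", color="lightgray", label="all training data")
        ax.plot(t[:n_points], data[:n_points, i], "o", label="used in this stage")
        ax.plot(t[:n_points], pred[:, i], "-", label="prediction")
        ax.set_title(NAMES[i])
    axes[0, 0].legend()
    fig.suptitle(f"After training on the first {n_points} time points")
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / f"stage_{n_points:02d}.png"
    fig.savefig(path, dpi=100)
    plt.close(fig)  # avoid accumulating open figures across stages
    return path

# Placeholder arrays standing in for real data and model output.
t = np.arange(20, dtype=float)
data = np.stack([np.sin(2 * np.pi * t / 12 + p) for p in range(4)], axis=1)
pred = data[:8] + 0.1  # pretend prediction after the stage with 8 points

out_path = plot_stage(t, data, pred, n_points=8, out_dir=tempfile.mkdtemp())
```

Calling a helper like this at the end of each stage makes a poor fit visible early, before the remaining stages have run.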