Due 1/30/2025 before midnight via Learning Suite. 25 possible points.
Reproduce Appendix A.1 of this paper using a physics-informed neural net (PINN) to solve Burgers’ equation.
Create a figure similar to Fig. A6: the top contour plot (but you don’t need all the x’s marking data locations) and the rightmost graph showing a slice through the data at t = 0.75 (you just need your prediction; you don’t need to plot “exact”). I used \(N_u = 100\) and \(N_f = 10,000\). You’ll likely find that you get a pretty good prediction, but the shock wave isn’t captured as well (rounded instead of sharp). That’s sufficient for the purposes of this assignment, but if you’re interested in improving it, see the advanced tips below the regular tips.
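For the figure itself, here is a minimal matplotlib sketch of the two panels; the grid resolution and the array U_pred holding your model’s predictions on that grid are assumptions about how you organize your results:

```python
import numpy as np
import matplotlib.pyplot as plt

# Assumes you've evaluated your trained network on a (t, x) grid:
#   t_grid: shape (nt,), x_grid: shape (nx,), U_pred: shape (nt, nx)
t_grid = np.linspace(0.0, 1.0, 100)
x_grid = np.linspace(-1.0, 1.0, 256)
U_pred = np.zeros((t_grid.size, x_grid.size))  # placeholder for model predictions

fig, (ax0, ax1) = plt.subplots(1, 2, figsize=(10, 4))

# Contour-style plot of u(t, x) over the whole domain
c = ax0.pcolormesh(t_grid, x_grid, U_pred.T, cmap="rainbow", shading="auto")
fig.colorbar(c, ax=ax0)
ax0.set_xlabel("t")
ax0.set_ylabel("x")
ax0.set_title("u(t, x)")

# Slice through the prediction at t = 0.75
i75 = np.argmin(np.abs(t_grid - 0.75))
ax1.plot(x_grid, U_pred[i75, :])
ax1.set_xlabel("x")
ax1.set_ylabel("u(0.75, x)")
ax1.set_title("t = 0.75")

plt.tight_layout()
plt.show()
```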
Note that, as we’ve done in the past, separate train/test datasets are important to make sure the model isn’t overfitting. In this case the problem is small, and the number of data and collocation points provides very high coverage everywhere we are making predictions, so a separate test set won’t matter. But for larger problems you should definitely have a test set.
A few tips:
- torch.autograd.grad: you will need to use the grad_outputs option. We have vectors x and t going in, and vector u coming out, where each element of the vectors corresponds to a different data sample. In other words, \(du_i/dx_i\) and \(du_i/dt_i\) are independent of every other index \(i\). To compute all these derivatives in one shot, pass a vector of ones in the grad_outputs option (i.e., grad_outputs=torch.ones_like(x)). This is called the “seed” for algorithmic differentiation.
- create_graph=True in the call to torch.autograd.grad, since we will need to backpropagate through these derivatives (i.e., compute derivatives of derivatives). A sketch combining these first two tips appears after this list.
- from scipy.stats import qmc for sampling the collocation points, though I’m sure you could do fine for this small problem with just regular random sampling or even evenly spaced sampling. Either way, be sure that these sampling points stay fixed during the training (see the sampling sketch below).
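To make the first two tips concrete, here is a minimal sketch of computing the Burgers’ residual \(f = u_t + u u_x - (0.01/\pi) u_{xx}\) (the viscosity \(0.01/\pi\) matches the paper’s Burgers’ setup) with torch.autograd.grad; the small network net and the sample sizes are placeholders for your own:

```python
import numpy as np
import torch

# Hypothetical network mapping (x, t) -> u; substitute your own model.
net = torch.nn.Sequential(
    torch.nn.Linear(2, 20), torch.nn.Tanh(),
    torch.nn.Linear(20, 20), torch.nn.Tanh(),
    torch.nn.Linear(20, 1),
)

def burgers_residual(x, t):
    """PDE residual f = u_t + u*u_x - (0.01/pi)*u_xx at the given points."""
    x = x.clone().requires_grad_(True)
    t = t.clone().requires_grad_(True)
    u = net(torch.cat([x, t], dim=1))

    seed = torch.ones_like(x)  # one derivative per sample, all in one shot
    u_x = torch.autograd.grad(u, x, grad_outputs=seed, create_graph=True)[0]
    u_t = torch.autograd.grad(u, t, grad_outputs=seed, create_graph=True)[0]
    u_xx = torch.autograd.grad(u_x, x, grad_outputs=seed, create_graph=True)[0]
    return u_t + u * u_x - (0.01 / np.pi) * u_xx

# Example call on a few random collocation points in x in [-1, 1], t in [0, 1]
x_f = 2.0 * torch.rand(5, 1) - 1.0
t_f = torch.rand(5, 1)
f = burgers_residual(x_f, t_f)  # the mean of f**2 goes into your loss
```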
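And a sketch of the sampling tip, generating \(N_f\) fixed collocation points with a Latin hypercube from scipy.stats.qmc; the domain bounds \(x \in [-1, 1]\), \(t \in [0, 1]\) follow the paper’s Burgers’ setup:

```python
import torch
from scipy.stats import qmc

N_f = 10_000  # number of collocation points

# Latin hypercube sample in [0, 1]^2, scaled to (x, t) in [-1, 1] x [0, 1]
sampler = qmc.LatinHypercube(d=2, seed=0)
pts = qmc.scale(sampler.random(N_f), l_bounds=[-1.0, 0.0], u_bounds=[1.0, 1.0])

# Build the tensors once, outside the training loop, so they stay fixed
x_f = torch.tensor(pts[:, 0:1], dtype=torch.float32)
t_f = torch.tensor(pts[:, 1:2], dtype=torch.float32)
```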
Optional advanced tips if you want to really capture that shock:
- Use double precision: create your tensors with dtype=torch.float64, and for the network you need to change all its weights and biases to double precision also: model.double(), where model is your instantiated network.
- Use the LBFGS optimizer with a line search (line_search_fn="strong_wolfe"). In the optimization world, we always use second-order methods like BFGS, but they are not compatible with minibatching, so the DL world almost always uses first-order methods. In this case we don’t have tons of data, so we don’t need minibatching, and the second-order optimizer will do much better. It will be much slower per epoch, but you’ll also need far fewer epochs. LBFGS with a line search is set up to work differently: you will need to create a closure function and call optimizer.step(closure). It’s essentially the same as the train function. Search online or use AI chatbots for examples. In this case you’ll want to call optimizer.zero_grad() at the beginning of the closure function (sketches of both advanced tips appear below). Adam and all the other optimizers work with closure functions too; they just don’t require it.
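A minimal sketch of the double-precision tip; the small model and tensors here are stand-ins for your own:

```python
import torch

# Stand-ins for your network and training tensors
model = torch.nn.Sequential(torch.nn.Linear(2, 20), torch.nn.Tanh(), torch.nn.Linear(20, 1))
x = torch.rand(100, 1, dtype=torch.float64)  # create tensors in double precision
t = torch.rand(100, 1, dtype=torch.float64)

model.double()  # convert all of the network's weights and biases to float64
u = model(torch.cat([x, t], dim=1))
```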
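And a sketch of an LBFGS step with a closure; the model, data, and loss below are placeholders for your PINN and its combined data + PDE-residual loss, and the hyperparameters are arbitrary:

```python
import torch

# Placeholders: swap in your PINN and its combined data + residual loss
model = torch.nn.Sequential(torch.nn.Linear(2, 20), torch.nn.Tanh(), torch.nn.Linear(20, 1)).double()
inputs = torch.rand(100, 2, dtype=torch.float64)
targets = torch.zeros(100, 1, dtype=torch.float64)

optimizer = torch.optim.LBFGS(
    model.parameters(),
    max_iter=50,                    # inner LBFGS iterations per .step() call
    line_search_fn="strong_wolfe",  # enables the strong Wolfe line search
)

def closure():
    optimizer.zero_grad()  # zero gradients at the start of the closure
    loss = torch.mean((model(inputs) - targets) ** 2)
    loss.backward()
    return loss  # LBFGS re-evaluates this during its line search

for epoch in range(20):
    loss = optimizer.step(closure)  # step returns the closure's loss
    print(epoch, loss.item())
```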