HW 2: Write Your Own


We will use the same dataset as the last homework, but will now write our own (basic) neural net so that we can better understand what’s happening under the hood. I created a template file to help you get started, but you’re welcome to discard it and organize things your own way if you prefer. Either way, you’ll need to write your own functions or classes for activation, initialization, loss, layers, network, and optimization. For the activation, loss, layers, and network, you’ll also need to write the backpropagation routines.

I’ve done all the data preparation for you (in the template file). It’s the same preprocessing we did last time, only done manually, since we’re using numpy rather than torch (and we’re skipping batching, since it’s not really needed on this small dataset). For the optimizer we will just use plain gradient descent. It works fine in this case, but you’ll likely need a far smaller learning rate and correspondingly more epochs.
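To give you a sense of how the pieces can fit together, here is a minimal sketch of one possible organization. The class names, the column-major data layout (features × samples), and the Wbar/bbar naming are my own choices for illustration, not requirements; your template or design may differ.

import numpy as np

class Linear:
    # fully connected layer: Z = W @ X + b, with data stored as columns
    def __init__(self, n_in, n_out):
        # simple scaled-uniform initialization; any reasonable scheme works
        self.W = np.random.rand(n_out, n_in) * np.sqrt(2.0 / (n_in + n_out))
        self.b = np.zeros((n_out, 1))

    def forward(self, X):
        self.X = X                                   # cache input for backprop
        return self.W @ X + self.b

    def backward(self, Zbar):
        # Zbar = dLoss/dZ; store parameter gradients, return dLoss/dX
        self.Wbar = Zbar @ self.X.T
        self.bbar = Zbar.sum(axis=1, keepdims=True)
        return self.W.T @ Zbar

class ReLU:
    def forward(self, Z):
        self.Z = Z
        return np.maximum(Z, 0.0)

    def backward(self, Abar):
        return Abar * (self.Z > 0)

def mse(yhat, y):
    return np.mean((yhat - y) ** 2)

def mse_backward(yhat, y):
    # derivative of the mean over all entries of (yhat - y)**2
    return 2.0 * (yhat - y) / y.size

def gd_step(layers, lr):
    # plain gradient descent: every layer with parameters takes one step
    for layer in layers:
        if hasattr(layer, "W"):
            layer.W -= lr * layer.Wbar
            layer.b -= lr * layer.bbar

A training loop would then just alternate a forward pass through the layers, a call to mse and mse_backward, a backward pass through the layers in reverse order, and gd_step.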

Debugging

This homework can be tricky, so I’ve provided an example below to help you debug. The example doesn’t use realistic data or a reasonably sized network; it’s just a small case to make it easy to check your calculations. I set np.random.seed(0) to aid reproducibility, but just to be sure, I’ve also printed the weights and biases below. In this example I created 2 layers: the first goes from 2 nodes to 3 nodes, and the second from 3 nodes down to 1 node. Initial weights and biases:

l1.W =  [[0.34710014 0.45232547]
 [0.38122103 0.34461438]
 [0.26794282 0.4084993 ]]
l1.b =  [[0.]
 [0.]
 [0.]]
l2.W =  [[0.30942088 0.63057874 0.68141247]]
l2.b =  [[0.]]
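If you want to check your initialization against those numbers: with np.random.seed(0), one scheme that happens to reproduce them exactly is uniform [0, 1) samples scaled by sqrt(2 / (n_in + n_out)). That’s just an observation to help with debugging, not the required initializer; if yours differs, compare against the printed values directly rather than relying on the seed.

import numpy as np

np.random.seed(0)
# assumed scheme; with seed 0 it reproduces l1.W and l2.W printed above
l1_W = np.random.rand(3, 2) * np.sqrt(2.0 / (2 + 3))
l2_W = np.random.rand(1, 3) * np.sqrt(2.0 / (3 + 1))
l1_b = np.zeros((3, 1))
l2_b = np.zeros((1, 1))
print(l1_W, l1_b, l2_W, l2_b, sep="\n")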

I fed the first layer an input with 2 features and 10 data points (ns = 10), all set to one. I also set the outputs to ones.

ns = 10                  # number of data points
X = np.ones((2, ns))     # inputs: 2 features, all ones
y = np.ones((1, ns))     # targets: all ones

For those weights and “data”, I get an MSE loss of 0.02755316457552718, with yhat = array([[1.16599146, ... (the same value repeated 10 times, since I made all the data points identical).
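As a sanity check, that loss can be reproduced directly from the printed weights. The snippet below hard-codes them and assumes a ReLU on the hidden layer (with these all-positive pre-activations, an identity activation gives identical numbers); if your network uses a different activation, your values will differ.

import numpy as np

ns = 10
X = np.ones((2, ns))
y = np.ones((1, ns))

W1 = np.array([[0.34710014, 0.45232547],
               [0.38122103, 0.34461438],
               [0.26794282, 0.4084993 ]])
b1 = np.zeros((3, 1))
W2 = np.array([[0.30942088, 0.63057874, 0.68141247]])
b2 = np.zeros((1, 1))

Z1 = W1 @ X + b1              # shape (3, 10)
A1 = np.maximum(Z1, 0.0)      # ReLU (identity gives the same values here)
yhat = W2 @ A1 + b2           # shape (1, 10), each entry ~1.16599146
loss = np.mean((yhat - y) ** 2)
print(loss)                   # ~0.0275531645755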

My derivatives are as follows (layer 1 first, then layer 2):

Wbar =  [[0.10272245 0.10272245]
 [0.20934137 0.20934137]
 [0.2262173  0.2262173 ]]
bbar =  [[0.10272245]
 [0.20934137]
 [0.2262173 ]]
Wbar =  [[0.26539565 0.24096496 0.22456723]]
bbar =  [[0.33198292]]

Just for completeness, the derivative of the last Zbar before the loss function (or we could call it Xbar, if you consider the identity function as the last activation) is 0.03319829, repeated 10 times in an array.
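For reference, here is how those gradients fall out of the chain rule, continuing directly from the forward-pass snippet above (same X, y, Z1, A1, yhat, W2) and under the same ReLU/identity assumption:

# derivative of the mean-squared error with respect to yhat
yhatbar = 2.0 * (yhat - y) / y.size          # each entry ~0.03319829

# layer 2: parameter gradients sum over the 10 data points
W2bar = yhatbar @ A1.T                       # ~[[0.26539565, 0.24096496, 0.22456723]]
b2bar = yhatbar.sum(axis=1, keepdims=True)   # ~[[0.33198292]]

# pass the gradient back through layer 2, then through the activation
A1bar = W2.T @ yhatbar
Z1bar = A1bar * (Z1 > 0)                     # ReLU derivative is 1 here

# layer 1
W1bar = Z1bar @ X.T                          # ~[[0.10272245, 0.10272245], [0.20934137, 0.20934137], ...]
b1bar = Z1bar.sum(axis=1, keepdims=True)     # ~[[0.10272245], [0.20934137], [0.2262173]]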