Backpropagation through time in an RNN

Backpropagation in an RNN computes the derivatives of the loss with respect to all of the network's parameters, so that the parameters can be optimized with gradient descent. The gradient is propagated back through the network across all layers and time steps <1>, <2>, …, visiting every step of forward propagation in reverse order, as shown in the figure below:


The loss is calculated for each individual RNN cell (time step) <1>, <2>, …, and the total loss is the sum of these per-step losses over the whole sequence.

Loss function for a single time step <t> (binary cross-entropy):

$L^{\langle t \rangle}(\hat y^{\langle t \rangle}, y^{\langle t \rangle}) = -y^{\langle t \rangle} \log(\hat y^{\langle t \rangle}) - (1-y^{\langle t \rangle}) \log(1-\hat y^{\langle t \rangle})$
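As a rough illustration, the per-step loss above can be sketched in NumPy. The function name `step_loss` and the `eps` clipping constant are assumptions added here for numerical safety; they are not part of the original formula.

```python
import numpy as np

def step_loss(y_hat, y, eps=1e-12):
    """Binary cross-entropy for one time step <t>.

    y_hat: predicted probability for this step, in (0, 1)
    y:     true label for this step, 0 or 1
    eps:   hypothetical clipping constant that keeps log() away from 0
    """
    y_hat = np.clip(y_hat, eps, 1.0 - eps)
    return -y * np.log(y_hat) - (1.0 - y) * np.log(1.0 - y_hat)

# A confident correct prediction gives a small loss,
# a confident wrong prediction gives a large one.
loss_good = step_loss(0.9, 1.0)
loss_bad = step_loss(0.9, 0.0)
```

Only one of the two terms is active for a given label: the first when the label is 1, the second when it is 0.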

Total cost function: the sum of the per-step losses over all time steps $t = 1, \dots, T$, where $T$ is the length of the sequence:

$L(\hat y, y) = \sum_{t=1}^{T} L^{\langle t \rangle}(\hat y^{\langle t \rangle}, y^{\langle t \rangle})$
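To make the backward pass concrete, here is a minimal sketch of backpropagation through time for a small RNN in NumPy. The post does not spell out the cell equations, so the sketch assumes a common form: a tanh hidden state `a = tanh(Waa·a_prev + Wax·x + ba)` and a single sigmoid output `y_hat = sigmoid(Wya·a + by)`. The parameter names, the sizes, and the random data are all hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(params, xs, ys):
    """Run the RNN over the sequence; return the total loss and a cache."""
    Waa, Wax, Wya, ba, by = params
    a = np.zeros(Waa.shape[0])              # initial hidden state
    cache, total = [], 0.0
    for x, y in zip(xs, ys):
        a_prev = a
        a = np.tanh(Waa @ a_prev + Wax @ x + ba)
        y_hat = sigmoid(Wya @ a + by)       # scalar prediction for this step
        # Per-step binary cross-entropy, accumulated into the total loss.
        total += -y * np.log(y_hat) - (1 - y) * np.log(1 - y_hat)
        cache.append((x, a_prev, a, y_hat))
    return total, cache

def backward(params, cache, ys):
    """Walk the time steps in reverse, accumulating parameter gradients."""
    Waa, Wax, Wya, ba, by = params
    dWaa, dWax, dWya = np.zeros_like(Waa), np.zeros_like(Wax), np.zeros_like(Wya)
    dba, dby = np.zeros_like(ba), 0.0
    da_next = np.zeros(Waa.shape[0])        # gradient arriving from step t+1
    for (x, a_prev, a, y_hat), y in zip(cache[::-1], ys[::-1]):
        dz_y = y_hat - y                    # sigmoid + cross-entropy derivative
        dWya += dz_y * a
        dby += dz_y
        da = dz_y * Wya + da_next           # this step's loss plus later steps
        dz_a = (1.0 - a ** 2) * da          # back through tanh
        dWaa += np.outer(dz_a, a_prev)
        dWax += np.outer(dz_a, x)
        dba += dz_a
        da_next = Waa.T @ dz_a              # pass gradient on to step t-1
    return dWaa, dWax, dWya, dba, dby

# Hypothetical toy problem: 3 hidden units, 2 input features, 5 time steps.
rng = np.random.default_rng(0)
n_a, n_x, T = 3, 2, 5
params = (0.5 * rng.normal(size=(n_a, n_a)), 0.5 * rng.normal(size=(n_a, n_x)),
          0.5 * rng.normal(size=n_a), np.zeros(n_a), 0.0)
xs = rng.normal(size=(T, n_x))
ys = rng.integers(0, 2, size=T)

total_loss, cache = forward(params, xs, ys)
grads = backward(params, cache, ys)
```

Because the same weights are reused at every time step, each gradient is a sum of contributions from all steps, and `da_next` is what carries the gradient backwards from one step to the previous one. The resulting `grads` could then be used in a gradient descent update such as `W -= learning_rate * dW`.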

Written on February 14, 2018