# Backpropagation through time in an RNN

Backpropagation through time (BPTT) is required to compute the derivatives of all the parameters of an RNN so that they can be optimized with gradient descent. The gradient is propagated backwards through the network, across all layers and across all time steps $\langle 1 \rangle, \langle 2 \rangle, \ldots$

The loss is computed for each individual RNN cell, i.e. at every time step $\langle 1 \rangle, \langle 2 \rangle, \ldots$

**Loss function for a time step $\langle t \rangle$:**

$L^{\langle t \rangle}(\hat y^{\langle t \rangle}, y^{\langle t \rangle}) = -y^{\langle t \rangle} \log \hat y^{\langle t \rangle} - (1 - y^{\langle t \rangle}) \log (1 - \hat y^{\langle t \rangle})$
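The per-step loss above is the binary cross-entropy. A minimal sketch in NumPy (the function name `step_loss` and the `eps` clipping are my own additions, not from the original notes):

```python
import numpy as np

def step_loss(y_hat, y, eps=1e-12):
    """Binary cross-entropy loss for a single time step <t>.

    y_hat: predicted probability, y: true label (0 or 1).
    Clipping avoids log(0) when the prediction saturates.
    """
    y_hat = np.clip(y_hat, eps, 1 - eps)
    return -y * np.log(y_hat) - (1 - y) * np.log(1 - y_hat)

# A confident, correct prediction gives a small loss: -log(0.9)
print(step_loss(0.9, 1))
```

Note how the two terms act as a switch: when $y^{\langle t \rangle} = 1$ only $-\log \hat y^{\langle t \rangle}$ contributes, and when $y^{\langle t \rangle} = 0$ only $-\log(1 - \hat y^{\langle t \rangle})$ does.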

Total cost function: sum the per-step losses over all time steps $t = 1, \ldots, T$:

$L(y, \hat y) = \sum_{t=1}^{T} L^{\langle t \rangle}(\hat y^{\langle t \rangle}, y^{\langle t \rangle})$
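The total cost can be sketched as a plain sum of the per-step losses. This is an illustrative snippet, not the notes' own code; `total_loss` and `step_loss` are hypothetical names:

```python
import numpy as np

def step_loss(y_hat, y, eps=1e-12):
    """Binary cross-entropy loss for a single time step."""
    y_hat = np.clip(y_hat, eps, 1 - eps)
    return -y * np.log(y_hat) - (1 - y) * np.log(1 - y_hat)

def total_loss(y_hats, ys):
    """Total cost L(y, y_hat): sum of per-step losses over t = 1..T."""
    return sum(step_loss(y_hat, y) for y_hat, y in zip(y_hats, ys))

# Predictions and labels for a sequence of T = 3 time steps
y_hats = [0.9, 0.2, 0.8]
ys     = [1,   0,   1]
print(total_loss(y_hats, ys))
```

Because the total cost is a sum, its gradient with respect to any parameter is the sum of the gradients of the per-step losses, which is what BPTT accumulates as it walks backwards through the time steps.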