Long Short Term Memory Networks (LSTM)

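Compared to a GRU, an LSTM does not use the relevance gate $\Gamma_r$. Where the GRU shares a single update gate between the new and old values ($\Gamma_u$ and $1-\Gamma_u$), the LSTM uses two separate gates, $\Gamma_u$ (update gate) and $\Gamma_f$ (forget gate), and adds an output gate $\Gamma_o$. As a result, $a^{\langle t \rangle} \ne c^{\langle t \rangle}$ in general.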

Equations for LSTM:

(1) $\tilde c^{\langle t \rangle} = \tanh(W_c[a^{\langle t-1 \rangle},x^{\langle t \rangle}] + b_c)$

(2) Update Gate: $\Gamma_u = \sigma(W_u[a^{\langle t-1 \rangle},x^{\langle t \rangle}]+b_u)$

(3) Forget Gate: $\Gamma_f = \sigma(W_f[a^{\langle t-1 \rangle},x^{\langle t \rangle}]+b_f)$

(4) Output Gate: $\Gamma_o = \sigma(W_o[a^{\langle t-1 \rangle},x^{\langle t \rangle}]+b_o)$

(5) $c^{\langle t \rangle} = \Gamma_u * \tilde c^{\langle t \rangle} + \Gamma_f * c^{\langle t-1 \rangle}$

(6) $a^{\langle t \rangle} = \Gamma_o * \tanh(c^{\langle t \rangle})$

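Equation (5) lets the cell keep the old value $c^{\langle t-1 \rangle}$ (weighted by $\Gamma_f$) and add the new candidate value $\tilde c^{\langle t \rangle}$ (weighted by $\Gamma_u$) to it. Both products are element-wise.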
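To make the role of the two gates concrete, here is a minimal sketch of the cell-state update in equation (5), using plain Python lists. The function name `cell_update` and the numeric values are illustrative assumptions, not part of any library.

```python
def cell_update(gamma_u, gamma_f, c_tilde, c_prev):
    """Equation (5): element-wise blend of the candidate and the previous cell state."""
    return [u * ct + f * cp
            for u, f, ct, cp in zip(gamma_u, gamma_f, c_tilde, c_prev)]

# Illustrative values (assumed for this example)
c_prev = [2.0, -1.0, 0.5]    # c^{<t-1>}
c_tilde = [0.1, 0.9, -0.3]   # candidate value

# Forget gate fully open, update gate closed: the old memory is carried over unchanged.
kept = cell_update([0.0] * 3, [1.0] * 3, c_tilde, c_prev)

# Forget gate closed, update gate fully open: the memory is overwritten by the candidate.
overwritten = cell_update([1.0] * 3, [0.0] * 3, c_tilde, c_prev)

# Both gates open: the candidate is added on top of the old memory.
added = cell_update([1.0] * 3, [1.0] * 3, c_tilde, c_prev)

print(kept, overwritten, added)
```

Because the two gates are independent, the cell is not forced to trade old memory against new input; it can keep both, which a single shared gate would not allow.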

$a^{\langle t-1 \rangle}$ and $x^{\langle t \rangle}$ are used to compute all gates ($\Gamma_u$, $\Gamma_f$ & $\Gamma_o$).
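The six equations can be put together into a single time step. The following is a rough sketch using only the Python standard library, not an optimized implementation. The names `lstm_step`, `init_params` and `affine`, the parameter dictionary layout, and the random initialization are all assumptions made for illustration. It assumes a separate weight matrix for each gate and the common form of the hidden state, $a^{\langle t \rangle} = \Gamma_o * \tanh(c^{\langle t \rangle})$.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def affine(W, v, b):
    """Return W v + b, where W is a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) + b_i for row, b_i in zip(W, b)]

def lstm_step(x_t, a_prev, c_prev, params):
    """One LSTM time step following equations (1) to (6)."""
    concat = a_prev + x_t  # [a^{<t-1>}, x^{<t>}] (list concatenation)
    c_tilde = [math.tanh(z) for z in affine(params["Wc"], concat, params["bc"])]  # (1)
    gamma_u = [sigmoid(z) for z in affine(params["Wu"], concat, params["bu"])]    # (2)
    gamma_f = [sigmoid(z) for z in affine(params["Wf"], concat, params["bf"])]    # (3)
    gamma_o = [sigmoid(z) for z in affine(params["Wo"], concat, params["bo"])]    # (4)
    c_t = [u * ct + f * cp
           for u, f, ct, cp in zip(gamma_u, gamma_f, c_tilde, c_prev)]            # (5)
    a_t = [o * math.tanh(c) for o, c in zip(gamma_o, c_t)]                        # (6)
    return a_t, c_t

def init_params(n_a, n_x, seed=0):
    """Small random weights and zero biases (an arbitrary choice for this sketch)."""
    rng = random.Random(seed)
    params = {}
    for gate in "cufo":
        params["W" + gate] = [[rng.uniform(-0.1, 0.1) for _ in range(n_a + n_x)]
                              for _ in range(n_a)]
        params["b" + gate] = [0.0] * n_a
    return params

# Usage: run a short sequence of one-hot inputs through the cell.
n_a, n_x = 4, 3
params = init_params(n_a, n_x)
a, c = [0.0] * n_a, [0.0] * n_a
sequence = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
for x_t in sequence:
    a, c = lstm_step(x_t, a, c, params)
print(a)
print(c)
```

The hidden state `a` and the cell state `c` are carried forward separately, which is where the LSTM differs from the GRU.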

Peephole Connections: In this variant, $c^{\langle t-1 \rangle}$ is used alongside $a^{\langle t-1 \rangle}$ and $x^{\langle t \rangle}$ to compute $\Gamma_u$, $\Gamma_f$ and $\Gamma_o$, so the gates can "peek" at the previous cell state.
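As a rough sketch of how a peephole gate might be computed, the snippet below adds a cell-state term to the usual gate pre-activation. It assumes one peephole weight per unit, so that element $i$ of $c^{\langle t-1 \rangle}$ affects only element $i$ of the gate. This is a common choice, but the name `peephole_gate` and the weight vector `p` are hypothetical, and other formulations exist.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def peephole_gate(W, p, b, a_prev, x_t, c_prev):
    """Gate = sigmoid(W [a_prev, x_t] + p * c_prev + b), with p applied element-wise."""
    concat = a_prev + x_t
    gate = []
    for row, p_i, b_i, c_i in zip(W, p, b, c_prev):
        z = sum(w * v for w, v in zip(row, concat)) + p_i * c_i + b_i
        gate.append(sigmoid(z))
    return gate

# Illustrative setup: zero W and b, so the gate depends only on the cell state.
a_prev, x_t = [0.0, 0.0], [1.0]
W = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
b = [0.0, 0.0]
p = [1.0, 1.0]  # hypothetical peephole weights

low = peephole_gate(W, p, b, a_prev, x_t, c_prev=[-5.0, -5.0])   # gate nearly closed
high = peephole_gate(W, p, b, a_prev, x_t, c_prev=[5.0, 5.0])    # gate nearly open
no_peek = peephole_gate(W, [0.0, 0.0], b, a_prev, x_t, c_prev=[5.0, 5.0])  # peephole off
print(low, high, no_peek)
```

With the peephole weights set to zero, the gate reduces to the standard form that depends only on $a^{\langle t-1 \rangle}$ and $x^{\langle t \rangle}$.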

Written on February 17, 2018