Sequence Models - Recurrent Neural Networks (RNN)

The idea behind RNNs is to make use of sequential information. In a traditional neural network we assume that all inputs (and outputs) are independent of each other. But for many tasks that's a very bad idea: if you want to predict the next word in a sentence, you had better know which words came before it. RNNs are called recurrent because they perform the same task for every element of a sequence, with the output being dependent on the previous computations. Another way to think about RNNs is that they have a "memory" which captures information about what has been calculated so far. In theory RNNs can make use of information in arbitrarily long sequences, but in practice they are limited to looking back only a few steps (more on this later).
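The "same task for every element, carrying a memory forward" idea can be sketched as a forward pass in plain NumPy. This is a minimal illustration, not the full notation of the course; the weight names (`Wxh`, `Whh`, `bh`) and the `tanh` activation are common conventions, assumed here for concreteness.

```python
import numpy as np

def rnn_step(x_t, h_prev, Wxh, Whh, bh):
    """One recurrent step: the new hidden state depends on the
    current input x_t AND the previous hidden state h_prev."""
    return np.tanh(Wxh @ x_t + Whh @ h_prev + bh)

def rnn_forward(xs, h0, Wxh, Whh, bh):
    """Apply the SAME step (same weights) to every element of the
    sequence, carrying the hidden state ("memory") forward."""
    h = h0
    states = []
    for x_t in xs:
        h = rnn_step(x_t, h, Wxh, Whh, bh)
        states.append(h)
    return states

# Toy dimensions: 4-dimensional inputs, 3-dimensional hidden state.
rng = np.random.default_rng(0)
n_in, n_h = 4, 3
Wxh = 0.1 * rng.normal(size=(n_h, n_in))
Whh = 0.1 * rng.normal(size=(n_h, n_h))
bh = np.zeros(n_h)

xs = [rng.normal(size=n_in) for _ in range(5)]   # a length-5 sequence
states = rnn_forward(xs, np.zeros(n_h), Wxh, Whh, bh)
```

Note that the same three weight arrays are reused at every time step; only the hidden state changes as the sequence is consumed.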

Example of Sequences:

(figure: examples of sequence data)

Example of Specific Sequence : Inputs and Outputs to RNN

Output: Named Entity Recognition — identify the named entities in a sentence.

For example take the sentence “Harry Potter and Hermione Granger invented a new spell.”

Named Entities: “Harry Potter”, “Hermione Granger”

X = “Harry Potter and Hermione Granger invented a new spell.”

$x^{<1>}$= Harry
$x^{<2>}$= Potter
$x^{<3>}$= and
$x^{<4>}$= Hermione
$x^{<5>}$= Granger
$x^{<6>}$= invented
$x^{<7>}$= a
$x^{<8>}$= new
$x^{<9>}$= spell

Y :
$y^{<1>}$=1
$y^{<2>}$=1
$y^{<3>}$=0
$y^{<4>}$=1
$y^{<5>}$=1
$y^{<6>}$=0
$y^{<7>}$=0
$y^{<8>}$=0
$y^{<9>}$=0

In this task the length of the input sequence $T_x$ is the same as the length of the output sequence $T_y$ (this need not hold in general, e.g. in machine translation).

For multiple examples (training set), $x^{(i)<t>}$ represents the value of the $t^{th}$ word in the $i^{th}$ training example.

Length of input sequence $T_x^{(i)}$=9 (in the case above)
Length of output sequence $T_y^{(i)}$=9 (in the case above)
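The example above can be written out directly as parallel lists, where each label is 1 if the corresponding word is part of a named entity and 0 otherwise:

```python
# Input sequence x^{<1>}..x^{<9>} and labels y^{<1>}..y^{<9>}
# for the sentence from the text.
x = ["Harry", "Potter", "and", "Hermione", "Granger",
     "invented", "a", "new", "spell"]
y = [1, 1, 0, 1, 1, 0, 0, 0, 0]  # 1 = part of a named entity

T_x, T_y = len(x), len(y)  # both 9 for this example
```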


Vocabulary: A vocabulary is a set of words (here, words of the English language) in alphabetical order, with an index assigned to each word. For example, the first word will be “a” with index 1, the second word “aaron” with index 2, and so on. The last word might be “zulu” with index, say, 10000. An unknown word is mapped to a special unknown token with index 10001.
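The word-to-index mapping can be sketched as a Python dictionary. A tiny toy vocabulary (just the words of the example sentence) stands in for the full 10,000-word one; the 1-based indexing follows the text.

```python
# Toy vocabulary: alphabetically sorted, 1-based indices as in the text.
words = sorted({"a", "and", "granger", "harry", "hermione",
                "invented", "new", "potter", "spell"})
word_to_index = {w: i + 1 for i, w in enumerate(words)}

# Unknown words get the next index after the last vocabulary word.
unk_index = len(words) + 1

def lookup(word):
    """Return the vocabulary index of word, or the unknown index."""
    return word_to_index.get(word.lower(), unk_index)
```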

One Hot Vector of input features:
Each word in the training example is stored as a one-hot vector. For example, $x^{<1>}$, the word “Harry” with index, say, 407 in the vocabulary, is represented as a one-hot vector with a 1 at the 407th position and 0 everywhere else. The vector has 10,000 rows and 1 column (a column vector the size of the vocabulary).

$x^{<1>}$ = $\begin{bmatrix} 0 \\ 0 \\ \vdots \\ 1 \\ \vdots \\ 0 \end{bmatrix}$ (1 at index 407, the position of “Harry”)

$x^{<2>}$ = $\begin{bmatrix} 0 \\ 0 \\ \vdots \\ 1 \\ \vdots \\ 0 \end{bmatrix}$ (1 at the index of “Potter”)
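A one-hot encoder is a few lines of NumPy. This sketch assumes the 1-based indexing and the 10,000-word vocabulary size used in the text:

```python
import numpy as np

def one_hot(index, vocab_size=10000):
    """Column vector of shape (vocab_size, 1) with a single 1 at the
    given 1-based vocabulary index, 0 everywhere else."""
    v = np.zeros((vocab_size, 1))
    v[index - 1, 0] = 1.0  # convert 1-based index to 0-based position
    return v

# "Harry" at vocabulary index 407 (the example index from the text).
x1 = one_hot(407)
```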


References:

  1. http://www.wildml.com/2015/09/recurrent-neural-networks-tutorial-part-1-introduction-to-rnns/
  2. Andrew Ng, Sequence Models course, Coursera

Disclaimer: some images/material might be copied directly from some sources.

Written on February 14, 2018