Learning Word Embeddings with Neural Language Model
How do we learn the embedding matrix $E$ for a task? For example, consider the task of predicting the next word in the sequence “I want a glass of orange ????”. Building a neural language model is one way to learn word embeddings.
Building a neural network to predict the next word in the sequence: for each word $w$, its one-hot vector $O_w$ is multiplied by an embedding matrix $E$ to produce an embedding vector $e_w = E\,O_w$. Each embedding vector is 300-dimensional (assuming $E$ is 300x10000). All the embedding vectors are then fed into a neural network, which in turn feeds into a softmax layer that outputs a probability for each word in the vocabulary being the next word.
Since we are using 300-dimensional embedding vectors and the example above has 6 context words, the input to the neural network would be 1800-dimensional. Normally, only a small window of, say, the last 4 words is used to predict the next word, which reduces the input from 1800 to 1200 dimensions in this case.
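The lookup-and-concatenate step above can be sketched as follows. This is a minimal illustration, not the course's code: the matrix values, word indices, and sizes (300x10000, a 4-word window) are assumptions taken from the notes.

```python
import numpy as np

np.random.seed(0)
vocab_size, emb_dim = 10000, 300              # sizes assumed in the notes
E = np.random.randn(emb_dim, vocab_size) * 0.01  # embedding matrix E (300 x 10000)

def one_hot(index, size):
    """One-hot vector O_w for the word at the given vocabulary index."""
    v = np.zeros(size)
    v[index] = 1.0
    return v

# Embeddings for a 4-word context window (the indices are hypothetical)
context_indices = [351, 6257, 914, 4805]
embeddings = [E @ one_hot(i, vocab_size) for i in context_indices]  # each e_w is 300-dim

# Concatenate into the hidden-layer input: 4 * 300 = 1200 dimensions
x = np.concatenate(embeddings)
print(x.shape)  # (1200,)
```

Note that multiplying $E$ by a one-hot vector just selects one column, so in practice the lookup is implemented as indexing, `E[:, i]`, rather than a full matrix-vector product.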
Generalizing this algorithm to derive even simpler ones:
Other context / target pairs:
context: the last 4 words; or 4 words to the left and 4 to the right; or the last 1 word; or a nearby 1 word (skip-gram model)
Source material from Andrew Ng’s awesome course on Coursera. The material in the video has been written out in text form so that anyone who wishes to revise a certain topic can do so without going through the entire video lectures.