Learning Word Embeddings with Neural Language Model

How do we learn the embedding matrix $E$ for a task? For example, suppose the task is to predict the next word in the sequence "I want a glass of orange ____". Building a neural language model is one way to learn word embeddings.

Building a neural network to predict the next word in the sequence: for each word there is a one-hot vector $O_w$, which is multiplied by an embedding matrix $E$ to produce an embedding vector $e_w$. Each embedding vector is 300-dimensional (assuming $E$ is 300 x 10000). All the embedding vectors are then fed into a neural network, which feeds into a softmax layer that outputs a probability for every word in the vocabulary being the next word.
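The embedding lookup itself is just a matrix-vector product. Here is a minimal NumPy sketch, assuming a 300 x 10000 matrix $E$; the word index used is purely illustrative, and in practice the multiplication is never carried out explicitly since it simply selects a column of $E$.

```python
import numpy as np

vocab_size, embed_dim = 10000, 300

# Hypothetical embedding matrix E (300 x 10000), randomly initialized here.
E = np.random.randn(embed_dim, vocab_size) * 0.01

def embed(word_index):
    """Look up the embedding e_w for a word given its vocabulary index."""
    o_w = np.zeros(vocab_size)   # one-hot vector O_w
    o_w[word_index] = 1.0
    return E @ o_w               # e_w = E * O_w, a 300-dimensional vector

# Equivalent (and far cheaper): read out the corresponding column of E directly.
e_w = E[:, 4257]                 # same result as embed(4257); 4257 is an arbitrary index
```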


Since we are using 300-dimensional embedding vectors and the example above has 6 context words, the input to the neural network is 1800-dimensional. Normally, only a small window of, say, the last 4 words is used to predict the next word, which reduces the input from 1800 to 1200 dimensions in this case. A sketch of the resulting fixed-window model is shown below.
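A minimal NumPy sketch of this fixed-window neural language model, assuming a 4-word context, a 300 x 10000 embedding matrix, and an arbitrary hidden-layer size of 128 (the hidden size and the word indices below are illustrative assumptions, not from the original notes):

```python
import numpy as np

vocab_size, embed_dim, window, hidden = 10000, 300, 4, 128

rng = np.random.default_rng(0)
E  = rng.normal(0, 0.01, (embed_dim, vocab_size))        # embedding matrix
W1 = rng.normal(0, 0.01, (hidden, window * embed_dim))   # hidden-layer weights
b1 = np.zeros(hidden)
W2 = rng.normal(0, 0.01, (vocab_size, hidden))           # softmax-layer weights
b2 = np.zeros(vocab_size)

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def predict_next(context_indices):
    """Forward pass: 4 context words -> 1200-dim input -> hidden layer -> softmax over 10,000 words."""
    x = np.concatenate([E[:, i] for i in context_indices])   # 4 * 300 = 1200 dimensions
    h = np.tanh(W1 @ x + b1)
    return softmax(W2 @ h + b2)                              # P(next word) for every vocabulary entry

# e.g. indices of "a glass of orange" (hypothetical vocabulary indices)
probs = predict_next([12, 845, 21, 3048])
print(probs.shape)   # (10000,)
```

Training this network with backpropagation on a large corpus updates $E$ along with the other weights, which is how the word embeddings are learned.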

Generalizing this algorithm to derive even simpler ones:

Other context / target pairs:

Context: the last 4 words, or 4 words to the left and to the right, or the last 1 word, or a nearby 1 word (the skip-gram model); a sketch of generating such context/target pairs follows below.
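A small Python sketch of how these context/target pairs could be generated from a tokenized sentence. The "last 4 words" pairs match the language-model setup above, while the skip-gram pairs use a word as context and a single nearby word as the target; the window size and example sentence are illustrative assumptions.

```python
def last_n_context_pairs(tokens, n=4):
    """Context = previous n words, target = the next word (the language-model setup above)."""
    return [(tokens[i - n:i], tokens[i]) for i in range(n, len(tokens))]

def skip_gram_pairs(tokens, window=2):
    """Context = a word, target = a nearby word within +/- window positions (skip-gram)."""
    pairs = []
    for i, context in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                pairs.append((context, tokens[j]))
    return pairs

sentence = "i want a glass of orange juice".split()
print(last_n_context_pairs(sentence, n=4)[:2])   # e.g. (['i', 'want', 'a', 'glass'], 'of')
print(skip_gram_pairs(sentence, window=1)[:4])   # e.g. ('i', 'want'), ('want', 'i'), ...
```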

Written on February 19, 2018