Text Mining in Python

Text mining and analysis is one of the most widely used applications of data science and deep learning. The challenge in performing text mining stems from the fact that humans and computers perceive text data differently. While a human can figure out the context easily, it is not so easy for a computer. Also, an algorithm sees a corpus of text as a matrix of numbers, while we see it through the structure of ‘language’. There are many more such differences, but what the latest deep learning algorithms have been able to do is simply astonishing!
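To make the "matrix of numbers" view concrete, here is a minimal sketch of how a tiny corpus can be turned into a matrix of word counts. The two sentences, the whitespace tokenizer, and the names `docs`, `vocab` and `dtm` are assumptions made up for illustration; a real pipeline would use a proper tokenizer and a library implementation.

```python
from collections import Counter

# Two toy documents, invented purely for illustration.
docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
]

# Naive tokenization: lowercase and split on whitespace.
tokenized = [doc.lower().split() for doc in docs]

# Vocabulary: every distinct word, sorted so the column order is stable.
vocab = sorted({word for tokens in tokenized for word in tokens})

# Count each word once per document.
counts = [Counter(tokens) for tokens in tokenized]

# One row per document, one column per vocabulary word, each cell a count.
dtm = [[doc_counts[word] for word in vocab] for doc_counts in counts]

print(vocab)
for row in dtm:
    print(row)
```

Each row is what the algorithm "sees" for one document: the word order and grammar are gone, and only the counts remain.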

The most famous example is arithmetic on word vectors. When a model that represents words as vectors was given the expression King - Man + Woman, the closest matching word was Queen. The mathematical operators are applied to the word vectors, and the result shows that those vectors capture some of the context and meaning of the words.
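The idea can be sketched with a few hand-made vectors. The 2-D values below are hypothetical and chosen by hand so the arithmetic works out (the first dimension loosely stands for "royalty", the second for "gender"); they are not learned embeddings, and the helper names `cosine` and `analogy` are assumptions for this illustration only.

```python
import math

# Hand-made toy vectors (hypothetical values, NOT learned embeddings).
vectors = {
    "king":  [0.9, 0.9],
    "queen": [0.9, -0.9],
    "man":   [0.1, 0.9],
    "woman": [0.1, -0.9],
    "apple": [-0.8, 0.0],
}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def analogy(a, b, c):
    """Return the word closest to vec(a) - vec(b) + vec(c), excluding the inputs."""
    target = [x - y + z for x, y, z in zip(vectors[a], vectors[b], vectors[c])]
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(target, vectors[w]))


result = analogy("king", "man", "woman")
print(result)
```

With real embeddings the vectors have hundreds of dimensions and are learned from text, but the lookup step is the same: compute the target vector, then pick the nearest word by cosine similarity.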

Text Mining Topics: Theory and Mathematics

There are traditional approaches and new-age deep learning approaches, and it is important to understand both of them to develop a ‘feel’ for text data. The topics that will be covered in later blog posts are:


  1. NLTK & Corpora
  2. Lemmatization, Stemming
  3. TF-IDF
  4. Document-Term Matrix (DTM) and Term-Document Matrix (TDM)
  5. POS Tagging - Named Entity Recognition
  6. PCA and SVD
  7. Article Spinner
  8. NGram Tagger
  9. Chunking and Chinking
  10. Latent Semantic Analysis


  1. Named Entity Detection
  2. Sentiment Analysis
  3. Document Classifier


  1. Word Embeddings
  2. t-SNE dimension reduction for Text Data
  3. CBOW
  4. Skip-Gram
  5. Word2Vec, GloVe

Deep Learning

  1. RNN


  1. Automated Text Generation using GANs

I’ll cover all these topics one by one during the month of February 2018.

Written on February 1, 2018