Saving and loading models is one of the key components of building deep learning solutions. Not only are saved models used in deployment, but also in transfer learning. A model already trained on millions or billions of records can be saved and reused by others who just want to deploy it, or who do not have access to huge amounts of training data.

The graph generated in a TensorFlow session can be visualized using TensorBoard, which renders the graph defined in the code in a UI. The standard way is to save the graph to a file on disk, then load that file via the `tensorboard` command, which serves TensorBoard on port 6006 (which looks like "goog").
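A minimal sketch of the viewing step, assuming the graph has already been written to a `./logs` directory (e.g. with the TF 1.x-style `tf.summary.FileWriter("./logs", sess.graph)`; the directory name is my own choice):

```shell
# Serve the saved graph/event files in the TensorBoard UI.
# --logdir points at the directory the graph was written to.
tensorboard --logdir ./logs --port 6006
# Then open http://localhost:6006 in a browser to inspect the graph.
```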

# Tensorflow Glossary Part 3 - Loss functions

Loss functions are the key to optimizing any machine learning algorithm in TensorFlow. It is important to select the right loss function for each machine learning problem; the loss is then fed into one of the optimizer functions.

# Tensorflow Glossary Part 2 - Optimizer Functions

A short list, with details, of the most commonly used optimizer functions in TensorFlow. Not all functions are listed here.

# Tensorflow Glossary Part 1 - Mathematical Functions

This is a summary and brief write-up of common (and not so common) mathematical functions in TensorFlow - just a few lines specifying the syntax and how each should be written in TensorFlow.

# Building a Classifier in Tensorflow

A classification algorithm using TensorFlow - an application of a four-layer neural network architecture to the Sonar Mines vs. Rocks classification dataset. The program is at this location on Github

# Understanding Logistic Regression Output from SAS

This post details the terms obtained in SAS output for logistic regression. The definitions are generic and referenced from other great posts on this topic. The aim is to provide a summary of definitions and a statistical explanation of the output obtained from logistic regression code in SAS.

# Anomaly Detection Algorithms

Anomaly detection is a class of semi-supervised (close to unsupervised) learning algorithms widely used in manufacturing, data centres, and fraud detection. Typically it is used for heavily imbalanced classification problems where, say, y=1 (anomaly) has roughly 20 examples and y=0 has 10,000. An example would be identifying faulty aircraft engines from a wide number of parameters, where anomalous data might not be available at all, or makes up less than 0.1% of the data.
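As a toy illustration of the density-estimation flavour of anomaly detection (not this post's exact code; the readings and threshold are made up), one can fit a Gaussian to the normal examples and flag points whose probability falls below a threshold epsilon:

```python
import math

def fit_gaussian(xs):
    # Estimate mean and variance of a feature from the normal (y=0) examples.
    mu = sum(xs) / len(xs)
    var = sum((x - mu) ** 2 for x in xs) / len(xs)
    return mu, var

def density(x, mu, var):
    # Univariate Gaussian probability density p(x).
    return math.exp(-((x - mu) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)

# Mostly-normal sensor readings; in practice y=1 examples are rare or absent.
readings = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.7, 10.3]
mu, var = fit_gaussian(readings)
eps = 1e-3  # threshold; normally chosen on a small labelled validation set

def is_anomaly(x):
    return density(x, mu, var) < eps

print(is_anomaly(10.0))  # in-distribution -> False
print(is_anomaly(25.0))  # far from the mean -> True
```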

# ROC-AUC explained

ROC, or receiver operating characteristic curve, is a graphical plot that illustrates the diagnostic ability of a binary classifier as its discrimination threshold is varied. Essentially, it illustrates the ability of the classifier to segregate the classes. A higher AUC (area under the curve) denotes a better classifier.
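A useful way to see what the AUC measures: it equals the probability that a randomly chosen positive example is scored above a randomly chosen negative one. The function below (an illustration, not from the post) computes it by comparing every positive/negative pair:

```python
def roc_auc(labels, scores):
    # AUC = P(score of a random positive > score of a random negative),
    # with ties counting as half a win.
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.2]
# 8 of the 9 positive/negative pairs are ranked correctly.
print(roc_auc(labels, scores))  # 0.888...
```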

# Similarity functions in Python

Similarity functions are used to measure the ‘distance’ between two vectors, numbers, or pairs. It is a measure of how similar the two objects being measured are. The two objects are deemed similar if the distance between them is small, and vice versa.
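Cosine similarity is one such function; a minimal pure-Python version (illustrative, not the post's code):

```python
import math

def cosine_similarity(a, b):
    # Cosine of the angle between two vectors: 1 means same direction
    # (very similar), 0 means orthogonal (dissimilar).
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

print(cosine_similarity([1, 2, 3], [2, 4, 6]))  # ~1.0 (same direction)
print(cosine_similarity([1, 0], [0, 1]))        # 0.0 (orthogonal)
```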

# TF-IDF for NLP

Term frequency–inverse document frequency (TF-IDF) is a numerical statistic intended to reflect how important a word is to a document in a collection or corpus. It is often used as a weighting factor in information retrieval, text mining, and user modeling. The TF-IDF value increases proportionally with the number of times a word appears in the document and is offset by the frequency of the word in the corpus, which adjusts for the fact that some words appear more frequently in general.
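A minimal sketch of the computation on a toy corpus (function names and corpus are my own; real libraries use smoothed variants of the idf term):

```python
import math

def tf_idf(term, doc, corpus):
    # tf: relative frequency of the term in this document.
    tf = doc.count(term) / len(doc)
    # idf: log of (number of documents / documents containing the term).
    n_containing = sum(1 for d in corpus if term in d)
    idf = math.log(len(corpus) / n_containing)
    return tf * idf

corpus = [
    ["the", "cat", "sat"],
    ["the", "dog", "ran"],
    ["the", "cat", "slept"],
]
# "the" appears in every document, so its idf (and tf-idf) is zero.
print(tf_idf("the", corpus[0], corpus))  # 0.0
# "cat" is rarer across the corpus, so it gets a positive weight.
print(tf_idf("cat", corpus[0], corpus))
```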

# LSTM - Echo Sequence Prediction Problem (Vanilla LSTM)

This is an implementation of a basic, simple LSTM (also called the vanilla LSTM) in Keras. It is a two-layer model with a simple LSTM layer followed by a final Dense layer. The model solves the echo sequence prediction problem.

# Statistics Primer - Recalling all the frequently used terms

This post is just a ready reference material for some statistical terms which come up a lot in machine learning. It will be expanded from time to time to keep the material relevant and up to date.

# Time Series Modeling - Part I (Theoretical Background)

Apart from classification and regression problems, time series models are a separate entity in themselves and are not easily tackled by standard methods and algorithms (well, they can be, after some smart tweaks). The main aim of time series analysis is to forecast future values of a variable using its past values. Time series models are also very business friendly, directly answering questions like “What will my store’s sales be in the next two months?” or “How many customers will come into my pizza store tomorrow, so that I can optimize my ingredients?”

# Handwritten digit recognition in MNIST

Handwritten digit recognition using MNIST data is the absolute first for anyone starting with CNN/Keras/TensorFlow. It is a well-defined problem with a standardized dataset, though not complex, which can be used to run deep learning models as well as other machine learning models (logistic regression, XGBoost, or random forest) to predict the digits.

# GloVe word vectors

GloVe stands for Global Vectors for word representation. Previously we were picking the context (c) and target (t) in the window randomly. GloVe makes this selection explicit.

# Learning Word Embeddings with Neural Language Model

How do we learn the embedding matrix for a task? For example, the task is to predict the next word in the sequence “I want a glass of orange ????”. Building a neural language model is one of the ways to learn word embeddings.

# Word2Vec Algorithm- Skip-gram and Continuous Bag of Words

Word2vec is a group of related models that are used to produce word embeddings. These models are shallow, two-layer neural networks that are trained to reconstruct linguistic contexts of words. Word2vec takes as its input a large corpus of text and produces a vector space, typically of several hundred dimensions, with each unique word in the corpus being assigned a corresponding vector in the space. Word vectors are positioned in the vector space such that words that share common contexts in the corpus are located in close proximity to one another in the space.

# Word Embeddings & Embedding Matrix

Word embeddings are the core of applying RNNs to natural language processing tasks. Embeddings are used to convert words and sentences into ‘numbers’ which the computer can not only understand but also use for NLP tasks such as Man:Woman = King:? -> Queen.

# Bidirectional RNNs and Deep RNNs

A typical RNN works such that past sequence elements affect the next output, while in reality a particular output could be influenced by both the elements before it and those after it. Bidirectional RNNs (or BRNNs) take this effect into account in their architecture.

# Long Short Term Memory Networks (LSTM)

In LSTM, as compared to GRUs, $\Gamma_r$ is not used. In place of $\Gamma_r$, two separate gates $\Gamma_u$ and $\Gamma_f$ (forget gate) are used. Also, $a^{\langle t \rangle} \ne c^{\langle t \rangle}$.
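For reference, the standard LSTM update equations in this notation are as follows (following the usual course convention; $\sigma$ is the sigmoid, $*$ denotes element-wise multiplication, and $\Gamma_o$ is the output gate):

```latex
\begin{aligned}
\tilde{c}^{\langle t \rangle} &= \tanh(W_c[a^{\langle t-1 \rangle}, x^{\langle t \rangle}] + b_c) \\
\Gamma_u &= \sigma(W_u[a^{\langle t-1 \rangle}, x^{\langle t \rangle}] + b_u) \\
\Gamma_f &= \sigma(W_f[a^{\langle t-1 \rangle}, x^{\langle t \rangle}] + b_f) \\
\Gamma_o &= \sigma(W_o[a^{\langle t-1 \rangle}, x^{\langle t \rangle}] + b_o) \\
c^{\langle t \rangle} &= \Gamma_u * \tilde{c}^{\langle t \rangle} + \Gamma_f * c^{\langle t-1 \rangle} \\
a^{\langle t \rangle} &= \Gamma_o * \tanh(c^{\langle t \rangle})
\end{aligned}
```

Note how the memory cell $c^{\langle t \rangle}$ and the activation $a^{\langle t \rangle}$ are kept separate, unlike in the GRU.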

# Gated Recurrent Units (GRUs)

Gated Recurrent Units (GRUs) are a form of RNN which can capture long-range dependencies in sequential data.

The sequence input to RNNs can be really long, and it is quite possible that inputs at the beginning of the sequence determine outputs much later in the sequence, by which point the gradients are no longer strong enough to affect the output.

# Language Models and Sequence Generation

A language model estimates the probability of a sentence (a sequence of words) occurring together, and provides a comparison between different possible combinations or variants of similar sentences.

# Variations of Recurrent Neural Network

There are various variations of RNN, depending on the type of input data provided, the type of output required, and the problem we are trying to solve with the network. Each variation has a different RNN architecture, and the implementation depends on this architecture.

# Backpropagation through time in an RNN

Backpropagation in an RNN is required to calculate the derivatives of all the parameters for the optimization function using gradient descent. The gradient is propagated back through the network across all layers and time steps <1>, <2>, …, one for every step of forward propagation.

# Recurrent Neural Network (RNN) - Forward Propagation

Standard neural networks cannot take into account the sequence elements that come before or after a data point. For example, to identify a name in a sentence, we need knowledge of the other words surrounding it. Depending on those surrounding words, ‘Teddy’ may refer to a person’s name in one sentence and to a toy in another.

# Sequence Models - Recurrent Neural Networks (RNN)

The idea behind RNNs is to make use of sequential information. In a traditional neural network we assume that all inputs (and outputs) are independent of each other. But for many tasks that’s a very bad idea. If you want to predict the next word in a sentence, you had better know which words came before it. RNNs are called recurrent because they perform the same task for every element of a sequence, with the output being dependent on the previous computations. Another way to think about RNNs is that they have a “memory” which captures information about what has been calculated so far. In theory RNNs can make use of information in arbitrarily long sequences, but in practice they are limited to looking back only a few steps (more on this later).
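The “memory” and its limits can be seen in a scalar toy cell (weights are made up for illustration, not a trained model): each step feeds the previous hidden state back in, and the influence of an early input fades over the following steps.

```python
import math

def rnn_step(x_t, h_prev, w_x, w_h, b):
    # One recurrence step: the new hidden state mixes the current input
    # with the previous hidden state ("memory") through a tanh nonlinearity.
    return math.tanh(w_x * x_t + w_h * h_prev + b)

# Run the same cell over a whole sequence: a single 1.0 followed by zeros.
w_x, w_h, b = 0.5, 0.9, 0.0
h = 0.0
for x_t in [1.0, 0.0, 0.0, 0.0]:
    h = rnn_step(x_t, h, w_x, w_h, b)
    print(round(h, 4))  # the trace of the first input decays step by step
```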

# Text Mining in Python

Text mining and analysis is one of the most widely used implementations of data science and deep learning. The challenge in performing text mining stems from the fact that humans and computers perceive text data differently. While a human can figure out the context easily, it is not so easy for computers. Also, an algorithm sees a corpus of text as a matrix of numbers, while we see it through the eyes of ‘language’ structure. There are many more such differences, but what the latest deep learning algorithms have been able to do is simply astonishing!

# eXtreme Gradient Boosting Algorithm (XGBoost)

The XGBoost algorithm belongs to the family of boosting algorithms, which provide better and faster results than traditional classification/regression algorithms. It is widely used as the go-to algorithm for a lot of machine learning tasks.

# Datetime operations in Python

Some of the key code snippets for datetime manipulation using Python's datetime library and pandas. Useful for time series problems and for doing feature engineering based on dates.

# LightGBM Algorithm & comparison with XGBoost

LightGBM is a fast, distributed, high-performance gradient boosting framework based on decision tree algorithms, used for ranking, classification and many other machine learning tasks.

# Clustering Methodologies

Clustering can be considered the most important unsupervised learning problem; so, as every other problem of this kind, it deals with finding a structure in a collection of unlabeled data. A loose definition of clustering could be “the process of organizing objects into groups whose members are similar in some way”. A cluster is therefore a collection of objects which are “similar” between them and are “dissimilar” to the objects belonging to other clusters.
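As a concrete example of “organizing objects into groups whose members are similar”, here is a tiny pure-Python k-means on scalars (Lloyd's algorithm; the initialisation and data are made up for illustration):

```python
def kmeans_1d(points, k=2, iters=10):
    # Lloyd's algorithm on scalars: assign each point to the nearest
    # centroid, then move each centroid to the mean of its cluster.
    centroids = points[:k]  # naive initialisation: first k points
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return sorted(centroids)

# Two obvious groups, around 1 and around 10.
print(kmeans_1d([1.0, 1.2, 0.8, 9.9, 10.1, 10.0]))  # centroids near 1 and 10
```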

# Random Forests

Random Forest is a versatile machine learning method capable of performing both regression and classification tasks. It also handles dimensionality reduction, missing values, outlier values and other essential steps of data exploration, and does a fairly good job. It is a type of ensemble learning method, where a group of weak models combine to form a powerful model.

# Model Evaluation Metrics

Model evaluation metrics are used to assess goodness of fit between model and data, to compare different models in the context of model selection, and to estimate how accurate the predictions (associated with a specific model and data set) are expected to be. The choice of metric depends entirely on the type of model and its implementation plan.

# Ensemble Methods, Bagging and Boosting

Ensemble methods combine a group of predictive models to achieve better accuracy and model stability. They are known to give a significant boost to tree-based models.

# How does a Decision Tree decide where to split?

There are various ways to decide on the metric used to choose the variable on which a node is split. Different algorithms deploy different metrics to decide which variable splits the dataset best.
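One such metric is Gini impurity; a small sketch (illustrative, not the post's code) of how a candidate split is scored:

```python
def gini(labels):
    # Gini impurity: 1 minus the sum of squared class proportions.
    n = len(labels)
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def split_gini(left, right):
    # Weighted impurity of a candidate split; lower means a better split.
    n = len(left) + len(right)
    return len(left) / n * gini(left) + len(right) / n * gini(right)

# A split that separates the classes perfectly scores 0.
print(split_gini([0, 0, 0], [1, 1, 1]))  # 0.0
# A split that mixes the classes scores higher (worse).
print(split_gini([0, 1, 0], [1, 0, 1]))
```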

# Decision Trees - Optimization

Overfitting is one of the key challenges faced while modeling decision trees. If there is no limit set on a decision tree, it will give you 100% accuracy on the training set, because in the worst case it will end up making one leaf per observation. Thus, preventing overfitting is pivotal while modeling a decision tree, and it can be done in two main ways.

# Data Visualization with Python

Data visualization is a part of exploratory data analysis - charts and graphs can tell you much more than a simple table or a bunch of numbers can.

# Exploratory Data Analysis for datasets in Python

Exploratory Data Analysis, or EDA, is the most important part of any project or code related to data, as it helps you understand more about the data before arriving at any hypothesis.

# Data Preprocessing in Python

There are various ways to preprocess the data after the basic exploratory analysis - mostly to convert the data into a form that fits the model.

# Neural Style Transfer - Art Generation using Neural networks

A really cool implementation of CNNs is Neural Style Transfer for art generation. It basically merges two images - one Content image and one Style image - to create a new image which is a combination of the two.

# Facial Recognition & Verification using Convolutional Neural Network

Understanding the algorithms behind facial recognition and facial verification technologies, the associated loss functions, and the technical details. I will also build code from scratch (to be posted separately - this post is mostly algorithms and mathematics) for face recognition using a CNN.

# Bounding Box Predictions, Intersection Over Union and Non-Max Suppression

Bounding Box Prediction

# Convolutional Neural Network - Overview

Convolutional neural networks are a class of neural networks which take into account not just the vector of inputs but also the spatial arrangement of the data. Transactional data, for example, can be analyzed using a traditional neural network, since the spatial arrangement or order of inputs does not matter; images, by contrast, are almost exclusively analyzed using CNNs, where the arrangement of pixels around each other is of paramount importance (in fact, this is exactly what makes up an image).

# CNN - Residual Networks

Researchers have proposed and formulated many different architectures for CNNs, proven in different scenarios on different datasets. These architectures are constantly evolving, with later ones performing better than older versions thanks to novel techniques and emerging algorithms.

# CNN Filters, Pooling, Padding, and Strides

There are multiple building blocks in a CNN architecture. With each operation (filters, pooling, convolution, etc.), the dimensions of the output matrix change. It is extremely important to keep track of matrix dimensions to make sure the calculations are done correctly.
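The bookkeeping follows one formula, $\lfloor (n + 2p - f)/s \rfloor + 1$ for input size $n$, filter size $f$, padding $p$ and stride $s$; a quick sketch:

```python
def conv_output_size(n, f, p, s):
    # Output size along one dimension: floor((n + 2p - f) / s) + 1
    return (n + 2 * p - f) // s + 1

# 6x6 input, 3x3 filter, no padding, stride 1 -> 4x4 output.
print(conv_output_size(6, 3, 0, 1))  # 4
# "Same" padding (p=1 for a 3x3 filter, stride 1) keeps the size at 6.
print(conv_output_size(6, 3, 1, 1))  # 6
# A 2x2 pool with stride 2 halves the size: floor((6 - 2) / 2) + 1 = 3.
print(conv_output_size(6, 2, 0, 2))  # 3
```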

# Network in Network - 1x1 Convolutions

Network in Network, more commonly known as 1x1 convolution, is used to manipulate the depth (number of channels) of the input.

# Hyperparameter Tuning

Hyperparameters to tune in deep learning: the learning rate $\alpha$; $\beta$, $\beta_1$, $\beta_2$, $\epsilon$; the number of layers; the number of hidden units; learning rate decay; and the mini-batch size.
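For a hyperparameter like $\alpha$ that spans several orders of magnitude, the usual trick is to sample on a log scale rather than uniformly; a sketch (the range is made up for illustration):

```python
import math
import random

def sample_learning_rate(low=1e-4, high=1e-1):
    # Sample alpha log-uniformly, so each decade
    # (1e-4..1e-3, 1e-3..1e-2, 1e-2..1e-1) is equally likely.
    r = random.uniform(math.log10(low), math.log10(high))
    return 10 ** r

random.seed(0)
print([round(sample_learning_rate(), 6) for _ in range(3)])
```

A plain uniform draw over [1e-4, 1e-1] would spend about 90% of its samples above 1e-2 and almost never try the small values.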

# Batch Normalization Algorithm

Batch normalization makes hyperparameter tuning easier and makes the neural network more robust. It also makes training a big network easier and faster.

It also helps keep the derivatives (slopes) from becoming too big or too small.

# Optimization Algorithms - RMSprop and Adam

Gradient descent is widely used as an optimization algorithm for minimizing cost functions. There are various improved versions of it, like stochastic gradient descent, gradient descent with momentum, RMSprop and Adam.

How do you make sure the implementation of backpropagation is correct?
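The standard answer is gradient checking: compare the analytic gradient against a two-sided numerical estimate and verify the relative difference is tiny. A minimal sketch on a scalar cost (illustrative only):

```python
def numerical_gradient(f, theta, eps=1e-7):
    # Two-sided difference: (f(theta + eps) - f(theta - eps)) / (2 * eps)
    return (f(theta + eps) - f(theta - eps)) / (2 * eps)

# Cost J(theta) = theta^2 has analytic gradient 2 * theta.
J = lambda theta: theta ** 2
theta = 3.0
analytic = 2 * theta
numeric = numerical_gradient(J, theta)

# Relative difference should be ~1e-7 or less if backprop is right.
diff = abs(numeric - analytic) / max(abs(numeric) + abs(analytic), 1e-12)
print(diff < 1e-6)  # True
```

In a real network the same check is run per parameter, with backprop's gradient standing in for `analytic`.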

# Activation Functions in a Neural Network

Activation functions are required in every neuron of a neural network layer to convert the parametrized quantity ($W^Tx+b$) into a number which can be fed into the next layer. The backpropagation algorithm also uses the derivatives of these activation functions to propagate the error back through the network and minimize the final cost.
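Two common examples, with the sigmoid derivative written in the form backpropagation uses (a sketch, not the post's code):

```python
import math

def sigmoid(z):
    # Squashes any real number into (0, 1).
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_derivative(z):
    # Convenient form used by backpropagation: g'(z) = g(z) * (1 - g(z)).
    s = sigmoid(z)
    return s * (1.0 - s)

def relu(z):
    # max(0, z): the usual default for hidden layers.
    return max(0.0, z)

print(sigmoid(0.0))             # 0.5
print(sigmoid_derivative(0.0))  # 0.25
print(relu(-2.0), relu(3.0))    # 0.0 3.0
```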