Transpose Convolution to reverse Convolution operation

A traditional convolutional layer takes a patch of an image and produces a number (patch -> number). In a "transpose convolution" we want to take a number and produce a patch of an image (number -> patch). We need this layer to "undo" convolutions in an encoder, which makes it a natural fit for decoder-type operations.
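As a minimal sketch (plain NumPy, illustrative kernel and stride chosen here for clarity), a transpose convolution can be seen as each input number "painting" a kernel-sized patch into a larger output:

```python
import numpy as np

def transpose_conv2d_single(x, kernel, stride=2):
    """Minimal transpose convolution: each input number 'paints'
    a kernel-sized patch into the output (number -> patch)."""
    h, w = x.shape
    kh, kw = kernel.shape
    out = np.zeros((h * stride + kh - stride, w * stride + kw - stride))
    for i in range(h):
        for j in range(w):
            # one input number contributes a whole patch of output
            out[i*stride:i*stride+kh, j*stride:j*stride+kw] += x[i, j] * kernel
    return out

x = np.array([[1.0, 2.0],
              [3.0, 4.0]])
kernel = np.ones((2, 2))
patch = transpose_conv2d_single(x, kernel, stride=2)
print(patch.shape)  # (4, 4): a 2x2 input is upsampled to a 4x4 patch
```

This is exactly the upsampling role it plays in a decoder: a small feature map grows back toward image resolution.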

Read More

Deep Learning Models using Keras Functional API

The Sequential API and the Functional API are the two primary ways to build a deep learning model in Keras. The Sequential API lets you build a model step by step, layer by layer. However, it cannot express a model with multiple inputs or multiple outputs; this is something the Keras Functional API can handle.
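A minimal sketch of a two-input model with the Functional API (the input shapes, layer sizes, and input names below are illustrative assumptions, not from any particular post):

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Two separate inputs, which a Sequential model cannot express
num_in = keras.Input(shape=(4,), name="numeric")
txt_in = keras.Input(shape=(8,), name="text_features")

x = layers.Dense(16, activation="relu")(num_in)
y = layers.Dense(16, activation="relu")(txt_in)
merged = layers.concatenate([x, y])          # join the two branches
out = layers.Dense(1, activation="sigmoid")(merged)

model = keras.Model(inputs=[num_in, txt_in], outputs=out)
pred = model.predict([np.zeros((2, 4)), np.zeros((2, 8))], verbose=0)
print(pred.shape)  # one prediction per example: (2, 1)
```

Each layer is called on a tensor and returns a tensor, so arbitrary graphs (branches, merges, multiple outputs) fall out naturally.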

Read More

Saving and Loading Models in TensorFlow & Keras

Saving and loading models is one of the key components of building deep learning solutions. Not only are saved models used in model deployment, but also in transfer learning. A model already trained on millions or billions of records can be saved and reused by others who just want to deploy it, or who do not have access to huge amounts of training data.

Read More

Visualizing Tensorflow Graph and Saving/Loading Models

The graph generated in a TensorFlow session can be visualized using TensorBoard, which renders the graph model defined in the code in a web UI. The standard way is to save the graph to a file on disk, and then load the file via the tensorboard command, which serves TensorBoard on port 6006 (which looks like 'goog').

Read More

Understanding Logistic Regression Output from SAS

This post details the terms obtained in SAS output for logistic regression. The definitions are generic and referenced from other great posts on this topic. The aim is to provide a summary of the definitions and a statistical explanation of the output obtained from logistic regression code in SAS.

Read More

Anomaly Detection Algorithms

Anomaly detection is a class of semi-supervised (close to unsupervised) learning algorithms widely used in manufacturing, data centres, and fraud detection. Normally it is used when we have a heavily imbalanced classification problem, with, say, approximately 20 examples of y=1 (anomaly) against 10,000 of y=0. An example would be identifying faulty aircraft engines based on a wide number of parameters, where the anomalous data might not be available at all, or, if available, makes up less than 0.1% of the data.
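One common approach, sketched below with synthetic data, is to fit a Gaussian density to the (mostly normal) examples and flag points whose density falls under a threshold $\epsilon$; the threshold value here is an illustrative assumption, normally chosen on a labeled validation set:

```python
import numpy as np

def fit_gaussian(X):
    """Estimate per-feature mean and variance from (mostly normal) data."""
    return X.mean(axis=0), X.var(axis=0)

def density(X, mu, var):
    """Product of independent univariate Gaussian densities per example."""
    p = np.exp(-(X - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)
    return p.prod(axis=1)

rng = np.random.default_rng(0)
X_train = rng.normal(0, 1, size=(1000, 2))   # "healthy engine" readings
mu, var = fit_gaussian(X_train)

normal_point = np.array([[0.1, -0.2]])
anomaly = np.array([[6.0, 6.0]])             # far outside the training cloud
eps = 1e-4                                    # threshold: illustrative value
print(density(normal_point, mu, var) > eps)   # [ True]  -> looks normal
print(density(anomaly, mu, var) > eps)        # [False]  -> flagged as anomaly
```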

Read More

ROC-AUC explained

The ROC, or receiver operating characteristic curve, is a graphical plot that illustrates the diagnostic ability of a binary classifier as its discrimination threshold is varied. Essentially it illustrates the ability of the classifier to separate the classes. A higher AUC (area under the curve) of the ROC denotes a better classifier.
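The AUC has a handy probabilistic reading: it is the chance that a randomly chosen positive example is scored above a randomly chosen negative one. A small sketch computing it directly from that definition (toy labels and scores, made up here for illustration):

```python
def auc_score(y_true, scores):
    """AUC as the probability that a random positive example is scored
    above a random negative one (ties count as half)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(auc_score(y_true, scores))  # 0.75
```

A perfect classifier would give 1.0; random scoring hovers around 0.5.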

Read More

Similarity functions in Python

Similarity functions are used to measure the 'distance' between two vectors, numbers, or pairs. It is a measure of how similar the two objects being compared are. The two objects are deemed similar if the distance between them is small, and vice versa.
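Two of the most common examples, sketched in plain Python (the vectors are made up for illustration):

```python
import math

def euclidean(a, b):
    """Straight-line distance: smaller means more similar."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_similarity(a, b):
    """Angle-based similarity: 1.0 means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

a, b = [1.0, 2.0, 3.0], [2.0, 4.0, 6.0]
print(euclidean(a, b))          # ~3.742: some distance apart
print(cosine_similarity(a, b))  # ~1.0: same direction, maximally similar
```

Note how the two functions disagree here: b is far from a in Euclidean terms but perfectly aligned in direction, which is why the right choice of similarity function depends on the problem.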

Read More

TF-IDF for NLP

Term frequency–inverse document frequency (TF-IDF) is a numerical statistic intended to reflect how important a word is to a document in a collection or corpus. It is often used as a weighting factor in information retrieval, text mining, and user modeling. The TF-IDF value increases proportionally with the number of times a word appears in the document and is offset by the frequency of the word in the corpus, which helps adjust for the fact that some words appear more frequently in general.
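A bare-bones sketch of the computation (using the plain tf = count/length and idf = log(N/df) variants; the tiny corpus is made up for illustration):

```python
import math

def tf_idf(term, doc, corpus):
    """tf = term count / doc length; idf = log(N / docs containing term)."""
    tf = doc.count(term) / len(doc)
    df = sum(1 for d in corpus if term in d)
    idf = math.log(len(corpus) / df)
    return tf * idf

corpus = [
    ["the", "cat", "sat"],
    ["the", "dog", "ran"],
    ["the", "cat", "slept"],
]
# "the" appears in every document, so its idf (and hence tf-idf) is 0
print(tf_idf("the", corpus[0], corpus))  # 0.0
# "dog" appears in only one document, so it is highly distinctive there
print(round(tf_idf("dog", corpus[1], corpus), 3))
```

This shows the offsetting behaviour described above: ubiquitous words like "the" get weighted down to zero, while rare, document-specific words score high.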

Read More

Time Series Modeling - Part I (Theoretical Background)

Apart from classification and regression problems, time series models are a separate class in themselves which are not easily tackled by standard methods and algorithms (well, they can be, after some smart tweaks). The main aim of a time series analysis is to forecast future values of a variable using its past values. Time series models are also very business friendly, and directly solve business problems like "What will my store's sales be in the next two months?" or "How many customers are going to come to my pizza store tomorrow, so that I can optimize my ingredients?"

Read More

Handwritten digit recognition in MNIST

Handwritten digit recognition using MNIST data is the absolute first project for anyone starting with CNNs/Keras/TensorFlow. It is a well-defined problem with a standardized dataset, though not a complex one, which can be used to run deep learning models as well as other machine learning models (logistic regression, XGBoost, or random forest) to predict the digits.

Read More

GloVe word vectors

GloVe stands for Global Vectors for word representation. Previously we were picking the context (c) and target (t) words in the window randomly; GloVe makes this selection explicit.

Read More

Word2Vec Algorithm- Skip-gram and Continuous Bag of Words

Word2vec is a group of related models that are used to produce word embeddings. These models are shallow, two-layer neural networks that are trained to reconstruct linguistic contexts of words. Word2vec takes as its input a large corpus of text and produces a vector space, typically of several hundred dimensions, with each unique word in the corpus being assigned a corresponding vector in the space. Word vectors are positioned in the vector space such that words that share common contexts in the corpus are located in close proximity to one another in the space.

Read More

Word Embeddings & Embedding Matrix

Word embeddings are the core of applying RNNs to natural language processing tasks. Embeddings are used to convert words and sentences into 'numbers' which the computer can not only understand, but also use for NLP tasks such as the analogy Man:Woman = King:? -> Queen.
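The famous analogy works by simple vector arithmetic on the embeddings. A toy sketch (the 2-d vectors below are made up, roughly encoding [gender, royalty], purely for illustration; real embeddings have hundreds of dimensions):

```python
import numpy as np

# Toy "embeddings": dims are roughly [gender, royalty] (made-up values)
emb = {
    "man":   np.array([ 1.0, 0.0]),
    "woman": np.array([-1.0, 0.0]),
    "king":  np.array([ 1.0, 1.0]),
    "queen": np.array([-1.0, 1.0]),
}

def closest(vec, vocab):
    """Nearest word in the vocabulary by cosine similarity."""
    def cos(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    return max(vocab, key=lambda w: cos(vec, vocab[w]))

# Man:Woman = King:?  ->  king - man + woman
target = emb["king"] - emb["man"] + emb["woman"]
answer = closest(target, {w: v for w, v in emb.items() if w != "king"})
print(answer)  # queen
```

Subtracting "man" removes the gender component of "king", and adding "woman" puts the opposite one back, landing on "queen".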

Read More

Bidirectional RNNs and Deep RNNs

A typical RNN works in a way such that past elements of the sequence affect the next output, while in reality a particular output could be influenced by both the elements before it and the elements after it. Bidirectional RNNs (or BRNNs) take this effect into account in their architecture.

Read More

Long Short Term Memory Networks (LSTM)

In an LSTM, as compared to a GRU, $\Gamma_r$ is not used. In its place, two separate gates $\Gamma_u$ (update gate) and $\Gamma_f$ (forget gate) are used. Also, $a^{\langle t \rangle} \ne c^{\langle t \rangle}$.
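For reference, the full set of LSTM updates in this notation (with $\Gamma_o$ denoting the output gate) is:

$\tilde{c}^{\langle t \rangle} = \tanh(W_c [a^{\langle t-1 \rangle}, x^{\langle t \rangle}] + b_c)$

$\Gamma_u = \sigma(W_u [a^{\langle t-1 \rangle}, x^{\langle t \rangle}] + b_u)$

$\Gamma_f = \sigma(W_f [a^{\langle t-1 \rangle}, x^{\langle t \rangle}] + b_f)$

$\Gamma_o = \sigma(W_o [a^{\langle t-1 \rangle}, x^{\langle t \rangle}] + b_o)$

$c^{\langle t \rangle} = \Gamma_u * \tilde{c}^{\langle t \rangle} + \Gamma_f * c^{\langle t-1 \rangle}$

$a^{\langle t \rangle} = \Gamma_o * \tanh(c^{\langle t \rangle})$

Note that, unlike in a GRU, the memory cell $c^{\langle t \rangle}$ and the activation $a^{\langle t \rangle}$ are kept distinct.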

Read More

Vanishing Gradients with RNN

The sequence input to an RNN can be really long, and it is quite possible that inputs at the beginning of the sequence should determine output units much later, by which time the gradients are no longer strong enough to affect the output.

Read More

Language Models and Sequence Generation

A language model estimates the probability of a sentence (a sequence of words) occurring together, and provides a comparison between different possible combinations or variants of a similar sentence.

Read More

Variations of Recurrent Neural Network

There are various variations of RNNs, depending on the type of input data provided, the type of output required, and the problem we are trying to solve with recurrent neural networks. Each variation has a different RNN architecture, and the implementation depends on this architecture.

Read More

Backpropagation through time in a RNN

Backpropagation in an RNN is required to calculate the derivatives of all the different parameters for optimization using gradient descent. The gradient is propagated back through the network across all layers and time steps <1>, <2>, … and for every step of forward propagation, as shown in the figure below.

Read More

Recurrent Neural Network (RNN) - Forward Propagation

Standard neural networks cannot take into account the sequence elements that come before or after a data point. For example, to identify a name in a sentence, we need knowledge of the other words surrounding it. In the sentences below, 'Teddy' refers to a name in (1), while it refers to a toy in (2).

Read More

Sequence Models - Recurrent Neural Networks (RNN)

The idea behind RNNs is to make use of sequential information. In a traditional neural network we assume that all inputs (and outputs) are independent of each other. But for many tasks that's a very bad idea. If you want to predict the next word in a sentence, you had better know which words came before it. RNNs are called recurrent because they perform the same task for every element of a sequence, with the output being dependent on the previous computations. Another way to think about RNNs is that they have a "memory" which captures information about what has been calculated so far. In theory RNNs can make use of information in arbitrarily long sequences, but in practice they are limited to looking back only a few steps (more on this later).

Read More

Text Mining in Python

Text mining and analysis is one of the most widely used applications of data science and deep learning. The challenge in performing text mining stems from the fact that humans and computers perceive text data differently. While a human can figure out the context easily, it is not so easy for a computer. Also, an algorithm sees a corpus of text as a matrix of numbers, while we see it through the eyes of 'language' structure. There are many more such differences, but what the latest deep learning algorithms have been able to do is simply astonishing!

Read More

Extreme Gradient Boosting Algorithm (XGBoost)

The XGBoost algorithm belongs to the family of boosting algorithms, which provide better and faster results than traditional classification/regression algorithms. They are widely used as the go-to algorithms for a lot of machine learning tasks.

Read More

Datetime operations in Python

Some of the key code snippets for date-time manipulation using Python's datetime library and pandas. Useful for time series problems and for doing feature engineering based on dates.
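A few of the most common operations, using only the standard datetime library (the example date is arbitrary):

```python
from datetime import datetime, timedelta

# Parse a string into a datetime object
ts = datetime.strptime("2019-03-15 18:30", "%Y-%m-%d %H:%M")

# Extract date-based features, as you would for a time series model
print(ts.year, ts.month, ts.day)  # 2019 3 15
print(ts.strftime("%A"))          # Friday
print(ts.weekday())               # 4 (Monday is 0)

# Date arithmetic
next_week = ts + timedelta(days=7)
print((next_week - ts).days)      # 7
```

The same features (`.dt.year`, `.dt.dayofweek`, and so on) are available column-wise in pandas on a datetime-typed Series.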

Read More

Clustering Methodologies

Clustering can be considered the most important unsupervised learning problem; so, as every other problem of this kind, it deals with finding a structure in a collection of unlabeled data. A loose definition of clustering could be “the process of organizing objects into groups whose members are similar in some way”. A cluster is therefore a collection of objects which are “similar” between them and are “dissimilar” to the objects belonging to other clusters.

Read More

Random Forests

Random Forest is a versatile machine learning method capable of performing both regression and classification tasks. It also handles dimensionality reduction, missing values, outlier values, and other essential steps of data exploration, and does a fairly good job. It is a type of ensemble learning method, where a group of weak models combine to form a powerful model.

Read More

Model Evaluation Metrics

Model evaluation metrics are used to assess the goodness of fit between model and data, to compare different models in the context of model selection, and to estimate how accurate predictions (associated with a specific model and data set) are expected to be. The choice of metric depends entirely on the type of model and its implementation plan.

Read More

Decision Trees - Optimization

Overfitting is one of the key challenges faced while modeling decision trees. If no limit is set on a decision tree, it will give you 100% accuracy on the training set because in the worst case it will end up making one leaf for each observation. Thus, preventing overfitting is pivotal while modeling a decision tree, and it can be done in two ways:

Read More

Data Visualization with Python

Data visualization is a part of exploratory data analysis: charts and graphs can tell you much more than a simple table or a bunch of numbers can.

Read More

Data Preprocessing in Python

There are various ways to preprocess the data after the basic exploratory analysis, mostly to convert the data into a form that fits the model.

Read More

Convolutional Neural Network - Overview

Convolutional Neural Networks are a class of neural networks which take into account not just the vector of inputs, but also the spatial arrangement of the data. An example would be transactional data, which can be analyzed using a traditional neural network because the spatial arrangement or order of the inputs does not matter, as compared to images, which are almost exclusively analyzed using CNNs, where the arrangement of pixels around each other is of paramount importance (in fact, this is exactly what makes up an image).

Read More

CNN - Residual Networks

A lot of researchers have done great research in proposing and formulating different architectures for CNN which have been proven in different scenarios with different datasets. All of them are constantly evolving, with later ones performing better than the older versions due to novel techniques and emerging algorithms.

Read More

CNN Filters, Pooling, Padding, and Strides

There are multiple building blocks in a CNN architecture. With each operation (filters, pooling, convolution, etc.), the dimensions of the output matrix change. It is extremely important to keep track of matrix dimensions to make sure the calculations are done correctly.

Read More

Hyperparameter Tuning

Hyperparameters to tune in deep learning: learning rate $\alpha$; $\beta$, $\beta_1$, $\beta_2$, $\epsilon$; number of layers; number of hidden units; learning rate decay; mini-batch size.

Read More

Batch Normalization Algorithm

Batch normalization makes hyperparameter tuning easier and makes the neural network more robust. It also makes a big network easier and faster to train.

Read More

Optimization Algorithms - RMSprop and Adam

Gradient descent is widely used as an optimization algorithm for minimizing cost functions. There are various improved versions of this algorithm, such as stochastic gradient descent, gradient descent with momentum, RMSprop, and Adam.
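Adam combines the momentum idea (a running mean of gradients, m) with the RMSprop idea (a running mean of squared gradients, v). A NumPy sketch on a toy 1-d problem (hyperparameter values are the usual defaults; the objective is made up for illustration):

```python
import numpy as np

def adam_step(w, grad, m, v, t, lr=0.01, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update: momentum (m) plus RMSprop-style scaling (v),
    with bias correction for the early steps."""
    m = beta1 * m + (1 - beta1) * grad          # momentum term
    v = beta2 * v + (1 - beta2) * grad ** 2     # RMSprop term
    m_hat = m / (1 - beta1 ** t)                # bias correction
    v_hat = v / (1 - beta2 ** t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v

# Minimize f(w) = w^2 (gradient is 2w), starting from w = 5.0
w, m, v = 5.0, 0.0, 0.0
for t in range(1, 2001):
    w, m, v = adam_step(w, 2 * w, m, v, t)
print(abs(w) < 0.1)  # True: w has moved close to the minimum at 0
```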

Read More

Activation Functions in a Neural Network

Activation functions are required in every neuron of a neural network layer to convert the parametrized output into a number which can be fed into the next layer. The backpropagation algorithm also uses the derivatives of these activation functions to propagate the error back through the network and minimize the final cost.
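A sketch of two common activations and the derivatives backpropagation relies on (example inputs chosen for illustration):

```python
import numpy as np

def sigmoid(z):
    """Squashes any input into (0, 1)."""
    return 1 / (1 + np.exp(-z))

def relu(z):
    """Passes positive inputs through, zeroes out the rest."""
    return np.maximum(0, z)

def sigmoid_prime(z):
    """Derivative of sigmoid, used during backpropagation."""
    s = sigmoid(z)
    return s * (1 - s)

def tanh_prime(z):
    """Derivative of tanh, used during backpropagation."""
    return 1 - np.tanh(z) ** 2

z = np.array([-2.0, 0.0, 2.0])
print(sigmoid(z))        # values squashed into (0, 1)
print(relu(z))           # [0. 0. 2.]
print(sigmoid_prime(0))  # 0.25: sigmoid is steepest at z = 0
```

The vanishing derivatives of sigmoid and tanh for large |z| are exactly what motivates ReLU in deep networks.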

Read More