Anatomy of a TensorFlow Program - Part 1 (basics)

This post dissects a basic TensorFlow program, highlighting the typical structure of a deep learning algorithm implemented in TensorFlow.

It is not intended as a detailed tutorial on the individual functions. Instead, it explains the structure of sample code that solves an easy problem. For details on the components, see the following posts:
TensorFlow Mathematical Functions
TensorFlow Optimizer Functions
TensorFlow Loss Functions

Problem Statement: The “Hello World” of Deep Learning - MNIST Classification

The dataset is available from many sources, and keras.datasets provides a very easy way to download and read the data.
The code is divided into three parts:
Code Part 1: Basic Neural Network in TensorFlow
Code Part 2: Convolutional Neural Network in TensorFlow
Code Part 3: Convolutional Neural Network in Keras

The entire code is available on GitHub here

Code Part 1: Basic Neural Network in TensorFlow

Import Libraries

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import tensorflow as tf

Data Preparation

Import the MNIST dataset from keras.datasets

from keras.datasets import mnist
(x_train, y_train), (x_test,y_test) = mnist.load_data()

Visualize one image and its label

random_index = 3989 #Just a random number
print("Y Label : ",y_train[random_index])
plt.imshow(x_train[random_index], cmap="Greys")

Data Reshaping

The data needs to be reshaped before it can be fed into TensorFlow and Keras, because the built-in functions expect data in a specific format.

Each image is a 28x28 pixel matrix. To feed it into a basic neural network, it needs to be converted to a vector with 28x28 = 784 elements, which amounts to flattening a 2D matrix into a 1D vector.

The input to the basic neural network is therefore a 2D matrix in which each row is one example from the MNIST dataset and has 784 elements. When data is loaded into a Convolutional Neural Network this flattening step is not needed, although you still need to shape the data as the function requires. In TensorFlow, the default layout is [number of examples, height, width, number of channels] (3 channels for RGB, 1 for grayscale images such as MNIST). In PyTorch the channels come before the height and width. CNNs are covered in greater detail in the next section, below this code.
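The three layouts described above can be compared on a small stand-in array. This is a minimal NumPy sketch, not part of the original code: the `images` array and the names `flat`, `nhwc`, and `nchw` are made up for illustration.

```python
import numpy as np

# Hypothetical stand-in for a small batch of MNIST images: 5 grayscale 28x28 images.
images = np.zeros((5, 28, 28), dtype=np.float32)

# Flatten for a fully connected network: one row of 784 values per image.
flat = images.reshape(images.shape[0], -1)

# Channels-last layout (the TensorFlow default): add a trailing channel axis of size 1.
nhwc = images[..., np.newaxis]

# Channels-first layout (the PyTorch default): move the channel axis next to the batch axis.
nchw = np.transpose(nhwc, (0, 3, 1, 2))

print(flat.shape, nhwc.shape, nchw.shape)
```

Passing -1 to reshape lets NumPy infer the 784 from the remaining dimensions, which avoids hard-coding the image size.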

print("Training Data Shape :", x_train.shape) #Output: (60000,28,28)

Reshape the data (X) into 60,000 vectors of 784 dimensions (28x28)

x_train = x_train.reshape([x_train.shape[0], x_train.shape[1]*x_train.shape[2]])
x_test = x_test.reshape([x_test.shape[0], x_test.shape[1]*x_test.shape[2]])

Normalize the data: rescale all values from the 0 to 255 range to the 0 to 1 range, and convert the data to float32.

x_train = x_train.astype('float32')/255
x_test = x_test.astype('float32')/255
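As a quick check of what this scaling does, here is a small sketch on three made-up pixel values (the `pixels` array is a hypothetical stand-in for image data):

```python
import numpy as np

# Hypothetical pixel values covering the darkest, a middle, and the brightest intensity.
pixels = np.array([[0, 128, 255]], dtype=np.uint8)

# Convert to float32 first, then divide, so the result is float32 in the range [0, 1].
scaled = pixels.astype("float32") / 255

print(scaled.dtype, scaled.min(), scaled.max())
```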

Convert the labels (Y) to one-hot encoding.

print("Y Labels Shape: " ,y_train.shape) #Output : (60000,)

The labels are stored as numeric digits from 0 to 9, one per class. They need to be converted to one-hot encoded vectors, giving a (60000, 10) array in which each column denotes one class and holds a 1 wherever that class occurs.

The function np_utils.to_categorical(y_train) is an easy way to do this conversion. Alternatively, it can be done with the pd.get_dummies() function. In recent Keras releases the same function is available as keras.utils.to_categorical.

from keras.utils import np_utils
y_train = np_utils.to_categorical(y_train)
y_test = np_utils.to_categorical(y_test)
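To see what the one-hot conversion produces without depending on a particular Keras version, the same encoding can be sketched in plain NumPy. The helper name `one_hot` is hypothetical and is not a Keras function.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Return a (len(labels), num_classes) float32 array with a single 1 per row."""
    labels = np.asarray(labels)
    # Row i of the identity matrix is the one-hot vector for class i.
    return np.eye(num_classes, dtype=np.float32)[labels]

encoded = one_hot([5, 0, 9], num_classes=10)
print(encoded.shape)   # (3, 10)
print(encoded[0])      # 1.0 at index 5, zeros elsewhere
```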

Code Part 1: Implementation of a Basic Neural Network in TensorFlow

TF Model parameters

Learning Rate: Step size of each parameter update ($\alpha$ in the Gradient Descent algorithm); it controls how quickly the model converges
Epochs: The number of times all examples are fed into the optimization algorithm
Batch Size: Number of examples the model reads in a batch. See Mini Batch Gradient Descent
Display Step: For visualization purposes

learning_rate = 0.1
epochs = 500
batch_size = 512
display_step = 10
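The training loop in this post feeds the whole dataset at once, so batch_size is not actually used by it. For reference, mini-batching with this batch size could look roughly like the sketch below. The helper `iterate_minibatches` and the placeholder arrays are assumptions for illustration only.

```python
import math
import numpy as np

def iterate_minibatches(features, labels, batch_size, rng):
    """Yield shuffled (features, labels) batches that cover the data exactly once."""
    order = rng.permutation(len(features))
    for start in range(0, len(features), batch_size):
        idx = order[start:start + batch_size]
        yield features[idx], labels[idx]

# Hypothetical stand-in data with the same number of rows as the MNIST training set.
rng = np.random.default_rng(0)
features = np.zeros((60000, 4), dtype=np.float32)
labels = np.zeros((60000,), dtype=np.int64)

batch_sizes = [len(xb) for xb, _ in iterate_minibatches(features, labels, 512, rng)]
print(len(batch_sizes), batch_sizes[-1])
```

With 60,000 examples and a batch size of 512, one epoch consists of 118 batches, and the last one holds the remaining 96 examples.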

Build TensorFlow Graph and define optimizing function

Define variables to store the key numbers:
num_samples: Number of samples/rows in training dataset
num_features: Number of features (columns) in training data (=784)
num_classes: Number of classes in labels (=10)

num_samples = x_train.shape[0]
num_features = x_train.shape[1]
num_classes = y_train.shape[1]

Placeholders for X and Y

Placeholders are defined in a TensorFlow graph to tell the graph where data is expected. The idea is to build the graph first and then feed in the data when executing it.
Placeholders need to be defined with the data type and the dimensions in which the data is expected.

x = tf.placeholder(tf.float32, [None,num_features])
y = tf.placeholder(tf.float32, [None, num_classes])

Model Parameters: Weights and Biases

tf.Variable is used to define values that can change while the model runs. If a value is declared with tf.constant, it cannot be changed. A tf.Variable, however, lets the optimizer update it iteratively to minimize the loss function defined later in this code.

W = tf.Variable(tf.zeros([num_features,num_classes])) #Parameter matrix for Weight (784,10)
b = tf.Variable(tf.zeros([num_classes])) #Parameter vector for Bias (10,)

Define the model : Basic Logistic Regression Model

pred = tf.nn.softmax(tf.matmul(x,W)+b)
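To make the model line concrete, the same forward pass can be sketched in NumPy. This is an illustration under assumptions, not the TensorFlow implementation: the `softmax` helper and the random `x_batch` are made up for the example.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax; subtracting the row max avoids overflow in exp."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

num_features, num_classes = 784, 10

# Hypothetical batch of 4 flattened images with values in [0, 1).
x_batch = np.random.default_rng(0).random((4, num_features)).astype(np.float32)

# Same zero initialization as W and b in the post.
W = np.zeros((num_features, num_classes), dtype=np.float32)
b = np.zeros(num_classes, dtype=np.float32)

pred = softmax(x_batch @ W + b)
print(pred.shape)   # (4, 10)
print(pred[0])      # every class gets probability 0.1 while W and b are zero
```

Because the weights and biases start at zero, all logits are zero and the model initially assigns equal probability to each of the 10 classes.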

Define the cost: this is the standard logistic regression (cross-entropy) cost function, which needs to be minimized using an appropriate optimization algorithm.

y: actual labels
pred: Predictions via softmax

cost = tf.reduce_mean(-tf.reduce_sum(y*tf.log(pred), reduction_indices =1))
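The cost expression above can be mirrored in NumPy to check it on a case with a known answer. The `cross_entropy` helper and its `eps` clipping are assumptions added for this sketch; the clipping guards against log(0) and is not part of the original expression.

```python
import numpy as np

def cross_entropy(y_true, pred, eps=1e-12):
    """Mean over examples of -sum(y * log(pred)) across the class axis."""
    pred = np.clip(pred, eps, 1.0)  # assumption: avoid log(0), not in the original code
    return float(np.mean(-np.sum(y_true * np.log(pred), axis=1)))

# Two hypothetical examples with true classes 3 and 7.
y_true = np.eye(10, dtype=np.float32)[[3, 7]]

# Uniform predictions, as produced by a model whose weights are all zero.
uniform = np.full((2, 10), 0.1, dtype=np.float32)

loss = cross_entropy(y_true, uniform)
print(loss)  # close to ln(10), about 2.3026
```

A loss near ln(10) on the first iteration is therefore what to expect from a zero-initialized 10-class model.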

Optimize cost

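GradientDescentOptimizer is used here to minimize the cost function (loss). The training loop below runs an op named optimizer, which would be defined along these lines:

optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(cost)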
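What a gradient descent step does for this model can be sketched by hand in NumPy. This is a toy illustration under assumptions: the data is random, only 20 features are used so that a learning rate of 0.1 is a safe step size, and the helper names are hypothetical. TensorFlow derives the gradients automatically, whereas here they are written out.

```python
import numpy as np

def softmax(logits):
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def loss_fn(x, y, W, b):
    """Mean cross-entropy of the softmax regression model."""
    pred = softmax(x @ W + b)
    return float(np.mean(-np.sum(y * np.log(pred + 1e-12), axis=1)))

def gradient_step(x, y, W, b, learning_rate):
    """One full-batch gradient descent update for softmax regression."""
    pred = softmax(x @ W + b)
    error = (pred - y) / len(x)   # gradient of the mean loss with respect to the logits
    grad_W = x.T @ error
    grad_b = error.sum(axis=0)
    return W - learning_rate * grad_W, b - learning_rate * grad_b

# Toy problem (assumption): 32 random examples, 20 features, 10 classes.
rng = np.random.default_rng(0)
x = rng.random((32, 20))
y = np.eye(10)[rng.integers(0, 10, size=32)]
W, b = np.zeros((20, 10)), np.zeros(10)

before = loss_fn(x, y, W, b)
W, b = gradient_step(x, y, W, b, learning_rate=0.1)
after = loss_fn(x, y, W, b)
print(before, after)  # the loss after the step is lower than before
```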

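Running TensorFlow Graph

The tf.Variables need to be initialized before the graph can be run. Various initialization functions are available. Here W and b start from zeros, and tf.global_variables_initializer() creates the op that performs the initialization.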

init = tf.global_variables_initializer()

Providing the entire dataset in one epoch
tf.Session() creates a session object, bound here to the name ‘sess’, in which the graph defined above is executed

with tf.Session() as sess:
    # Run the initializer: first step
    sess.run(init)

    for epoch in range(epochs):
        # Train without mini-batches: provide all training data in a single go
        _, loss = sess.run([optimizer, cost], feed_dict={x: x_train, y: y_train})
        print(loss)

    # Calculate accuracy on test data
    correct_prediction = tf.equal(tf.argmax(pred, 1), tf.argmax(y, 1))
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

    print("Accuracy : ", accuracy.eval({x: x_test, y: y_test}))
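The accuracy computation compares the most probable predicted class with the true class for each example and averages the matches. A small NumPy sketch on made-up predictions shows the idea (the `accuracy` helper and the sample arrays are hypothetical):

```python
import numpy as np

def accuracy(pred, y_true):
    """Fraction of rows where the most probable class matches the one-hot label."""
    return float(np.mean(np.argmax(pred, axis=1) == np.argmax(y_true, axis=1)))

# Four hypothetical predictions over three classes.
pred = np.array([[0.7, 0.2, 0.1],
                 [0.1, 0.8, 0.1],
                 [0.3, 0.4, 0.3],
                 [0.2, 0.2, 0.6]])

# True classes 0, 1, 2, 2 as one-hot rows; the third prediction is wrong.
y_true = np.eye(3)[[0, 1, 2, 2]]

score = accuracy(pred, y_true)
print(score)  # 0.75
```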
Written on September 22, 2018