Handwritten digit recognition in MNIST
Handwritten digit recognition on the MNIST dataset is the classic first project for anyone starting out with CNNs, Keras, or TensorFlow. It is a well-defined problem with a standardized dataset that, while not complex, can be used to run deep learning models as well as other machine learning models (logistic regression, XGBoost, random forests) to predict the digits.
In this example, I'll walk through two implementations of MNIST classification: one using a simple neural network and another using a convolutional neural network.
The MNIST dataset is available for download at many locations, including this one. The dataset also comes preloaded with the Keras and TensorFlow implementations.
The code for this is provided at this location on GitHub, with comments inserted to make it understandable. Here I'll just explain some of the key parts of the code.
# Parameters for the model
batch_size = 128
# output classes - 10 in this case
num_classes = 10
epochs = 12
Batch Size: This is for the mini-batch gradient descent algorithm, where the input data is fed to the network in batches rather than all at once. Read about it here
Epochs: The number of times the complete dataset is passed through the network. The data is passed in mini-batches (batch_size) until all of it has gone through the network once. That is 1 epoch.
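To make the relationship between batch size and epochs concrete, here is a small sketch (pure Python, using the sizes above) of how many weight-update steps one epoch takes:

```python
import math

# With 60,000 training examples and a batch size of 128, one epoch
# consists of ceil(60000 / 128) mini-batch gradient descent steps.
num_samples = 60000
batch_size = 128
steps_per_epoch = math.ceil(num_samples / batch_size)
print(steps_per_epoch)  # 469
```

So with epochs=12, the network performs 12 × 469 update steps in total.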
The data can be loaded into Keras very easily using
(x_train, y_train), (x_test, y_test) = mnist.load_data()
This will load the data and split it into 60,000 training and 10,000 test samples.
x_train.shape = (60000,28,28) : 60000 examples, 28 rows and 28 columns
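One step worth noting (it appears in most MNIST examples, though I'm illustrating it here on dummy data rather than the repo's exact code): the pixels arrive as uint8 values in [0, 255] and are usually scaled to [0.0, 1.0] before training, which helps gradient descent converge.

```python
import numpy as np

# Dummy uint8 images standing in for MNIST pixels (illustrative only).
x = np.random.randint(0, 256, size=(100, 28, 28), dtype=np.uint8)

# Cast to float and scale from [0, 255] down to [0.0, 1.0].
x = x.astype('float32') / 255.0
```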
From the source: Keras Documentation
When using this layer as the first layer in a model, provide the keyword argument input_shape (tuple of integers, does not include the sample axis), e.g. input_shape=(128, 128, 3) for 128x128 RGB pictures in data_format="channels_last".
Simple Neural Network
The data is fed into a simple neural network with 28*28=784 input units. There are no convolution units in this case.
The data is therefore reshaped from (60000,28,28) to (60000,784) using the following Python commands
x_train = x_train.reshape(60000, 784)
x_test = x_test.reshape(10000, 784)
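For reference, a dense network of the kind described above can be sketched in Keras like this. The layer sizes here are illustrative, not necessarily the ones used in the linked repo, and the `tensorflow.keras` import path is assumed:

```python
from tensorflow import keras
from tensorflow.keras import layers

# Minimal fully connected network: 784 inputs, one hidden layer,
# 10 softmax output units (one per digit class).
model = keras.Sequential([
    keras.Input(shape=(784,)),
    layers.Dense(512, activation='relu'),
    layers.Dense(10, activation='softmax'),
])
model.compile(loss='categorical_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])
```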
Convolution Neural Network
A CNN needs its input data in a specific format. mnist.load_data() supplies the MNIST digits with shape (nb_samples, 28, 28), i.e. with 2 dimensions per example, representing a 28x28 greyscale image.
The Convolution2D layers in Keras, however, are designed to work with 3 dimensions per example: they have 4-dimensional inputs and outputs. This covers colour images (nb_samples, nb_channels, width, height), but more importantly, it covers deeper layers of the network, where each example has become a set of feature maps, i.e. (nb_samples, nb_features, width, height).
Greyscale MNIST input would therefore either need a different CNN layer design (or a parameter to the layer constructor to accept a different shape), or the design could simply use a standard CNN with each example explicitly expressed as a 1-channel image. The Keras team chose the latter approach, which requires the reshape.
# Check the data format: channels first or last
if K.image_data_format() == 'channels_first':
    x_train = x_train.reshape(x_train.shape[0], 1, img_rows, img_cols)
    x_test = x_test.reshape(x_test.shape[0], 1, img_rows, img_cols)
    input_shape = (1, img_rows, img_cols)
else:
    x_train = x_train.reshape(x_train.shape[0], img_rows, img_cols, 1)
    x_test = x_test.reshape(x_test.shape[0], img_rows, img_cols, 1)
    input_shape = (img_rows, img_cols, 1)
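Once the input has its explicit channel dimension, a small CNN along the lines described above can be sketched like this. Filter counts and kernel sizes are illustrative (the linked repo's exact values may differ), channels_last is assumed, and the `tensorflow.keras` import path is used:

```python
from tensorflow import keras
from tensorflow.keras import layers

# Small CNN for 28x28 1-channel images (channels_last assumed).
model = keras.Sequential([
    keras.Input(shape=(28, 28, 1)),
    layers.Conv2D(32, kernel_size=(3, 3), activation='relu'),
    layers.Conv2D(64, kernel_size=(3, 3), activation='relu'),
    layers.MaxPooling2D(pool_size=(2, 2)),   # downsample feature maps
    layers.Flatten(),                        # feature maps -> flat vector
    layers.Dense(128, activation='relu'),
    layers.Dense(10, activation='softmax'),  # one unit per digit class
])
model.compile(loss='categorical_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])
```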
Output units to one hot vectors
The labels from the raw data (y_train and y_test) need to be converted from a 60000x1 vector to a 60000x10 matrix, where each row contains the one-hot encoding of the corresponding output class
# Convert class vectors to binary class matrices
y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)
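What to_categorical does can be shown with plain NumPy on a few example labels (this is an equivalent illustration, not the Keras implementation):

```python
import numpy as np

# Three example digit labels.
y = np.array([5, 0, 4])
num_classes = 10

# Indexing an identity matrix by the labels yields one-hot rows:
# row i has a 1.0 in column y[i] and 0.0 everywhere else.
one_hot = np.eye(num_classes)[y]
print(one_hot.shape)  # (3, 10)
```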
You can save the model's accuracy and loss (training and validation) for each epoch in a variable by doing this
history = model.fit(x_train, y_train,
                    epochs=epochs,
                    batch_size=batch_size,
                    verbose=1,
                    validation_data=(x_test, y_test))
history stores the results for each epoch. To see what it contains, run

history.history.keys()

which will output

dict_keys(['val_loss', 'val_acc', 'loss', 'acc'])
To plot the numbers, the following code can be used
import matplotlib.pyplot as plt

plt.plot(history.history['acc'])
plt.plot(history.history['val_acc'])
plt.xlabel('Epochs')
plt.ylabel('Accuracy')
plt.title('Accuracy (train and validation)')
plt.show()

plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.title('Loss (train and validation)')
plt.show()
The rest of the code is self-explanatory; please let me know if any further clarification is needed.
Code Source: https://github.com/ashukumar27/DeepLearning/blob/master/CNN/01_mnist_cnn.py